AMIA Annual Symposium Proceedings. 2020 Mar 4;2019:1121–1128.

Predicting Wait Times in Pediatric Ophthalmology Outpatient Clinic Using Machine Learning

Wei-Chun Lin 1, Isaac H Goldstein 2, Michelle R Hribar 1, David S Sanders 2,3, Michael F Chiang 1,2
PMCID: PMC7153152  PMID: 32308909

Abstract

Patient perceptions of wait time during outpatient office visits can affect patient satisfaction. Providing accurate information about wait times could improve patients’ satisfaction by reducing uncertainty. However, little is known about efficient ways to predict wait time in the clinic. Supervised machine learning algorithms are a powerful tool for predictive modeling with large and complicated data sets. In this study, we tested machine learning models that predict wait times from secondary EHR data in a pediatric ophthalmology outpatient clinic. We compared several machine learning algorithms, including random forest, elastic net, gradient boosting machine, support vector machine, and multiple linear regression, to find the most accurate model for prediction. The importance of the predictors was also identified via the machine learning models. In the future, these models could be combined with real-time EHR data to provide accurate estimates of patient wait time in outpatient clinics.

Introduction

In recent years, hospital systems have increasingly emphasized quality of care, which includes patient satisfaction.1,2 Patients’ perception of wait time in a primary care or specialty care outpatient clinic contributes to patient satisfaction.3-5 In fact, the literature suggests that the time a patient spends waiting for their scheduled appointment is the largest source of patient dissatisfaction.5 Longer wait times negatively impact patients’ satisfaction and distort their perception of the quality of care and of the physician’s abilities.1 Although waiting may be unavoidable in outpatient clinics, providing accurate information about wait times could improve patients’ satisfaction by reducing uncertainty.4,7 However, few studies have proposed statistical models to predict outpatient clinic wait time.

The complexity of clinic workflows makes predicting patients’ wait time challenging. For example, in an ophthalmology clinic, ophthalmologists typically utilize multiple exam rooms simultaneously, examine patients at different stages of the visit, and integrate ancillary staff and trainees into the clinical workflow.8 Thus, it is difficult to provide accurate estimates of wait times in clinical settings. To bridge this gap, secondary use of electronic health record (EHR) data with machine learning algorithms is a reasonable choice. Supervised machine learning algorithms are an effective tool for predictive modeling with large and complex data sets: they are robust to outliers, can rank the relative importance of variables, and adapt to the data without manually specified rules.

Our study was performed in an academic ophthalmology department because it is a high volume, fast-paced specialty where estimating patient wait time is paramount. The purpose of this work was to develop analytical models that provide an accurate prediction of patients’ wait time in ophthalmology outpatient clinics using advanced machine learning methods. In addition, we wanted to determine the features that are most important for these predictive models.

Methods

This study was approved by the Institutional Review Board at Oregon Health and Science University (OHSU). OHSU is a large academic medical center in Portland, Oregon. This study was conducted at Casey Eye Institute, OHSU’s ophthalmology department serving all major ophthalmology subspecialties. The department performs over 130,000 outpatient examinations annually and is a major referral center in the Pacific Northwest and nationally. In 2006, OHSU implemented an institution-wide EHR (EpicCare; Epic Systems, Verona, WI) to handle all ambulatory practice management, clinical documentation, order entry, medication prescribing, and billing.

Data Preparation

We used 6 years (January 1, 2012 to March 31, 2018) of office visit data from seven pediatric ophthalmology faculty providers at OHSU Casey Eye Institute. Time-stamp and related data from office visits were extracted from the enterprise-wide clinical data warehouse, and audit log timestamp data were used to calculate the time-related variables. The appointment length was determined as the difference between the office visit check-in and checkout times recorded in the EHR. The provider-patient interaction times were determined from audit log values using previously validated methods.9 The wait time was defined as the total wait time during the clinic appointment (total appointment length minus provider-patient interaction time).
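For illustration, the wait time calculation can be expressed in a few lines of R; this is a minimal sketch, and the column names (checkin_time, checkout_time, interaction_min) are hypothetical placeholders rather than the actual warehouse schema:

    # One row per office visit; timestamps are POSIXct, column names illustrative.
    visits$appt_length_min <- as.numeric(difftime(visits$checkout_time,
                                                  visits$checkin_time,
                                                  units = "mins"))
    # Wait time = total appointment length - provider-patient interaction time
    visits$wait_time_min <- visits$appt_length_min - visits$interaction_min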

Data Preprocessing

Data cleaning – we removed all encounters with missing data and excluded abnormal time values. Clinic visits were excluded if (1) wait time was longer than 180 minutes, less than 0 minutes, or missing; (2) provider-patient interaction time was longer than 120 minutes or less than 1 minute; (3) appointment length was longer than 300 minutes or less than 10 minutes; or (4) arrival interval was longer than 240 minutes or less than −240 minutes.
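A sketch of these exclusion rules in base R, reusing the hypothetical column names from the sketch above (arrival_interval_min is likewise illustrative):

    # Keep only visits that pass all four exclusion rules
    keep <- with(visits,
                 !is.na(wait_time_min) &
                 wait_time_min >= 0 & wait_time_min <= 180 &                  # rule (1)
                 interaction_min >= 1 & interaction_min <= 120 &              # rule (2)
                 appt_length_min >= 10 & appt_length_min <= 300 &             # rule (3)
                 arrival_interval_min >= -240 & arrival_interval_min <= 240)  # rule (4)
    visits <- visits[keep, ]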

Data transformation and dimensionality reduction – several variables were re-categorized, and all categorical variables were converted to dummy variables. Some potential predictors were removed based on our domain experience and observation.
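One common way to perform the dummy-variable conversion in R is caret’s dummyVars, which produces a full one-hot encoding; the factor column names below are hypothetical:

    library(caret)
    # Expand each categorical predictor into one 0/1 indicator column per level
    dv <- dummyVars(~ provider + dx_group + financial_class, data = visits)
    X  <- predict(dv, newdata = visits)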

Outcome Variables and Analytical Models

The primary outcome variable of the regression models was patient wait time, a continuous variable. Four machine learning algorithms – random forest,10 elastic net,11 gradient boosting machine (GBM),12 and support vector machine (SVM)13 – plus multiple linear regression were developed to predict patient wait time. We chose these models to represent a broad range of approaches to machine learning: random forest and GBM are ensembles of decision trees, elastic net is a regularized form of linear regression, and SVM constructs hyperplanes to separate the data.

In addition, patient wait time was transformed into a categorical variable with values “Long” or “Normal”. “Long” indicated a wait time in the upper half (> 58 minutes) and “Normal” indicated a wait time in the bottom half (≤ 58 minutes); we used the median of the patient wait time as the cut-point. Four classification models, including random forest, elastic net with logistic regression, GBM, and SVM, were developed to predict “long wait time” versus “normal wait time”. The categorical variable can be used in scheduling templates.16
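The median split itself is a one-liner in R; a sketch using the hypothetical wait_time_min column:

    cutpoint <- median(visits$wait_time_min)   # 58 minutes in this data set
    visits$wait_class <- factor(ifelse(visits$wait_time_min > cutpoint, "Long", "Normal"),
                                levels = c("Normal", "Long"))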

Random forest is an ensemble decision tree algorithm in which each tree is grown from a bootstrap sample of the training data and a randomly selected subset of predictor variables is considered at each split. The prediction for a new observation is made by averaging the outputs of the ensemble of trees. We used the R package randomForest to build the prediction model.10
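A minimal sketch of fitting the regression forest with randomForest; the hyperparameter values are illustrative defaults, not the settings tuned for this study:

    library(randomForest)
    set.seed(42)
    rf_fit <- randomForest(wait_time_min ~ ., data = train,
                           ntree = 500,        # number of bootstrapped trees
                           importance = TRUE)  # record %IncMSE for feature ranking
    rf_pred <- predict(rf_fit, newdata = test) # average over the ensemble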

Elastic net is a regularized regression method that combines the penalties of the lasso and ridge methods. We used the R package glmnet to build the prediction model.11
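glmnet takes a numeric predictor matrix; the mixing parameter alpha blends the ridge (alpha = 0) and lasso (alpha = 1) penalties, and cv.glmnet tunes the penalty strength lambda by cross-validation. A sketch, with alpha = 0.5 as an illustrative choice:

    library(glmnet)
    x_train  <- model.matrix(wait_time_min ~ . - 1, data = train)
    enet_fit <- cv.glmnet(x_train, train$wait_time_min,
                          alpha = 0.5, nfolds = 5)  # elastic-net mix, 5-fold CV
    x_test    <- model.matrix(wait_time_min ~ . - 1, data = test)
    enet_pred <- predict(enet_fit, newx = x_test, s = "lambda.min")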

Gradient boosting machines are an ensemble learning method for improving predictive performance. Unlike random forest, which builds an ensemble of deep, independent trees, GBM builds an ensemble of weak, successive trees, with each tree learning from and improving on the previous one. We used the R package gbm, assuming a Gaussian distribution to minimize squared-error loss.12
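A sketch with the gbm package and the Gaussian loss described above; the tree count, depth, and learning rate shown are illustrative:

    library(gbm)
    gbm_fit <- gbm(wait_time_min ~ ., data = train,
                   distribution = "gaussian",  # minimizes squared-error loss
                   n.trees = 1000, interaction.depth = 3,
                   shrinkage = 0.01,           # learning rate for each weak tree
                   cv.folds = 5)
    best_iter <- gbm.perf(gbm_fit, method = "cv")  # choose tree count by CV
    gbm_pred  <- predict(gbm_fit, newdata = test, n.trees = best_iter)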

Support vector machines are supervised learning models used for both classification and regression analysis. The algorithm constructs optimal hyperplanes, which are used to categorize new examples. We used the R package e1071 with a linear kernel to build the prediction model.13
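The e1071 interface for support vector regression with a linear kernel (a sketch; eps-regression is e1071’s default mode for a numeric response):

    library(e1071)
    svm_fit  <- svm(wait_time_min ~ ., data = train, kernel = "linear")
    svm_pred <- predict(svm_fit, newdata = test)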

Predictor Variables

Twenty-four predictor variables were selected from the original data set and grouped as follows:

(1) Predictors related to date and time: Year, month, day of the month, day of the week, clinic session (AM or PM), and clinic hour, defined as the hour of the patient’s check-in time relative to the scheduled half-day clinic session.

(2) Patients’ demographic and clinical features: Age, grouped into six categories; visit name; patient financial class; whether the patient was a new patient; whether the patient was scheduled or walked in; and International Classification of Diseases, tenth revision (ICD-10) diagnosis code. We grouped the ICD-10 codes into categories based on diagnosis frequency: the 20 most common ICD-10 diagnoses were kept as individual categories, and all remaining diagnoses were categorized as “Other” (21 total categories; a grouping sketch follows this list).

(3) Predictors related to clinical examination: Boolean values representing if the patient exam included a pupil dilation, a visual acuity test, a visual field test, a tonometry test, a refraction test, or a fundoscopic exam.

(4) Predictors related to prior visit length: The wait time of the patient’s previous office visit and the provider-patient interaction time of the patient’s previous office visit.

(5) Predictor for the arrival interval for the current appointment: The arrival interval is defined as the duration between the patient’s check-in time and the scheduled visit time, and can be either positive (late arrival) or negative (early arrival); a derivation sketch follows this list.

(6) Predictors related to number of patients: The total number of patients seen by all providers on the day of the appointment, and the clinic volume, defined as the number of patients scheduled in the provider’s half-day clinic containing the appointment.
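For illustration, the sketches below show how two of the derived predictors above might be computed in R; the timestamp and diagnosis column names are hypothetical:

    # Group (5): arrival interval (positive = late, negative = early)
    visits$arrival_interval_min <- as.numeric(difftime(visits$checkin_time,
                                                       visits$sched_time,
                                                       units = "mins"))
    # Group (2): keep the 20 most frequent ICD-10 codes, collapse the rest
    top20 <- names(sort(table(visits$icd10), decreasing = TRUE))[1:20]
    visits$dx_group <- ifelse(visits$icd10 %in% top20,
                              as.character(visits$icd10), "Other")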

Validation and Evaluation of Performance

The data set was randomly split into a training set (75%) and a testing set (25%) to avoid over-fitting. K-fold cross-validation was used to validate the models. A larger number of folds reduces the bias toward overestimating the true expected error but may increase variance. In this study, we used 5-fold cross-validation: the learning curve was essentially flat at this point, and each validation partition in 5-fold cross-validation was large enough to provide a fair estimate of the model’s performance. Each prediction model was developed with the training data set. Root mean square error (RMSE) and R2 were used to determine the predictive accuracy of each regression model. We developed five regression models in this study: random forest, elastic net, GBM, SVM, and multiple linear regression. Receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) were used to evaluate the accuracy of the classification models for predicting “long wait time” versus “normal wait time”; there were four classification models in the study: random forest, elastic net with logistic regression, GBM, and SVM. We also ranked the importance of features based on the increase in mean square error of prediction when a given variable is permuted (%IncMSE) in the random forest regression model. All data processing and analyses were conducted using the R programming language, version 3.5.2.14
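One way to reproduce this protocol is with the caret package, which wraps the train/test split, 5-fold cross-validation, and RMSE/R2 reporting; a sketch for the random forest model (caret’s method names for the other models would be glmnet, gbm, and svmLinear):

    library(caret)
    set.seed(42)
    idx   <- createDataPartition(visits$wait_time_min, p = 0.75, list = FALSE)
    train <- visits[idx, ]
    test  <- visits[-idx, ]
    ctrl  <- trainControl(method = "cv", number = 5)  # 5-fold cross-validation
    rf_cv <- train(wait_time_min ~ ., data = train, method = "rf", trControl = ctrl)
    postResample(predict(rf_cv, newdata = test),
                 test$wait_time_min)                  # reports RMSE, Rsquared, MAE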

Results

Descriptive Data

Table 1 shows the characteristics of the patient visits, including patient age, visit name, financial class, whether the patient was a new patient, and information about visit length. There were 37,787 (98.8%) patient visits that met inclusion criteria; 445 office visits (1.2%) were excluded. Most patients were school age (32%) or pre-school age (26%). Amblyopia, esotropia, and exotropia were the most common diagnoses. Most patients used Medicaid or commercial insurance. In addition, 63% of visits were follow-up visits, 23% of patients were new patients, and approximately half of the patients (53%) were scheduled for pupil dilation.

Table 1.

Descriptive characteristics

[Table 1 is provided as an image in the original publication.]

Evaluation Performance of Models

The performance of the five regression models is presented with R2 and RMSE in Table 2. The random forest model had the highest R2 (0.3822), meaning it explained about 38 percent of the variability in patient wait time, and the best (lowest) RMSE (24.24). Thus, we selected random forest as the best predictive model in this study. The ten most important features of the random forest regression model and their %IncMSE are shown in Figure 1. Whether the patient was scheduled for pupil dilation, the patient’s primary ophthalmologist, and the arrival interval had the strongest impact on the predictive model.

Table 2.

Comparison of R2 and RMSE of five predictive models

Method R-squared RMSE
Random forest 0.3822 24.24
Elastic net 0.3691 24.74
Linear regression 0.3646 25.01
GBM 0.3712 24.54
SVM 0.3633 24.71

Figure 1:

The 10 most important features in the random forest model as determined by %IncMSE.

Figure 2 shows the ROC curves of the four machine learning models on the testing data set for predicting when patient wait times will be long. The area under the curve (AUC) was measured for all models. The random forest model showed the best performance (AUC = 81.55% [95% CI 80.69%–82.41%]), followed by the support vector machine (AUC = 81.14% [95% CI 80.28%–81.99%]), gradient boosting machine (AUC = 80.05% [95% CI 79.17%–80.92%]), and elastic net (AUC = 79.88% [95% CI 79.00%–80.76%]).
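The ROC curves and AUC confidence intervals can be computed with, for example, the pROC package; a sketch assuming predicted probabilities of a “Long” wait from the random forest classifier (prob_long is a hypothetical vector):

    library(pROC)
    roc_rf <- roc(response = test$wait_class, predictor = prob_long,
                  levels = c("Normal", "Long"))  # controls first, cases second
    auc(roc_rf)     # area under the ROC curve
    ci.auc(roc_rf)  # 95% CI (DeLong method by default)
    plot(roc_rf)    # draw the ROC curve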

Figure 2:

ROC curves of machine learning models

Discussion

In this study, we evaluated the applicability of machine learning models for predicting patient wait time in pediatric ophthalmology clinics. The key findings were: (1) machine learning models (such as random forest) can accurately predict patient wait time in a pediatric ophthalmology outpatient clinic; (2) machine learning models can provide insight into the factors associated with patient wait time; and (3) a patient wait time predictive model can be a useful tool in managing clinical practices.

In our study, the random forest model provided the most accurate prediction (R2 0.38 and RMSE 24.24), exceeding the accuracy of the linear model (R2 0.36 and RMSE 25.01). Machine learning algorithms are well suited to predicting complex and noisy phenomena such as patient wait time in a pediatric ophthalmology outpatient clinic.15 In our previous work, we used simulation models to develop improved patient scheduling templates, which decreased patient wait time and clinic session length.9,16 The templates in those studies were based on predictions of provider-patient interaction time and patient complexity made by an experienced clinician. We subsequently showed that machine learning models can improve the prediction of patient complexity.17 The present study shows that machine learning models are also effective at predicting patient wait times.

Moreover, the random forest and GBM models were able to identify patient factors associated with patient wait time, and the feature importance rankings of the two regression models were similar. The top three predictor variables identified in both models were pupil dilation, primary ophthalmologist, and arrival interval. Pupil dilation is likely important because it takes about 25 minutes for pupils to fully dilate, and patients must wait during this period: in this study, the average wait time was 68.39 minutes (SD = 27.09) for patients with pupil dilation and 44.54 minutes (SD = 30.13) for patients without. The physician is important because clinic volume differs between physicians and each physician has their own pace when seeing patients; the average wait times of the seven faculty providers ranged from 47.36 minutes to 75.01 minutes. The arrival interval is important because patients who arrive at the clinic early accumulate wait time before their scheduled slot. In our study, patients arrived at the clinic about 10 minutes earlier than their scheduled time on average, and the Pearson correlation coefficient between patient wait time and arrival interval was −0.22.

We also noticed in Figure 1 that year was the fourth most important predictor variable in the random forest regression model, possibly because clinic policies and procedures changed over time. More investigation is needed to determine why this variable is important, since it is probably not relevant for future predictive analyses. Exploring the important predictors can be useful for improving the predictive model; for example, building models for each provider, eliminating dilation time from wait time, and adjusting wait time calculations to remove bias from early arrivals may improve the accuracy of the wait time predictions. For the elastic net, multiple linear regression, and SVM regression models, we calculated the regression coefficients, which describe the relationship between each feature and the outcome variable; however, because the units vary between the different types of variables, the coefficients cannot be used directly as measures of feature importance.

Providing accurate information about wait times could improve patients’ satisfaction by reducing uncertainty.4,7 Even with perfect scheduling, it is difficult to completely eliminate patient wait time, so communicating potential delays to patients is important for maintaining patient satisfaction. In the future, the predictive models in this study could be combined with real-time EHR data to inform patients of their expected wait times while they are in clinic.

There are several limitations to our study. First, the EHR timestamps do not always accurately capture the provider-patient interaction time, which in turn can produce incorrect wait times; however, we have previously validated that these interaction timestamps were accurate across a wide range of ophthalmology providers at a single institution.9,16,17 Second, the models might improve with a larger data set, and our study was limited to a single institution. Finally, the patterns observed in our study might not generalize to other subspecialties within ophthalmology or to other healthcare systems. Our intention is to extend and replicate these study methods with different patient and provider populations at different institutions to increase the generalizability of our findings.

Conclusion

Patient perceptions of wait times in pediatric ophthalmology outpatient clinics can affect patient satisfaction and perceived quality of care, yet few studies have examined efficient ways to predict wait time in the clinic. In this study, we found that supervised machine learning models can accurately predict patient wait time, and we identified the factors with the largest contribution to patient wait times. Notably, patient satisfaction increases when patients are told their expected wait time. In the future, we may be able to incorporate real-time scheduling data from the EHR to improve estimates of patient wait time and scheduling efficiency in the clinic setting.

Acknowledgements

Supported by grants T15LM007088, R00LM012238, and P30EY0105072 from the National Institutes of Health, (Bethesda, MD) and unrestricted departmental support from Research to Prevent Blindness (New York, NY). MFC is an unpaid member of the Scientific Advisory Board for Clarity Medical Systems (Pleasanton, CA), a Consultant for Novartis (Basel, Switzerland), and an equity owner in Inteleretina, LLC (Honolulu, HI).


References

  1. Grol R. Improving the quality of medical care: building bridges among professional pride, payer profit, and patient satisfaction. JAMA. 2001;286(20):2578–2585. doi:10.1001/jama.286.20.2578
  2. van Campen C, Sixma H, Friele RD, Kerssens JJ, Peters L. Quality of care and patient satisfaction: a review of measuring instruments. Medical Care Research and Review. 1995;52(1):109–133. doi:10.1177/107755879505200107
  3. Billing K, Newland H, Selva D. Improving patient satisfaction through information provision. Clinical and Experimental Ophthalmology. 2007;35(5):439–447. doi:10.1111/j.1442-9071.2007.01514.x
  4. Anderson RT, Camacho FT, Balkrishnan R. Willing to wait?: the influence of patient wait time on satisfaction with primary care. BMC Health Services Research. 2007;7(1):31. doi:10.1186/1472-6963-7-31
  5. Camacho F, Anderson R, Safrit A, Jones AS, Hoffmann P. The relationship between patient’s perceived waiting time and office-based practice satisfaction. NC Med J. 2006;67(6):409–413.
  6. McMullen M, Netland PA. Wait time as a driver of overall patient satisfaction in an ophthalmology clinic. Clinical Ophthalmology. 2013;7:1655. doi:10.2147/OPTH.S49382
  7. Jaworsky C, Pianykh O, Oglevee C. Patient feedback on waiting time displays. American Journal of Medical Quality. 2017;32(1):108. doi:10.1177/1062860616658974
  8. Hribar MR, Read-Brown S, Reznick L, et al. Secondary use of EHR timestamp data: validation and application for workflow optimization. AMIA Annual Symposium Proceedings. 2015.
  9. Hribar MR, Read-Brown S, Goldstein IH, et al. Secondary use of electronic health record data for clinical workflow analysis. Journal of the American Medical Informatics Association. 2017;25(1):40–46. doi:10.1093/jamia/ocx098
  10. Liaw A, Wiener M. Classification and regression by randomForest. R News. 2002;2(3):18–22.
  11. Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B. 2005;67(2):301–320.
  12. Friedman JH. Greedy function approximation: a gradient boosting machine. Annals of Statistics. 2001;29(5):1189–1232.
  13. Suykens JA, Vandewalle J. Least squares support vector machine classifiers. Neural Processing Letters. 1999;9(3):293–300.
  14. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2016.
  15. Murphy KP. Machine Learning: A Probabilistic Perspective. MIT Press; 2012.
  16. Hribar MR, Biermann D, Read-Brown S, et al. Clinic workflow simulations using secondary EHR data. AMIA Annual Symposium Proceedings. 2016.
  17. Lin WC, Goldstein IH, Hribar MR, Huang AB, Chiang MF. Secondary use of electronic health record data for prediction of outpatient visit length in ophthalmology clinics. AMIA Annual Symposium Proceedings. 2018.
  18. Goldstein IH, Hribar MR, Read-Brown S, Chiang MF. Association of the presence of trainees with outpatient appointment times in an ophthalmology clinic. JAMA Ophthalmology. 2018;136(1):20–26. doi:10.1001/jamaophthalmol.2017.4816
