Predicting no-show appointments in a pediatric hospital in Chile using machine learning

J Dunstan; F Villena; JP Hoyos; V Riquelme; M Royer; H Ramírez; J Peypouquet

doi:10.1007/s10729-022-09626-z

. 2023 Jan 28;26(2):313–329. doi: 10.1007/s10729-022-09626-z

Predicting no-show appointments in a pediatric hospital in Chile using machine learning

J Dunstan ^1,², F Villena ¹, JP Hoyos ³, V Riquelme ¹, M Royer ⁴, H Ramírez ^1,⁵, J Peypouquet ^6,^✉

PMCID: PMC10257628 PMID: 36707485

Abstract

The Chilean public health system serves 74% of the country’s population, and 19% of medical appointments are missed on average because of no-shows. The national goal is 15%, which coincides with the average no-show rate reported in the private healthcare system. Our case study, Doctor Luis Calvo Mackenna Hospital, is a public high-complexity pediatric hospital and teaching center in Santiago, Chile. Historically, it has had high no-show rates, up to 29% in certain medical specialties. Using machine learning algorithms to predict no-shows of pediatric patients in terms of demographic, social, and historical variables. To propose and evaluate metrics to assess these models, accounting for the cost-effective impact of possible intervention strategies to reduce no-shows. We analyze the relationship between a no-show and demographic, social, and historical variables, between 2015 and 2018, through the following traditional machine learning algorithms: Random Forest, Logistic Regression, Support Vector Machines, AdaBoost and algorithms to alleviate the problem of class imbalance, such as RUS Boost, Balanced Random Forest, Balanced Bagging and Easy Ensemble. These class imbalances arise from the relatively low number of no-shows to the total number of appointments. Instead of the default thresholds used by each method, we computed alternative ones via the minimization of a weighted average of type I and II errors based on cost-effectiveness criteria. 20.4% of the 395,963 appointments considered presented no-shows, with ophthalmology showing the highest rate among specialties at 29.1%. Patients in the most deprived socioeconomic group according to their insurance type and commune of residence and those in their second infancy had the highest no-show rate. The history of non-attendance is strongly related to future no-shows. An 8-week experimental design measured a decrease in no-shows of 10.3 percentage points when using our reminder strategy compared to a control group. Among the variables analyzed, those related to patients’ historical behavior, the reservation delay from the creation of the appointment, and variables that can be associated with the most disadvantaged socioeconomic group, are the most relevant to predict a no-show. Moreover, the introduction of new cost-effective metrics significantly impacts the validity of our prediction models. Using a prototype to call patients with the highest risk of no-shows resulted in a noticeable decrease in the overall no-show rate.

Supplementary Information

The online version contains supplementary material available at 10.1007/s10729-022-09626-z.

Keywords: No-show patients, Appointments and schedules, Machine learning, Medical informatics, Public health

Highlights

We predict the probability of patients missing their medical appointments, based on demographic, social and historical variables.
For each day and specialty, we provide a short list with the appointments that are more likely to be missed. The length of the list is determined using cost-effectiveness criteria. The hospital management can then apply a reduced number of actions in order to prevent the no-show or mitigate its effect.
The use of a prototype in the hospital resulted in an average of 10.3 percentage points reduction in no-shows when measured in an 8-week experimental design.

Introduction

With a globally increasing population, efficient use of healthcare resources is a priority, especially in countries where those resources are scarce [21]. One avoidable source of inefficiency stems from patients missing their scheduled appointments, a phenomenon known as no-show [7], which produces noticeable wastes of human and material resources [17]. A systematic review of 105 studies found that Africa has the highest no-show (43%), followed by South America (28%), Asia (25%), North America (24%), Europe (19%), and Oceania (13%), with a global average of 23% [11]. In pediatric appointments, no-show rates range between 15% and 30% [11], and tend to increase with the patients’ age [33, 44].

To decrease the rate of avoidable no-shows, hospitals can focus their efforts in three main areas:

a) Identifying the causes. The most common one is forgetting the appointment, according to a survey in the United Kingdom [36]. Lacy et al. [26] identified three additional issues: emotional barriers (negative emotions about going to see the doctor were greater than the sensed benefit), perceived disrespect by the health care system, and lack of understanding of the scheduling system. In pediatric appointments, other reasons include caregiver’s issues, scheduling conflicts, forgetting, transportation, public health insurance, and financial constraints [11, 19, 23, 39, 44, 49].

b) Predicting patients’ behaviour. To this end, researchers have used diverse statistical methods, including logistic regression [5, 20, 22, 40], generalised additive models [43], multivariate [5], hybrid methods with Bayesian updating [1], Poisson regression [41], decision trees [12, 13], ensembles [14, 37], and stacking methods [46]. Their efficiency depends on the ability of predictors to compute the probability of no-show for a given patient and appointment. Among adults, the most likely to miss their appointments are younger patients, those with a history of no-show, and those from a lower socioeconomic background, but variables such as the time of the appointment are also relevant [11].

c) Improving non-attendance rates using preventive measures. A review of 26 articles from diverse backgrounds found that patients who received a text notification were 23% less likely to miss their appointment than those who did not [42]. Similar results were obtained for personal phone calls in adolescents [39]. Text messages have been observed to produce similar outcomes to telephone calls, at a lower cost, in both adults [10, 18] and pediatric patients [29].

In terms of implementing mitigation actions, overbooking can maintain an efficient use of resources, despite no-show [2, 25]. However, there is a trade-off between efficiency and service quality. For other strategies, see the work of Cameron et al. [6].

This work is concerned with prediction and prevention in a pediatric setting. This is particularly challenging as attendance involves patients and their caregivers, who can moreover change over time.

We use machine learning methods to estimate the probability of no-show in pediatric appointments, and identify which patients are likely to miss them. This prediction is meant to be used by the hospital to reduce no-show rates through personalised actions. Since public hospitals have scarce resources and a tight budget, we introduce new metrics to account for both the costs and the effectiveness of these actions, which marks a difference with the work presented by Srinivas and Salah [47], which considers standard machine learning metrics, and Berg et al. [2], which balances interventions and opportunity costs, among others.

The paper is organised as follows: Section 2 describes the data and our methodological approach. It contains data description, the machine learning methods, our cost-effectiveness metrics, and the deployment. Results are shown in Section 3, paying particular attention to the metrics we constructed to assess efficiency, and the impact of the use of this platform, measured in an experimental design. Section 4 contains our conclusions and gives directions for future research. Finally, some details concerning the threshold tuning, and the balance between type I and II errors are given in the Appendix.

Materials and methods

Data description

Dr. Luis Calvo Mackenna Hospital is a high-complexity pediatric hospital in Santiago. We analysed the schedule of medical appointments from 2015 to 2018, comprising 395,963 entries. It contains socioeconomic information about the patient (commune of residence, age, sex,1 health insurance), and the appointment (specialty, type of appointment, day of the week, month, hour of the day, reservation delay), as well as the status of the appointment (show/no-show).

Although the hospital receives patients from the whole country, 70.7% of the appointments correspond to patients from the Eastern communes of Santiago (see Fig. 1). Among these communes, the poorest, Peñalolén, exhibits the highest percentage of no-show. Table 1 shows the percentage of appointments, no-shows and poverty depending on the patients’ commune of residence. For measuring poverty, we used the Chilean national survey Casen, which uses the multidimensional poverty concept to account for the multiple deprivations faced by poor people at the same time in areas such as education, health, among others [34].

Fig. 1 — Map of communes that belong to the East Metropolitan Health Service

Table 1.

Location of the referred center, the proportion of patients from the total of appointments, no-show rate and proportion of the population in multidimensional poverty [34]

Referred from	Appts. %	No-show %	Poverty %
Peñalolén	31.1	23.8	26.3
Macul	12.4	23.5	13.5
Ñuñoa	8.9	21.9	5.8
Lo Barnechea	4.8	22.4	17.2
Las Condes	4.6	21.3	4.2
Providencia	4.1	20.2	3.4
La Reina	4.1	23.3	7.0
Vitacura	0.5	20.6	3.5
Easter Island	0.2	16.6	21.7
Other communes	11.1	16.7	−
Rest of the country	18.2	13.4	−

Life cycle grouping	Age Range	Percentage
Nursling	0-5 months	9.7%
First infancy	6 months-4 years	24.1%
Second infancy	5-11 years	39.2%
Teenagers	12-17 years	26.2%
Young adults	18-25 years	0.8%

Group	Description	Appointments %	No-Show %
Socioeconomic Status
A	Without income/migrants	44.1	22.5
B	Less than US$425.	22.1	18.9
C	Between US$425 and US$620	13.0	18.9
D	Greater than US$621	17.3	18.3
Other	Without health insurance	2.0	20.4
Private	With private insurance	1.5	20.4
Type of appointment
1st time appointment	First visit for a certain medical episode	23.1	24.1
Routine appointment	Medical controls that follow 1st appointments	63.7	18.6
1st time derived	Special slots derived from primary healthcare	8.7	26.8
Other	Mainly medical prescriptions	4.5	16.6

Medical specialties (no-show %)
Pulmonology (23.2)	Ophthalmology (30.3)
Cardiology (14.7)	Oncology (4.9)
General Surgery (16.9)	Otorhinolaryngology (22.7)
Plastic Surgery (14.2)	Psychiatry (24.0)
Dermatology (28.1)	Rheumatology (20.9)
Endocrinology (22.1)	Traumatology (19.9)
Gastroenterology (19.3)	Urology (19.3)
Gynecology (25.1)	Genetics (24.5)
Hematology (15.8)	Pediatrics (22.6)
Nephrology (18.4)	Infectology (23.7)
Neurology (28.3)	Parasitology (18.8)
Nutrition (27.6)
Dental specialties (no-show %)
Pediatric dentistry (24.9)	Orthodontics (18.4)

Feature name	Description	Type	Categories/range
Age	Age at the day of the appointment, as the position in the life cycle:	Categorical	Nursling (0-5 months), first infancy (6 months-4 years), second infancy (5-11 years), teenager (12-17 years), young adult (18-25 years)
Sex	Sex of the patient	Categorical	Male, female
Commune of residence	Location of residence of the patient at the commune level.	Categorical	Any of the 346 communes of Chile
Insurance	Insurance type	Categorical	Group A (person without housing or income, or migrant, Group B (monthly income < US $ 425), Group C (monthly income ∈ [US $ 425;US $621)), Group D (monthly income > US $ 621), Provisory Insurance (people without health insurance)
Day of the week	Day of the week of the appointment	Categorical	Monday - Friday
Month	Month of the appointment	Categorical	January - December
Hour of the day	Hour of the day of the appointment as a categorical feature	Categorical	8hrs - 17hrs (ranges of one hour)
Reservation delay	Time in weeks from the creation of the appointment generation and the appointment itself as a categorical feature.	Numerical	0,1,2,…
Historical no-show	Calculated as the no-show citations divided by total citations prior the current appointment.	Numerical	Number between 0 and 1
Historical no-show by specialty	Calculated as the no-show citations divided by total citations prior the current appointment, both with respect to the considered specialty.	Numerical	Number between 0 and 1
Type of appointment	Type of the appointment, regardless its medical specialty	Categorical	First-time appointment, routine appointment, and first-time appointment derived from primary healthcare (PHC)

imbalanced-learn
RUS Boost	Balanced Random Forest
Balanced Bagging	Easy Ensemble
scikit-learn
Logistic Regression	Random Forest
Ada Boost	Support Vector Machines

Model	Parameter	Values
AdaBoost	Decision tree max_depth	1, 2, 5, 8, 10, 15
	Decision tree min_samples_leaf	2, 3, 5, 10, 20, 40
	n_estimators	50, 100, 200, 300, 500, 750, 1000
	learning_rate	0.01, 0.05, 0.1, 0.2,None
Random Forest	bootstrap	True, False
Balanced Random	max_features	auto, sqrt
Forest (imblearn)	n_estimators	200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000
	max_depth	10, 20, 30, 40, 50, 60, 70, 80, 90, 100,None
	min_samples_split	2, 5, 10, 50
Support Vector Machine	Kernel	linear, rbf
	C	1,10,100,1000
	Gamma (rbf kernel only)	1,0.1,0.001,0.0001
Logistic Regression	penalty	L1, L2
	C	0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000
RUS Boost	n_estimators	50, 100, 400, 800, 1000, 1200, 1400, 1600, 1800, 2000
	replacement	True, False
Balanced Bagging	bootstrap	True, False
	bootstrap_features	True, False
	replacement	True, False
	n_estimators	10, 50, 100, 200, 500, 1000, 1200, 1400, 1600, 1800
EasyEnsemble	replacement	True, False
	n_estimators	10, 50, 100, 200, 500, 1000, 1200, 1400, 1600, 1800

Specialty	Algorithm	Threshold	P_C	NSP_i	NSP_f	P_R	m₁	m₂	AUC
Cardiology	RandomForestClassifier	0.55	0.10	0.16	0.13	0.18	1.76	0.16	0.63
Dermatology	RandomForestClassifier	0.56	0.13	0.26	0.21	0.22	1.61	0.19	0.65
Endocrinology	RandomForestClassifier	0.54	0.20	0.21	0.14	0.33	1.68	0.27	0.66
Gastroenterology	BalancedBaggingClassifier	0.68	0.11	0.19	0.15	0.21	1.90	0.19	0.65
General surgery	LogisticRegression	0.67	0.19	0.13	0.08	0.40	2.17	0.33	0.72
Genetics	BalancedRandomForestClassifier	0.57	0.18	0.23	0.18	0.24	1.32	0.20	0.57
Gynecology	BalancedBaggingClassifier	0.65	0.14	0.24	0.19	0.22	1.54	0.19	0.61
Hematology	RandomForestClassifier	0.54	0.16	0.16	0.10	0.38	2.31	0.32	0.73
Infectology	RandomForestClassifier	0.57	0.11	0.26	0.21	0.21	1.79	0.18	0.64
Nephrology	BalancedBaggingClassifier	0.73	0.11	0.15	0.12	0.23	2.17	0.21	0.69
Neurology	BalancedBaggingClassifier	0.64	0.12	0.26	0.20	0.23	1.91	0.20	0.68
Nutrition	LogisticRegression	0.65	0.10	0.32	0.27	0.16	1.53	0.14	0.60
Oncology	RandomForestClassifier	0.50	0.09	0.04	0.03	0.29	3.26	0.26	0.72
Ophtalmology	BalancedRandomForestClassifier	0.65	0.13	0.31	0.24	0.21	1.61	0.18	0.62
Orthodontics	BalancedBaggingClassifier	0.63	0.17	0.21	0.11	0.47	2.87	0.40	0.80
Otorhinolaryngology	BalancedBaggingClassifier	0.61	0.18	0.22	0.14	0.37	2.07	0.30	0.69
Parasitology	BalancedBaggingClassifier	0.72	0.12	0.17	0.12	0.26	2.20	0.23	0.65
Pediatric dentistry	BalancedBaggingClassifier	0.67	0.11	0.30	0.24	0.20	1.86	0.18	0.66
Pediatrics	BalancedRandomForestClassifier	0.63	0.13	0.25	0.19	0.23	1.75	0.20	0.64
Plastic surgery	BalancedRandomForestClassifier	0.67	0.21	0.10	0.05	0.47	2.22	0.37	0.76
Psychiatry	RandomForestClassifier	0.56	0.14	0.25	0.19	0.25	1.78	0.21	0.65
Pulmonology	BalancedRandomForestClassifier	0.61	0.27	0.17	0.09	0.49	1.85	0.36	0.74
Rheumatology	BalancedRandomForestClassifier	0.66	0.11	0.22	0.18	0.16	1.54	0.14	0.60
Traumatology	BalancedBaggingClassifier	0.65	0.13	0.18	0.14	0.22	1.71	0.20	0.63
Urology	BalancedRandomForestClassifier	0.61	0.13	0.19	0.15	0.23	1.73	0.20	0.63

Specialty	Precision_show	Precision_no−show	Recall_show	Recall_no−show	F₁ Score_show	F₁ Score_no−show
Cardiology	0.85	0.29	0.91	0.18	0.88	0.22
Dermatology	0.76	0.42	0.90	0.22	0.82	0.28
Endocrinology	0.83	0.35	0.84	0.33	0.83	0.34
Gastroenterology	0.83	0.36	0.91	0.21	0.87	0.27
General surgery	0.91	0.28	0.85	0.40	0.88	0.33
Genetics	0.78	0.31	0.84	0.24	0.81	0.27
Gynecology	0.78	0.37	0.88	0.22	0.83	0.28
Hematology	0.88	0.37	0.88	0.38	0.88	0.37
Infectology	0.76	0.47	0.92	0.21	0.83	0.29
Nephrology	0.87	0.33	0.92	0.23	0.89	0.27
Neurology	0.77	0.50	0.92	0.23	0.84	0.31
Nutrition	0.70	0.49	0.92	0.16	0.80	0.24
Oncology	0.97	0.12	0.92	0.29	0.94	0.17
Ophtalmology	0.72	0.49	0.91	0.21	0.80	0.29
Orthodontics	0.87	0.59	0.92	0.47	0.89	0.53
Otorhinolaryngology	0.83	0.45	0.88	0.37	0.85	0.40
Parasitology	0.86	0.37	0.91	0.26	0.89	0.30
Pediatric dentistry	0.73	0.55	0.93	0.20	0.82	0.30
Pediatrics	0.78	0.43	0.90	0.23	0.84	0.30
Plastic surgery	0.93	0.22	0.81	0.47	0.87	0.30
Psychiatry	0.78	0.45	0.90	0.25	0.84	0.32
Pulmonology	0.88	0.32	0.78	0.49	0.83	0.39
Rheumatology	0.80	0.33	0.91	0.16	0.85	0.22
Traumatology	0.84	0.31	0.89	0.22	0.86	0.26
Urology	0.83	0.33	0.89	0.23	0.86	0.27

Feature	Correlation
Historical no-show	0.16
Reservation delay = 0 weeks	− 0.15
Historical no-show by specialty	0.15
Appointment type = routine appointment	− 0.07
Commune of residence = outside Santiago	− 0.07
Hour = 8	0.06
Commune of residence = Peñalolén	0.05
Appointment type = 1st appointment	0.05
Appointment type = 1st appointment PHC	0.05
Insurance = A Group	0.04
Reservation delay = 5 weeks	0.03
Commune of residence = Macul	0.03
Day of the week = Monday	0.03
Reservation delay = 6 weeks	0.03
Commune of residence = others in Santiago	− 0.03
Reservation delay = 3 weeks	0.03
Insurance = D Group	− 0.03
Day of the week = Wednesday	− 0.03
Hour of the day = 11	− 0.02

Feature	Correlation
Reservation delay = 0 weeks	− 0.20
Historical no-show	0.15
Appointment type = 1st appointment	0.08
Appointment type = 1st appointment PHC	0.08
Reservation delay = 30-50 weeks	0.08
Age = first infancy	0.07
Hour = 15	0.05
Insurance = A Group	0.05
Commune of residence = Peñalolén	0.05
Age = second infancy	− 0.05
Month = May	− 0.04
Hour = 12	− 0.04
Month = December	0.03
Day of the week = Monday	0.01

Specialty	No-show rate		Reduction in
	Control group	Intervention group	percentage points
Ophthalmology	29.6%	12.1%	17.5
Neurology	17.6%	5.0%	12.6
Traumatology	19.0%	10.3%	8.7
Dermatology	24.0%	21.1%	2.9
Total	21.0%	10.7%	10.3

Metrics	Bias B_μ(λ_PP,λ_NN,δ)
GM	0
MCC	$\frac{λ_{P P} + λ_{N N} - 1}{2 \sqrt{[λ_{P P} + (1 - λ_{N N}) \frac{1 + δ}{1 - δ}] [λ_{N N} + (1 - λ_{P P}) \frac{1 + δ}{1 - δ}]}} -$
	$\frac{λ_{P P} + λ_{N N} - 1}{2 \sqrt{[λ_{P P} + (1 - λ_{N N})] [λ_{N N} + (1 - λ_{P P})]}}$
m₁	$\frac{2 λ_{P P}}{λ_{P P} (1 + δ) + (1 - λ_{N N}) (1 - δ)} - \frac{2 λ_{P P}}{λ_{P P} + (1 - λ_{N N})}$
m₂	$\frac{λ_{P P}}{2} (λ_{N N} (1 - δ) - λ_{P P} (1 + δ)) - \frac{λ_{P P}}{2} (λ_{N N} - λ_{P P})$

Feature	Count
Historical no-show	19
Insurance = A Group	16
Commune of residence = Peñalolen	16
Reservation delay = 0 weeks	15
Age = second infancy	9
Appointment type = routine appointment	6
Reservation delay = 1 weeks	6
Commune of residence = Macul	5
Hour = 10	5
Day of the week = Thursday	3
Hour = 11	3
Insurance = B Group	3
Day of the week = Tuesday	3
Sex = male	3
Day of the week = Monday	3
Age = first infancy	3
Sex = female	3
Day of the week = Wednesday	3
Appointment type = first appointment	2
Hour = 9	2

PERMALINK

Predicting no-show appointments in a pediatric hospital in Chile using machine learning

J Dunstan

F Villena

JP Hoyos

V Riquelme

M Royer

H Ramírez

J Peypouquet

Abstract

Supplementary Information

Highlights

Introduction

Materials and methods

Data description

Fig. 1.

Table 1.

Table 2.

Table 3.

Table 4.

Table 5.

Machine learning methods

Table 6.

Table 7.

Cost-effectiveness metrics

Deployment

Results

Table 8.

Table 15.

Fig. 2.

Table 9.

Fig. 3.

Table 10.

Table 11.

Fig. 4.

Table 12.

Experimental design

Table 13.

Conclusions, perspectives

Supplementary Information

Acknowledgements

Appendix A: Threshold tuning

Fig. 5.

A.1 Ratio between type I and II errors

Fig. 6.

A.2 Metric bias

Table 14.

Fig. 7.

A.3 ML metrics the best models for each specialty

Declarations

Ethics approval

Footnotes

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases