BMJ Simulation & Technology Enhanced Learning. 2018 Jun 22;5(2):85–90. doi: 10.1136/bmjstel-2017-000289

Effective resource management using machine learning in medicine: an applied example

Alan Williams 1, Ann-Marie Mekhail 2,3, James Williams 4, Johanna McCord 5,#, Vanessa Buchan 6,#
PMCID: PMC8936600  PMID: 35519832

Abstract

Background

The field of medicine is rapidly becoming digitised, and in the process passively amassing large volumes of healthcare data. Machine learning and data analytics are advancing rapidly, but these have been slow to be taken up in the day-to-day delivery of healthcare. We present an application of machine learning to optimise a laboratory testing programme as an example of benefiting from these tools.

Methods

Canterbury District Health Board has recently implemented a system for urgent lab sample processing in the community, reducing unnecessary emergency presentations to hospital. Samples are transported from primary care facilities to a central laboratory. To improve the efficiency of this service, our team built a prototype transport scheduling platform using machine learning techniques and simulated the efficiency and cost impact of the platform using historical data.

Results

Our simulation demonstrated procedural efficiency and potential for annual savings between 5% and 14% from implementing a real-time lab sample transport scheduling platform. Advantages included providing a forward job list to the laboratory, an expected time to result and a streamlined transport request process.

Conclusion

There are a range of opportunities in healthcare to use large datasets for improved delivery of care. We have described an applied example of using machine learning techniques to improve the efficiency of community patient lab sample processing at scale. This is with a view to demonstrating practical avenues for collaboration between clinicians and machine learning engineers.

Keywords: inefficiency in health, resource management, big data, primary care, healthcare resource utilization, clinical informatics

Introduction

As medicine expands in scope and population served, the traditional model becomes unsustainable as a method of providing safe and high-quality care within practical constraints.1 Medicine cannot afford to continue with patient–doctor interactions that are administratively labour intensive.1 Healthcare needs to move towards a scalable and efficient model.2 There is a consensus that opportunities to use data engineering in health are plentiful,2–4 but there is little in medical journals describing useful examples of this application. We describe an accessible and practical example of applying machine learning in healthcare.

Canterbury District Health Board developed Acute Demand Management Services to ease the pressure on emergency and acute inpatient providers by facilitating the provision of urgent care in the community. As part of this, Canterbury Health Laboratories (CHL) offered an Acute Demand Rapid Transport (ADRT) service. This service processes urgent laboratory investigations requested from primary care providers.

A private taxi provider is contracted to transport individual samples from primary care clinics to CHL. Journeys are requested for single samples by clinics interacting directly with the taxi provider. The annual cost of transporting lab samples exceeds NZ$300 000 and is increasing year on year. Reviews of taxi invoices indicate that journeys are often requested from nearby locations at similar times. This suggests that combining journeys would be more cost-effective, with potentially minimal impact on clinical care.

However, manually monitoring journey requests and assigning multiple samples to journeys is not an efficient way of doing this. We explore the extent to which a trip scheduling system can generate combined journeys that meet time-to-result criteria with lower operational costs. This system uses a model for predicting the time and cost of a taxi journey given two location points in the city and a model for predicting the future occurrence of sample requests.

A machine learning system is a program for which the output depends on previously observed information. Machine learning is consistently defined as creating a functional mapping between input information and an output value or reward, both given in a training dataset. This function is then used to predict output values or generate actions for new and unseen data.5 New data inputs into this model are used to update the model predictions as the machine ‘learns’.

A centralised trip scheduling system receives sample transport requests from originating clinics at discrete intervals in time. The simplest mode of operation is to output a journey for each sample at the time of receipt; this is the existing situation. To combine journeys, a list of outstanding transport requests must be maintained, rather than scheduling a journey immediately. This list is referred to as the trip buffer. Multiple transport requests are combined by assigning a journey to a subset of the outstanding transport requests at some later time. A consequence of this is that samples will arrive at the laboratory later than in the existing situation. This delay is subject to strict constraints to maintain quality of care. To understand whether the combined journey to be assigned meets the constraints on sample delivery time and is of lower cost than transporting all samples individually, the scheduling platform requires a method of estimating journey times, arrival times and costs with adequate certainty. We construct methods for estimating these values by using a machine learning system trained on historical data.

This project aims to provide three things:

  1. A machine learning system for predicting time and cost for taxi journeys transporting lab samples given historical data. The code for this is provided on GitHub (link below) and can be used elsewhere—provided training datasets are in similar format and the current lab transport operational model is similar—free of charge.

  2. Socialisation of the day-to-day opportunities in medicine, both clinical and operational, for collaboration with the disciplines of mathematical modelling and machine learning.

  3. A functional server-side scheduling system, intended to be integrated into a web interface for scheduling transport for lab samples, using the recommended outputs from the above modelling process. The code for this is also provided on GitHub—free of charge at https://github.com/mekan841/urgent-pathology-routing.

Methods

Historical data collection

Two exports from an internal database were obtained from the private taxi contractor containing records of taxi journeys charged to the ADRT programme over consecutive periods from November 2016–February 2017 and February 2017–June 2017. The first dataset contained 3510 records of journeys, including the following fields: date, time, origin, destination, fare, driving time, waiting time and distance driven. The second dataset contained 7903 records, including the following fields: date, time, origin, destination and fare. No reason for the difference in format was available from the taxi company. After removing records not pertaining to laboratory sample transport, the dataset consisted of 7988 taxi journeys from 145 origin locations around the city between February and June 2017. Tables 1 and 2 provide descriptive statistics on the datasets.

Table 1. Dataset 1 descriptive statistics

Field              | Minimum                | Median | Maximum                | Unique values
Date and time      | 21 November 2016 09:59 |        | 19 February 2017 23:36 |
Origin             |                        |        |                        | 131
Destination        |                        |        |                        | 1 (laboratory)
Driving time (min) | 1                      | 12     | 54                     |
Waiting time (min) | <1                     | 6      | 46                     |
Distance (km)      | <1                     | 3.95   | 30.55                  |

Total journeys: 3510

Table 2. Dataset 2 descriptive statistics

Field         | Minimum                | Median | Maximum            | Unique values
Date and time | 21 February 2017 00:19 |        | 25 June 2017 23:55 |
Origin        |                        |        |                    | 144
Destination   |                        |        |                    | 1 (laboratory)

Total journeys: 7988

The origin and destination were recorded as fragments of text. Full addresses and geographic coordinates were obtained using a commercial API service provided by Google.6 On querying, this service returns a range of structured information, including latitude and longitude coordinates. Figure 1 shows a map of origin and destination locations of trips in the city.

Figure 1. Map of Christchurch showing origin and destination locations.

Journey distance and time model

The distance and time required for a journey between clinic and laboratory locations was modelled by fitting a regression model to the data from dataset 1. The distance and time were modelled as a function of the discrete locations and the time of day, all modelled as categorical variables. Where no data were available, estimated times and distances were obtained using a commercial API service provided by Google,6 and these values were used in the same way as the historical data. These estimated values would be continuously updated with observed data during platform operation.
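The regression described above can be illustrated with a small sketch. It assumes an ordinary least-squares fit with one-hot encoded clinic and time-of-day categories; the function names, the two time-of-day bands and the toy records are hypothetical and are not taken from the authors' repository. A clinic with no historical records would need the externally estimated values mentioned above before it could be predicted.

```python
import numpy as np

def one_hot(labels, categories):
    """Return a 0/1 matrix with one column per category."""
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(labels), len(categories)))
    for row, label in enumerate(labels):
        out[row, index[label]] = 1.0
    return out

def fit_time_model(clinics, periods, minutes):
    """Least-squares fit of driving time on clinic and time-of-day categories."""
    clinic_cats = sorted(set(clinics))
    period_cats = sorted(set(periods))
    # Clinic columns act as per-clinic intercepts; the first period column is
    # dropped so that the design matrix is not rank deficient.
    X = np.hstack([one_hot(clinics, clinic_cats),
                   one_hot(periods, period_cats)[:, 1:]])
    coef, *_ = np.linalg.lstsq(X, np.asarray(minutes, dtype=float), rcond=None)
    return clinic_cats, period_cats, coef

def predict_time(model, clinic, period):
    """Predicted driving time (min) for a known clinic and time-of-day band."""
    clinic_cats, period_cats, coef = model
    x = np.concatenate([one_hot([clinic], clinic_cats)[0],
                        one_hot([period], period_cats)[0][1:]])
    return float(x @ coef)

# Hypothetical toy records: clinic, time-of-day band, driving minutes.
clinics = ["A", "A", "B", "B"]
periods = ["offpeak", "peak", "offpeak", "peak"]
minutes = [10.0, 14.0, 20.0, 24.0]
model = fit_time_model(clinics, periods, minutes)
print(predict_time(model, "B", "peak"))
```

Refitting the model whenever new journeys are recorded is one simple way to obtain the continuous updating described in the text.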

Journey cost model

The cost for a journey was estimated by fitting a regression model to data from datasets 1 and 2, modelling the cost as a function of the distance and driving time. Partial plots7 describing the relationship between the cost and independent variables of distance and time are shown in figure 2.

Figure 2. Multiple regression model expressing fare as a function of journey time and distance. (A) Partial plot of the relationship between the journey time and fare given other factors. (B) Partial plot of the relationship between the journey distance and fare given other factors.
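As an illustration of a fare model of this kind, the sketch below fits fare as a fixed fee plus terms proportional to distance and driving time. It assumes a plain linear least-squares fit; the function names and the synthetic fares (generated from a made-up tariff) are hypothetical and do not reflect the contractor's actual pricing.

```python
import numpy as np

def fit_fare_model(distance_km, time_min, fare):
    """Fit fare = fixed_fee + rate_km * distance + rate_min * time by least squares."""
    d = np.asarray(distance_km, dtype=float)
    t = np.asarray(time_min, dtype=float)
    X = np.column_stack([np.ones_like(d), d, t])
    coef, *_ = np.linalg.lstsq(X, np.asarray(fare, dtype=float), rcond=None)
    return coef  # [fixed_fee, rate_km, rate_min]

def predict_fare(coef, distance_km, time_min):
    """Predicted fare for a journey of the given distance and driving time."""
    return float(coef[0] + coef[1] * distance_km + coef[2] * time_min)

# Hypothetical journeys priced with a made-up tariff:
# fixed fee 3.0, 2.0 per km, 0.5 per minute.
distance = [1.0, 4.0, 10.0, 6.0]
time = [5.0, 12.0, 20.0, 30.0]
fare = [3.0 + 2.0 * d + 0.5 * t for d, t in zip(distance, time)]

coef = fit_fare_model(distance, time, fare)
print(coef)
print(predict_fare(coef, 5.0, 10.0))
```

With noise-free synthetic data the fit recovers the made-up tariff; with real invoices the residuals would indicate how much of the fare is explained by distance and time alone.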

These models are not static—in an operational system, they would continually be recalculated using machine learning to account for variation in road network conditions. For instance, if an arterial route closed for maintenance this would result in a sudden change in the time required to travel between two locations; however, the model would incorporate the new data and recalculate the predicted journey times and costs in the scheduling platform.

Scheduling platform

A method for assigning sample transport requests in the trip buffer to combined journeys was then developed. The following assumptions were made for the purposes of this work:

  • A delay in time to result for clinics of 20–60 min would not significantly impact patient care.

    This was derived from qualitative analysis of interviews with primary care providers, who were asked to describe what would happen in the event of particular result delays and what delays would prompt them to send a patient to the emergency department rather than wait for a result in the community (questions available on request).

  • Taxis can be assigned a sample route with multiple destinations, but these routes cannot be updated after the initial request.

  • The cost of a taxi journey is the sum of a fixed fee and a value proportional to journey time and distance, determined using the regression model developed above.

The problem of optimally assigning combined journeys to transport requests given time constraints can be formulated as a discrete-time, finite-horizon Markov decision process (MDP).8 9 Solving a planning problem modelled as an MDP is equivalent to looking for a policy (algorithm) that achieves minimum long-term expected cost.8 10 11 Because of the complexity of the problem, which involves portions similar to the NP-hard travelling salesman problem,12 we developed two sets of heuristics for obtaining a solution, described in box 1. The choice of method used here depends on the desired trade-off between complexity and marginal time savings.

Box 1. Algorithm determining the optimal set of taxi journeys to minimise cost in real time.

  1. At an arbitrary start time, initialise an empty buffer for transport requests.

  2. Observe whether any transport requests have arrived in the previous time step. Add these transport requests to the buffer.

  3. Determine whether to schedule a journey for a subset of locations in the buffer.

    Batching option:

    1. Calculate the latest time (the ‘must-send’ time) at which a direct clinic to laboratory journey must be scheduled for each request in the buffer (to meet the time-to-result constraints).

    2. Group the transport requests in the buffer by originating clinic. For each clinic group:

      1. Determine if the ‘must-send’ time for the oldest transport request in the buffer falls within the current time step. If so, initiate a journey transporting all samples in the buffer for this location to the laboratory.

      2. Remove these transport requests from the buffer.

    Interclinic routing option:

    1. Calculate the latest time (the ‘must-send’ time) at which a direct clinic to laboratory journey must be scheduled for each request in the buffer (to meet the time-to-result constraints).

    2. Group the transport requests in the buffer by originating clinic.

    3. Enumerate all possible combinations of single and multidestination routes between clinic locations and the laboratory that transports all samples to the laboratory.

    4. Determine the arrival times and costs for each combination of routes.

    5. Remove combinations of journeys that do not satisfy time-to-result constraints.

    6. Select the minimum-cost combination of journeys.

    7. Determine whether any journeys from the minimum-cost combination of journeys require initiation in this time step to meet time constraints.

      1. If so, initiate the subset of combinations of journeys.

      2. Remove transport requests from the buffer.

  4. Go to 2.
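The batching option in box 1 can be sketched as a short replay loop. This is a simplified illustration, not the authors' implementation: it assumes integer times in minutes, and it takes the ‘must-send’ time of a request to be its arrival time plus the permitted delay (`max_delay`), which ignores any dependence on the predicted journey time. The function and variable names are hypothetical.

```python
from collections import defaultdict

def batch_schedule(requests, max_delay, step=1):
    """Replay (time, clinic) requests; return (dispatch_time, clinic, n_samples) journeys."""
    pending = sorted(requests)
    buffer = defaultdict(list)  # clinic -> arrival times of requests awaiting transport
    journeys = []
    if not pending:
        return journeys
    now = pending[0][0]
    i = 0
    while i < len(pending) or buffer:
        # Step 2: add requests that have arrived up to the current time step.
        while i < len(pending) and pending[i][0] <= now:
            t, clinic = pending[i]
            buffer[clinic].append(t)
            i += 1
        # Step 3 (batching option): when the oldest request at a clinic reaches
        # its must-send time within this step, send that clinic's whole queue.
        for clinic in list(buffer):
            queue = buffer[clinic]
            if min(queue) + max_delay < now + step:
                journeys.append((now, clinic, len(queue)))
                del buffer[clinic]
        # Step 4: advance to the next time step.
        now += step
    return journeys

# Hypothetical requests: (arrival minute, clinic).
requests = [(0, "A"), (5, "A"), (10, "B"), (25, "A")]
journeys = batch_schedule(requests, max_delay=20)
print(journeys)
```

In this toy replay the two early requests from clinic A share one journey, so four requests are served by three journeys.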

The first set of heuristics involves ‘batching’ transport requests from individual clinics. Transport requests are queued up at each clinic until the oldest sample requires immediate scheduling to meet the time-to-result constraints, at which point a journey is scheduled to transport all samples in the queue. For high-volume clinics, this method results in significant savings; however, requests from low-volume clinics are rarely effectively combined with other journeys.

The second set of heuristics follows the same approach as the first but also combines journeys between clinics where this results in additional cost savings (interclinic routing). For instance, a journey from a low-volume clinic near a high-volume clinic can often be combined with a journey transporting multiple samples from the high-volume clinic, saving an individual journey.
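The enumeration at the heart of the interclinic routing option can be sketched by brute force. The code below is an illustrative assumption rather than the authors' algorithm: it enumerates every partition of the waiting clinics into routes and every visiting order within a route, discards routes that miss a deadline, and keeps the cheapest plan. The travel times, deadlines, tariff and the simplification that each taxi starts at its first clinic at time `now` are all hypothetical. The number of combinations grows very quickly, so this is only practical for a small buffer.

```python
from itertools import permutations

def partitions(items):
    """Yield every way of splitting items into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [[first]] + part
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

def best_route(group, travel, deadline, now, fixed_fee, rate):
    """Cheapest visiting order for one taxi that meets every deadline, or None."""
    best = None
    for order in permutations(group):
        stops = list(order) + ["LAB"]  # every route ends at the laboratory
        minutes = sum(travel[a][b] for a, b in zip(stops, stops[1:]))
        if any(now + minutes > deadline[c] for c in order):
            continue
        cost = fixed_fee + rate * minutes
        if best is None or cost < best[0]:
            best = (cost, order)
    return best

def cheapest_plan(clinics, travel, deadline, now, fixed_fee=3.0, rate=1.0):
    """Return (total_cost, routes) for the cheapest feasible set of routes, or None."""
    best = None
    for part in partitions(list(clinics)):
        routes = [best_route(g, travel, deadline, now, fixed_fee, rate) for g in part]
        if any(r is None for r in routes):
            continue
        total = sum(cost for cost, _ in routes)
        if best is None or total < best[0]:
            best = (total, [order for _, order in routes])
    return best

# Hypothetical travel times in minutes between clinics A, B, C and the laboratory.
travel = {
    "A": {"B": 3, "C": 20, "LAB": 10},
    "B": {"A": 3, "C": 20, "LAB": 12},
    "C": {"A": 20, "B": 20, "LAB": 15},
}
deadline = {"A": 30, "B": 30, "C": 30}
plan = cheapest_plan(["A", "B", "C"], travel, deadline, now=0)
print(plan)
```

In this toy case the nearby clinics A and B are combined into one route while the distant clinic C is served separately, which mirrors the low-volume clinic near a high-volume clinic example in the text.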

Cost impact simulation

Sample transport requests from the second dataset were replayed chronologically and sent to the scheduling platform, which calculated the combined journeys that would have been made had the platform been in operation over this period. As the true data for cost of combined journeys are not available, the estimated cost for the combined journeys was computed and compared with the actual cost of non-combined journeys.

Generating simulated transport requests

As the cost-saving estimate above is for the one known 3-month period, the variation in percentage cost reduction for similar 3-month periods is unknown. To estimate this variability, a generative model for sample transport requests was developed, intended to approximate the true distribution as closely as possible. Generating sample data enables multiple simulations to be run and confidence intervals (CIs) for the cost reduction to be generated.

Sample transport requests from each clinic in the dataset were modelled as inhomogeneous Poisson point processes. As the amount of data available for most clinics was too low to accurately estimate the daily variation in rate, this was assumed to be constant for day-only and day/night clinics and was obtained by combining the requests from each group of locations, respectively. This daily variation in rate was then scaled by the observed long-term average rate for each individual clinic to generate an occurrence rate as a function of time for each clinic. These inhomogeneous Poisson point processes were then sampled over the same period as the second dataset to generate a synthetic dataset.
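One standard way to sample an inhomogeneous Poisson point process is thinning: candidate events are drawn from a homogeneous process at a bounding rate and each is kept with probability equal to the ratio of the true rate to the bound. The sketch below assumes this technique; the text does not state which sampling method the authors used. The daily profile `daily_shape`, the mean rate and the 30-day horizon are hypothetical.

```python
import math
import random

def sample_requests(rate_fn, rate_max, t_end, rng):
    """Sample event times on [0, t_end) by thinning; requires rate_fn(t) <= rate_max."""
    times = []
    t = 0.0
    while True:
        # Candidate events come from a homogeneous process at the bounding rate.
        t += rng.expovariate(rate_max)
        if t >= t_end:
            return times
        # Keep each candidate with probability rate_fn(t) / rate_max.
        if rng.random() < rate_fn(t) / rate_max:
            times.append(t)

def daily_shape(t_hours):
    """Hypothetical daily profile in [0, 1], lowest at 03:00 and highest at 15:00."""
    hour = t_hours % 24.0
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * (hour - 3.0) / 24.0))

# The profile averages 0.5 over a day, so scaling by 2 * mean_rate gives a
# long-term average of mean_rate requests per hour for this clinic.
mean_rate = 0.5
rng = random.Random(0)
times = sample_requests(lambda t: 2.0 * mean_rate * daily_shape(t),
                        2.0 * mean_rate, 24.0 * 30, rng)
print(len(times))
```

Running the sampler once per clinic, each with its own long-term average rate, would give one synthetic dataset of the kind described above.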

Multiple cost impact simulations were then run using synthetic datasets and used to obtain an estimate of the 95% CI for cost savings achieved. This approach also allows the long-term average rate for each clinic to be varied, to understand the effect of differing geospatial population distributions on the observed savings.
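The text does not state how the interval was computed from the simulation runs. One simple possibility, shown here purely as an assumption, is an empirical percentile interval over the percentage cost reductions from each run. The function names and the placeholder values are hypothetical.

```python
def percent_reduction(baseline_cost, combined_cost):
    """Percentage saving of combined journeys relative to individual journeys."""
    return 100.0 * (baseline_cost - combined_cost) / baseline_cost

def percentile_interval(values, level=0.95):
    """Empirical central interval, interpolating linearly between order statistics."""
    ordered = sorted(values)

    def quantile(q):
        pos = q * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    tail = (1.0 - level) / 2.0
    return quantile(tail), quantile(1.0 - tail)

# Placeholder values standing in for the per-run percentage reductions.
reductions = list(range(101))
lower, upper = percentile_interval(reductions)
print(percent_reduction(200.0, 170.0), lower, upper)
```

Each simulated dataset would contribute one value of `percent_reduction`, and the interval summarises the spread across runs.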

Results

Accuracy of the journey cost model

The accuracy of the estimated times and distances was evaluated using cross-validation and a held-out dataset, by fitting linear models relating estimated times and distances to known values. The relationship between the estimated and the known values of time and distance for the journeys in the first dataset7 is shown in figure 3. For a perfect model, these plots would show a perfectly linear relationship between the estimated and known values. The estimated distance is much more accurate than the estimated time due to the greater variability in traffic conditions than route distance.

Figure 3. Comparison of the time and distance estimates with known values from first dataset to assess the quality of the model. (A) Linear model fit between the estimated times and known times. (B) Linear model fit between the estimated distances and the known distances.

Cost reduction achievable in Christchurch

Table 3 shows the outcome of the simulation, demonstrating that using data engineering to inform transport bookings for lab samples could produce savings of between 7% and 18%, depending on the time-to-result delay constraint selected from the acceptable range.

Table 3. Scheduling method results: combining journeys from the top three clinics

Scheduling method   | Time-to-result delay constraint (min) | % Cost reduction (95% CI)
Interclinic routing | 20                                    | 7.21 to 8.45
Per-clinic batching | 20                                    | 6.12 to 7.31
Interclinic routing | 40                                    | 15.11 to 16.50
Per-clinic batching | 40                                    | 12.65 to 14.10
Interclinic routing | 60                                    | 20.44 to 21.96
Per-clinic batching | 60                                    | 17.59 to 19.08

The estimated CIs shown were generated from the sensitivity study described.

Discussion

Effect of the method for combining trips in the buffer

Our simulation indicated that there would be a statistically significant saving to clinics by adopting a lab transport scheduling platform that used our interclinic routing algorithm.

It is of note that the top three addresses (out of 145) disproportionately accounted for nearly two-thirds of all journeys generated. This transport request distribution accounts for the simpler batching heuristic achieving a cost reduction close to that of the more complex interclinic routing heuristic—most of the journey combination options arise from two clinics alone, in different locations, shown in figure 4.

Figure 4. Transport requests from the top 10 primary care facilities by request count, February to June 2017.

The specific set of heuristics used to combine journeys in the trip buffer affects the degree of savings achieved. The method of batching trips in the buffer into originating location, and then combining trips from each location individually when the oldest trip in the buffer requires transport to meet constraints, achieves much of the cost reduction. Adding the option for interclinic routing to the heuristic generates an additional cost reduction of at least 1.1% to 2.8% at the 95% confidence level in absolute terms, depending on the delay constraint, for minimal additional implementation cost.

We expect that implementation of the scheduling platform would bring other benefits such as providing the laboratory with a list of samples in transit, visibility for the clinic on expected result time and reduced workload in requesting taxis. These side benefits may be as significant as the direct cost saving achieved. We also expect that the marginal cost reduction due to interclinic routing would be greater in a larger city with linear journeys passing by more than one high-volume clinic.

Limitations

This work describes the outcomes from the model during the period under study and may not generalise to other periods in time because traffic conditions are an inherently variable system. However, in order to determine that the model was robust under different conditions, we developed a generative model for traffic given the observed data, simulated many datasets for an identical period of time, and obtained results from these simulated datasets which produced the range of cost savings described above. The generative model itself may not perfectly approximate the true generative process, but we note that the likelihood of expecting the observed data is high, given the generative model parameters.

Another limitation is that our piece of work does not validate the assumption that the operational shift to an online platform for booking will be acceptable to clinical users. Implementation of enterprise software is disruptive, and further work is required to assess the impact of this disruption.

Finally, all data-driven algorithms are dependent on the fidelity of data inputted. The data available were from a shorter time snapshot than we would have preferred, but it is expected that learning would allow for continual updates of the travel time predictions as more data are accumulated.

Alternative approaches

Large logistics companies frequently run optimisation software to address issues similar to the one described in this paper. However, this software is generally bespoke and no existing, off-the-shelf, cheap options could be found for the public healthcare system. Relatively unique characteristics of this problem are the very small permissible delays and the often 1–1 mapping between a transport asset (taxi) and a package. As part of our work we reached out to several mathematical software companies to enquire about available solutions; however, none could offer us a turnkey product without quoting high consultancy fees for customisation. On-the-ground collaborations such as this are by far a more cost-effective way for the healthcare system to have access to these services.

Multidisciplinary working

As part of designing the simulation it was necessary to have a full understanding of health service delivery and the constraints. This is where collaboration with medical staff was vital to correctly outline the problem and ensure that the solution did not hinder patient care.

Our initial user interviews (questions available in online supplementary document or on request) with clinicians did not yield useful information about time constraints. Questions were simply phrased: ‘How much longer could you wait for a lab result?’ We were told: ‘No time at all.’ User interview outputs became more useful only once we rephrased the queries to ask specifically what types of lab tests were requested and what the situation would look like if there were additional waits of 20/60/120 min on that result being returned.

Supplementary data

bmjstel-2017-000289supp001.pdf (4KB, pdf)

We found the small multidisciplinary setting to be an ideal way of working; it forced the team to justify dogmatically accepted wisdom in their own fields, such as the definition of ‘clinical safety’ or the implications of a ‘stochastic model’. While a service must be clinically designed, the best service design does not come from a team of clinicians alone. Academic disciplines working in silos make significant advances in their own fields without being aware of the tools available for leverage from the others.13

Potential for machine learning in medicine

There are growing concerns that medical school is not equipping future doctors with the skill-set to apply these methods to their design of health services.2 We have shown one example of what collaboration between two fields of study could look like13; our goal is that this becomes a more common way of working in medicine. This is not the sole, nor the most significant application of data engineering to healthcare delivery.14 The applicability of machine learning to medicine and large-scale healthcare design is immense,4 14 particularly as the datasets collected by the increasingly comprehensive electronic patient records grow,15 16 dataset interoperability becomes a reality with the new Fast Healthcare Interoperability Resources (FHIR) API,17 18 and modelling and deep learning techniques become more advanced.

Conclusions

Collaboration between medical and data engineering staff led to development of a model for more efficient urgent lab sample transport. The use of similar machine intelligence techniques can be extrapolated to several areas in medicine.

Acknowledgments

The authors thank Canterbury District Health Board and Via Innovations for conceptualising and initiating this project and engaging us to collaborate.

Footnotes

JMC and VB contributed equally.

Contributors: AW: Machine learning model design and build, data analysis and manuscript write up. A-MM: Manuscript write up and contextualisation. JM: Project initiation, advice and review. JW: Model validation and manuscript review. VB: Project initiation, advice and review.

Funding: This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests: AW and JW are co-founders of the software company Isogonal.

Provenance and peer review: Not commissioned; externally peer reviewed.

References

  • 1. Bekemeier B, Chen AL-T, Kawakyu N, et al. Local public health resource allocation. Am J Prev Med 2013;45:769–75. doi: 10.1016/j.amepre.2013.08.009
  • 2. Obermeyer Z, Lee TH. Lost in thought - the limits of the human mind and the future of medicine. N Engl J Med 2017;377:1209–11. doi: 10.1056/NEJMp1705348
  • 3. Lehtonen H, Lukkarinen T, Kämäräinen V, et al. Improving emergency department capacity efficiency. Signa Vitae - A Journal In Intensive Care And Emergency Medicine 2016;12:52–7. doi: 10.22514/SV121.102016.9
  • 4. Reynolds CJ. Better value digital health: the medium, the market and the role of openness. Clin Med 2013;13:336–9. doi: 10.7861/clinmedicine.13-4-336
  • 5. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. Second Edn. Springer: Springer Series in Statistics, 2017.
  • 6. Google. ‘Google Geocoding API’. 2017. https://maps.googleapis.com/maps/api/geocode/ (accessed Aug 2017).
  • 7. Velleman P, Welsh R. Efficient computing of regression diagnostics. The American Statistician 1981;4:234–42.
  • 8. Bellman R. A markovian decision process. Indiana University Mathematics Journal 1957;6:679–84. doi: 10.1512/iumj.1957.6.56038
  • 9. Bertsekas D. Dynamic programming: deterministic and stochastic models. New Jersey: Prentice-Hall Inc, 1987.
  • 10. Howard R. Dynamic programming and markov processes. USA: The MIT Press, 1960.
  • 11. Puterman M. Markov decision processes: discrete stochastic dynamic programming. New York City: John Wiley & Sons, 1994.
  • 12. Lawler E, Lenstra J, Rinnooy Kan A, et al. The traveling salesman problem: a guided tour of combinatorial optimization. New York: Wiley, 1985.
  • 13. Thorwarth M, Arisha A. Application of discrete-event simulation in health care: a review. Dublin: Dublin Institute of Technology, 2009.
  • 14. Xu H, Wu W, Nemati S, et al. Patient flow prediction via discriminative learning of mutually-correcting processes. IEEE Transactions on Knowledge and Data Engineering 2016.
  • 15. Perlman SE, McVeigh KH, Thorpe LE, et al. Innovations in population health surveillance: using electronic health records for chronic disease surveillance. Am J Public Health 2017;107:853–7. doi: 10.2105/AJPH.2017.303813
  • 16. Kharrazi H, Chi W, Chang HY, et al. Comparing population-based risk-stratification model performance using demographic, diagnosis and medication data extracted from outpatient electronic health records versus administrative claims. Med Care 2017;55:789–96. doi: 10.1097/MLR.0000000000000754
  • 17. Health Level 7 (HL7). FHIR Overview. 2017. https://www.hl7.org/fhir/overview.html (accessed 24 Oct 2017).
  • 18. Kasthurirathne SN, Mamlin B, Kumara H, et al. Enabling better interoperability for healthcare: lessons in developing a standards based application programing interface for electronic medical record systems. J Med Syst 2015;39. doi: 10.1007/s10916-015-0356-6

Articles from BMJ Simulation & Technology Enhanced Learning are provided here courtesy of BMJ Publishing Group
