Author manuscript; available in PMC: 2022 Jul 1.
Published in final edited form as: Comput Environ Urban Syst. 2021 May 26;88:101656. doi: 10.1016/j.compenvurbsys.2021.101656

Evaluation of Heuristics for the P-Median Problem: Scale and Spatial Demand Distribution

Harsha Gwalani a, Chetan Tiwari b, Armin R Mikler c
PMCID: PMC8341018  NIHMSID: NIHMS1710716  PMID: 34366527

Abstract

The objective of the p-median problem is to identify p source locations and assign each of n destinations to them such that the average distance between destinations and their corresponding sources is minimized. Several heuristic algorithms have been developed to solve this general class of facility location problems. In this study, we add to the current literature in two ways: (1) we present a thorough evaluation of existing classic heuristics and (2) we investigate the effect of the spatial distribution of destination locations, and of the number of sources and destinations, on the performance of these algorithms for varying problem sizes using synthetic and real datasets. The performance of these algorithms is evaluated using the objective function value, the time taken to achieve the solution, and the stability of the solution. The sensitivity of existing algorithms to the spatial distribution of destinations and the scale of the problem with respect to the three metrics is analyzed in the paper. The utility of the study is demonstrated by evaluating these algorithms to select the locations of ad-hoc clinics that need to be set up for resource distribution during a bio-emergency. We demonstrate that interchange algorithms achieve good quality solutions with respect to both execution time and cost function values, and that they are more stable for clustered distributions.

Keywords: P-Median Problem, Heuristics, Location-Allocation, Spatial Distributions

1. Introduction

The p-median problem is the most common location-allocation problem. Most location-allocation problems involve selecting a certain number of facilities or sources and mapping them to the demand points or destinations such that either coverage is maximized or distance is minimized. The p-median problem is a distance-based optimization problem in which p facilities need to be located and assigned to the demand points such that each demand point is mapped to a single facility, and the sum of the weighted distances between all demand points and their corresponding facilities is minimized. Applications of the p-median problem include location selection and demand distribution for warehouses [1] [2], shopping centers [3], fire stations [4] [5], etc. While many heuristic and meta-heuristic approaches have been proposed for this well-known NP-hard problem, the effect of the spatial distribution of the demand points on the performance of these algorithms has not been studied. This study analyzes the sensitivity of existing algorithms to the scale or problem size, the starting solution, and the spatial distribution of demand points. The scale of the p-median problem refers to the number of demand points in the region and the number of facilities to be selected.

The problem of finding source or facility points such that the sum of distances between the sources and destinations is minimized can be traced back to the 1600s. It has purely mathematical origins and began with the search for a single source point for three destinations. Fermat and Torricelli solved the geometric problem of finding a point for a triangle such that the sum of distances between this point and all three vertices is minimized [6]. This point is also called the geometric median or the Fermat point. Weber generalized this problem to n points and contextualized it in a location theory setting. The objective of the Weber problem [7] is to find the location of a warehouse such that the sum of the weighted distances between the destination points and the warehouse is minimized. The distances are weighted by the demand at the destination points. An extension of Torricelli’s solution for the Fermat point can be used to solve the Weber problem for a triangle with weighted distances [8] [9]. For the general n-destination problem, iterative algorithms have been proposed by Kuhn and Kuenne [10] and Weiszfeld [11]. Multiplicity of sources in the location-allocation problem was first introduced by Isard in [12], but the solution focused on reducing the problem to multiple single-point location-allocation problems instead of finding the solution simultaneously. Leon Cooper [13] examined the problem of simultaneous source determination. Cooper formalized the combinatorial optimization problem for locating facilities and presented an iterative algorithm based on Weiszfeld’s work to compute the exact solution. The time complexity to compute the exact solution is exponential, which is computationally infeasible for problems with larger source and destination sets. Hakimi et al. present polynomial-time algorithms for solving the p-median problem on a tree in [14] [15] [16]; the problem has been proven to be NP-hard on a general graph [17].
The mathematical problem definition and Cooper’s work are set in continuous space, where the locations of the sources can have any coordinates in the region. Most real-world location-allocation applications, however, are set in discrete space: facilities can be located only at certain predetermined candidate locations. In this study, we use the discrete network location-allocation problem formulation in which the candidate set for the facilities is the set of destinations or demand points; hence, the set of selected facilities is a subset of the demand points.

Linear Programming (LP) can be used to obtain exact optimal solutions to the p-median problem. The formulation has integrality constraints on the decision variables because of the discrete nature of the problem. These integrality constraints are relaxed to obtain a solution using linear programming, and branch-and-bound techniques are applied to the relaxed solution to obtain the optimal integral solution [18]. However, the complexity of these integer linear programming solutions is exponential; therefore, heuristic methods are more commonly used in operations research. To cope with the huge search space of the location-allocation problem, many heuristic approaches have been proposed in the literature. These approaches focus on obtaining a good solution instead of the optimal solution. There are two types of heuristic algorithms: improvement algorithms and construction (or destruction) algorithms [18]. Improvement algorithms search and explore the neighborhood of an initial feasible solution to optimize it, while construction algorithms build a solution from scratch by starting with zero facilities and adding one facility in each iteration until p facilities have been selected. One of the earliest and simplest heuristic algorithms for the p-median problem is the alternate selection and allocation algorithm, proposed by Maranzana [19]. It is an improvement algorithm initialized with a random solution. Each facility in the current solution is replaced by the destination closest to the demand-weighted centroid of the destinations assigned to that facility. This process is repeated until no change in the solution is possible. This algorithm is quick and simple to implement; however, it may not escape a local optimum.

Exchange-based improvement algorithms have been shown to yield the best results among all heuristic algorithms. Teitz and Bart proposed the original exchange or interchange algorithm [20]. Modifications to the original algorithm (Fast Interchange) to reduce the number of exchanges were proposed in [21] [22] [23]. Facilities currently in the solution are interchanged with facilities not in the solution one at a time, and the solution after each interchange is evaluated. The interchange that results in the maximum decrease in the cost function is selected in each iteration. Each iteration involves p(n − p) objective function evaluations even in the most optimized implementations, where n is the total number of destinations and p is the required number of facilities. The interchange algorithm has been shown to produce close-to-optimal results, but even the fast interchange algorithm may not be feasible for larger datasets because of the large number of pairwise exchanges. In order to avoid evaluating all pairwise exchanges between facilities in the solution and facilities not in the solution, the Global/Regional Interchange Algorithm (GRIA) [24] breaks the problem into two steps: a global exchange step and a local exchange step. The algorithm finds the best facility to remove from the current solution and then swaps it with the best facility from the non-selected set that should be added to the solution. This is the global interchange step of the algorithm. The solution is further improved by interchanging each facility with the destinations mapped to it, completing the local exchange step. The number of interchanges involved in a single iteration of GRIA is n − p + 1 for each global step and n − p for each local step. An iteration can have more than one global step. The algorithm has been shown in [24] to produce results as good as the Teitz and Bart and Fast Interchange algorithms, but with significantly fewer interchanges.

Greedy Addition [25], also known as the myopic algorithm, is the most common construction heuristic for the p-median problem. Starting with no facilities, the destination that leads to the maximum decrease in the cost function is added to the current solution in each iteration. The total number of candidate evaluations for this algorithm is equal to pn − p(p − 1)/2.

Several meta-heuristic algorithms have also been proposed in the literature to improve the cost function values over those obtained from the interchange-based algorithms. The most common meta-heuristics, such as genetic algorithms [26], tabu search [27] [28], heuristic concentration [29], and simulated annealing [30] [31], have been formulated and implemented for the p-median problem. Most of these meta-heuristic algorithms focus on improving the solutions obtained from local search or exchange algorithms with respect to the cost function value. The gains in the cost function value when compared to the classic exchange heuristics, however, are not substantial. Since these algorithms widen the search space to avoid local optima, the number of solutions that need to be evaluated is often higher than in the exchange algorithm; the time required to obtain a global optimum is hence much higher. Additionally, all of these algorithms require at least one input parameter and a user-defined stopping condition. The maximum number of iterations is usually used as the stopping condition for these meta-heuristics. A low maximum number of iterations may not yield a good solution, and a high value becomes time prohibitive. These input parameters may also change with the scale of the problem. This dependence on input parameters makes these algorithms less accessible and reliable for use by location analysts and GIS specialists. Additionally, most meta-heuristic algorithms need local search based heuristics either to generate good candidate solutions or to make a solution feasible. Table 1 lists the input parameters needed for the three most common meta-heuristics and their dependence on the classic heuristics in the implementations proposed in the literature. Due to the increased complexity of meta-heuristic approaches and their ambiguous time gains over heuristic algorithms, we exclude meta-heuristic algorithms from this study.
Additionally, mathematical heuristics such as Lagrangian relaxation [32] [33] [34], Lagrangian/surrogate relaxation [35], and column generation [36] have been proposed in the literature to solve p-median problems more accurately and efficiently. These methods have been shown to be very effective for solving small-scale problems; however, as the problem scale increases, the memory requirements for solving even the relaxed linear programming problem grow, and they become infeasible for large-scale problems. However, Garcia et al. present Zebra (Z-Enlarge-and-BRanch-Algorithm) to solve large-scale p-median problems optimally and efficiently in [37]. The authors start with a linear relaxation of a canonical representation of the p-median problem and add constraints to it sequentially to control the size of the problem. Their study solves some known large-scale problems in the literature exactly for the first time, and improves the cost function of a few others. The largest problem solved in the paper has n, p > 80,000; however, the algorithm performs badly for smaller values of p and is restricted by the large memory requirement of the heuristic tool Popstar [38]. We focus on the classic heuristics in this study for two reasons. First, both mathematical and meta-heuristic algorithms depend, to an extent, on the local search based classic heuristics to obtain a solution to the p-median problem. Second, as we show in this paper, the interchange algorithms consistently produce solutions very close to the optimal solutions with respect to the cost function value. The Network Analyst extension in the ArcMap tool uses the Teitz and Bart algorithm to generate the initial set of good solutions, which are further refined using a meta-heuristic [39]. These classic heuristics therefore remain relevant for ongoing and upcoming research in this area.

Table 1.

Meta-Heuristic Algorithms for the P-Median Problem: Input Parameters and Dependence on Classic Heuristics

Algorithm           | Input Parameters / Functions                                                                        | Local Search Method
Genetic Algorithms  | Population Size; Maximum Iterations; Crossover/Mutation Operators                                   | Greedy Deletion [26]
Simulated Annealing | Maximum Iterations; Temperature Decay Function; Initial Temperature                                 | Exchange Algorithm [30]
Tabu Search         | Tabu Tenure; Maximum Iterations; Diversification Strategy and Aspiration Criteria; Slack Variable   | Greedy Addition and Deletion [27]

While heuristic algorithms have been reviewed by researchers [40] [41] [18] [23] [42] [43] [44], a thorough comparison of the results based on the input parameters, scale, and spatial distribution of the given p-median problem is missing from the existing literature. In this study, we present a detailed comparison that will help location analysts make an informed choice when selecting a heuristic for a problem, and that may provide insights and ideas for improving these algorithms for large-scale problems. This paper presents the first study (to the best of our knowledge) on the effect of the spatial distribution of demand points on the performance of p-median heuristic algorithms. The effect of the input parameters on the performance of the heuristic algorithms is analyzed using statistical significance testing. Lastly, the paper demonstrates the utility of this evaluation in the real world by using the selection of point-of-dispensing locations for a region, in preparation for a bio-emergency, as an example p-median problem. These locations need to be set up in such a way that all affected individuals can receive vaccines or other medical resources in a timely manner during an emergency. This evaluation aids public health planners in selecting the right heuristic, considering the trade-off between efficiency and solution quality for the region at hand.

1.1. Problem Formulation

The non-capacitated discrete p-median problem, which minimizes the demand-weighted distance between the destinations and their corresponding sources, is a combinatorial optimization problem. It can be formulated as:

Given:

Set of Destinations: D = {d1, d2, …, dn}

Each destination di has coordinates (di(x), di(y)) and demand demand(i)

Number of required facilities: p

Select p facility locations, X = {x1, x2, …, xp}, out of the n destinations (X ⊂ D) and map each destination to a facility such that,

For each destination di ∈ D,

∑j=1..p αij = 1

and

the objective function cost(X),

cost(X) = ∑i=1..n ∑j=1..p αij · demand(i) · distance(di, xj) is minimized.

αij is a binary variable that denotes the mapping of destination di to facility xj: if αij is set to one, destination di has been mapped to facility xj. distance(di, xj) is the distance between destination di and facility xj. Euclidean distance was used in this study, but it can be replaced with other distance measures if necessary. Any solution X is evaluated based on the value of the objective function, cost(X), after each destination is mapped to its closest facility in X. We evaluate the algorithms used to obtain X based on three metrics: cost(X), the time required to obtain X, and the stability of the algorithm with respect to the input data. An algorithm is said to be stable if it produces the same or similar cost function values across multiple executions for a constant input. We measure stability as the interquartile range of the cost function values obtained for an algorithm across multiple runs.
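The objective can be evaluated directly from these definitions. The sketch below (in Python; the function and variable names are ours, not from the paper) maps each destination to its nearest facility and accumulates the demand-weighted distances:

```python
import math

def cost(X, dests, demand):
    """Demand-weighted objective: each destination is mapped to its
    nearest facility in X (a set of destination indices)."""
    return sum(demand[i] * min(math.dist(d, dests[j]) for j in X)
               for i, d in enumerate(dests))

# Toy instance: five destinations, facilities at indices 0 and 2.
dests = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
demand = [10, 20, 30, 40, 5]
print(cost({0, 2}, dests, demand))  # ≈ 95.36
```

Each heuristic repeatedly evaluates this function for candidate facility sets, which is why the number of evaluations per iteration is the usual unit of comparison between algorithms.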

Given this problem formulation and in line with the aim of this study, we want to answer the following two research questions:

  1. Which p-median heuristic algorithm yields the best solution for a given problem? The quality of a solution is measured with respect to three metrics, the objective function value, the time required to compute the solution, and the stability of the solution.

  2. Are the p-median heuristic algorithms sensitive to the spatial distribution of destinations?

2. Methodology

The heuristic algorithms reviewed and compared in this study are described briefly in this section. For all algorithms discussed below, destinations are always assigned to the closest facility in the current solution set X .

2.1. Alternate Selection and Allocation (Maranzana’s Algorithm)[19]

This heuristic improves on the current solution by exploring its local neighborhood in the search space. It is the fastest heuristic to get a feasible solution but it is highly sensitive to the initial solution and vulnerable to termination at a local optimum.

  1. Select p locations from the n destinations randomly as the initial solution X .

  2. For each facility xj in X:
    1. Compute the demand weighted centroid, cenj, of all destinations assigned to xj.
    2. Select the destination dk* closest to cenj as the replacement for xj: X′ = X − xj + dk*.
  3. If X′ is not equal to X, X = X′, reassign destinations and repeat from step 2.

  4. Else terminate.

Steps 2 and 3 together constitute a single iteration of the algorithm. The reassignment of destinations in step 3 assigns each destination to its closest facility in X.
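The steps above can be sketched as follows; this is a minimal illustration (helper and variable names are ours), using Euclidean distance as in the paper:

```python
import math
import random

def maranzana(dests, demand, p, seed=0):
    """Alternate selection and allocation: repeatedly replace each facility
    with the destination nearest the demand-weighted centroid of the
    destinations currently assigned to it."""
    rng = random.Random(seed)
    X = rng.sample(range(len(dests)), p)
    while True:
        # Allocation step: assign every destination to its nearest facility.
        clusters = {j: [] for j in X}
        for i, d in enumerate(dests):
            clusters[min(X, key=lambda f: math.dist(d, dests[f]))].append(i)
        # Selection step: move each facility toward its cluster's centroid.
        new_X = []
        for j, members in clusters.items():
            if not members:               # empty cluster: keep the facility
                new_X.append(j)
                continue
            w = sum(demand[i] for i in members)
            cen = (sum(demand[i] * dests[i][0] for i in members) / w,
                   sum(demand[i] * dests[i][1] for i in members) / w)
            new_X.append(min(members, key=lambda i: math.dist(dests[i], cen)))
        if set(new_X) == set(X):          # no facility moved: terminate
            return sorted(set(X))
        X = new_X

dests = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
demand = [10, 20, 30, 40, 5]
print(maranzana(dests, demand, 2))
```

Depending on the random start, this sketch can terminate at different local optima, which is exactly the sensitivity to the initial solution noted above.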

2.2. Exchange Algorithm[21][22]

Goodchild and Noronha’s modified version of the interchange algorithm was used in this study and is described below:

  1. Select p locations from the n destinations randomly as the initial solution X.

  2. Let Xmin = X.

  3. For each facility xj in X:
    1. For each location xk in D − X:
      Let X′ = X − xj + xk.
      If cost(X′) < cost(Xmin): Xmin = X′.
  4. Terminate if X is equal to Xmin.

  5. Else, X = Xmin, reassign destinations and repeat from step 3.

Step 3 comprises a single iteration of the algorithm. In the original algorithm proposed by Teitz and Bart [20], the facility set X was changed after any swap that resulted in a decrease in the cost function. Goodchild and Noronha proposed the above variation, in which the current facility set is changed only after the best exchange candidate has been discovered. Whitaker [22] improved this algorithm further by proposing an efficient implementation for evaluating the swaps. He observed that not all destinations need to be reassigned after a swap of a facility in the solution with a facility not in the solution (in steps 3a and 5). The destinations that were previously assigned to the outgoing facility definitely need to be reassigned; additionally, destinations whose distance to their current source is greater than their distance to the incoming facility need to be assigned to the new facility. This selective reassignment has been shown to reduce the run time of the algorithm considerably, and is explained in [22] and [23].
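A compact sketch of this variant follows. For clarity it re-evaluates the full cost of every candidate swap instead of using Whitaker's incremental reassignment, so it is far slower than a production implementation; all names are ours:

```python
import itertools
import math
import random

def cost(X, dests, demand):
    return sum(demand[i] * min(math.dist(d, dests[j]) for j in X)
               for i, d in enumerate(dests))

def exchange(dests, demand, p, seed=0):
    """Goodchild-Noronha style exchange: evaluate every swap of a facility
    in X with a candidate outside X, commit only the single best swap per
    iteration, and stop when no swap improves the cost."""
    rng = random.Random(seed)
    X = frozenset(rng.sample(range(len(dests)), p))
    best_c = cost(X, dests, demand)
    while True:
        best_X = X
        for j, k in itertools.product(X, set(range(len(dests))) - X):
            cand = (X - {j}) | {k}
            c = cost(cand, dests, demand)
            if c < best_c:
                best_c, best_X = c, cand
        if best_X == X:                    # no improving swap left
            return sorted(X)
        X = best_X

dests = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
demand = [10, 20, 30, 40, 5]
print(exchange(dests, demand, 2))  # → [1, 3]
```

On this toy instance every local optimum of the swap neighborhood coincides with the global optimum, so the result is independent of the random start; on larger instances the start matters, which motivates the multi-run evaluation in Section 3.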

2.3. Global/Regional Interchange Algorithm[24]

The global/regional interchange algorithm is another interchange based algorithm that first swaps the worst facility in the current solution with the best facility not in the solution. The solution is further improved by swapping every facility in the current solution with the destinations assigned to it in each iteration.

  1. Select p locations from the n destinations randomly as the initial solution X .

  2. Global Swap:
    1. Select the facility xj from X whose deletion leads to the minimum increase in cost. Evaluate X − xj. Removing a facility will lead to an increase in the cost.
    2. Select the facility xk from D − X whose addition to X − xj leads to the maximum decrease in cost. Evaluate X − xj + xk.
    3. If cost(X − xj + xk) < cost(X), X = X − xj + xk and repeat the global swap.
    4. Else go to the local swap step.
  3. Local Swap:
    1. Let Xglobal = X.
    2. For each facility xj, evaluate X − xj + di for all destinations di mapped to xj, and swap xj with the destination di* that minimizes the cost: X = X − xj + di*.
    3. If X is not equal to Xglobal, reassign destinations and repeat from the global swap step; else terminate.

The number of global swaps in the algorithm is equal to the number of times X is replaced with a better solution in step 2, and similarly the number of local swaps is equal to the number of times X is changed in step 3 across a single run of the algorithm.
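The two swap phases can be sketched as follows (again recomputing full costs for clarity rather than using incremental updates; names are ours):

```python
import math
import random

def cost(X, dests, demand):
    return sum(demand[i] * min(math.dist(d, dests[j]) for j in X)
               for i, d in enumerate(dests))

def gria(dests, demand, p, seed=0):
    """Global/Regional interchange sketch: the global step swaps the
    cheapest-to-drop facility with the best outside candidate; the local
    step lets each facility move to one of its own assigned destinations."""
    rng = random.Random(seed)
    n = len(dests)
    X = set(rng.sample(range(n), p))
    while True:
        # Global swaps: repeat while the drop-then-add pair improves cost.
        while True:
            j = min(X, key=lambda f: cost(X - {f}, dests, demand))
            k = min(set(range(n)) - X,
                    key=lambda g: cost((X - {j}) | {g}, dests, demand))
            if cost((X - {j}) | {k}, dests, demand) < cost(X, dests, demand):
                X = (X - {j}) | {k}
            else:
                break
        # Local swaps: each facility may move to a destination mapped to it.
        X_global = set(X)
        for j in list(X):
            members = [i for i, d in enumerate(dests)
                       if min(X, key=lambda f: math.dist(d, dests[f])) == j]
            best = min(members,
                       key=lambda i: cost((X - {j}) | {i}, dests, demand))
            if cost((X - {j}) | {best}, dests, demand) < cost(X, dests, demand):
                X = (X - {j}) | {best}
        if X == X_global:
            return sorted(X)

dests = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
demand = [10, 20, 30, 40, 5]
print(gria(dests, demand, 2))  # → [1, 3]
```

The local step restricts each candidate swap to a facility's own region, which is what keeps the per-iteration swap count near n − p instead of the p(n − p) of the full exchange neighborhood.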

2.4. Greedy Addition/ Myopic Algorithm[25]

Greedy addition is a construction algorithm, unlike the improvement algorithms discussed above. Improvement algorithms start with an initial feasible solution to the p-median problem and then improve the current solution iteratively using a local search method. Construction-based algorithms, on the other hand, build the solution from an empty set by adding a facility in each iteration; only the final iteration results in a feasible solution to the problem. The greedy addition algorithm adds a facility to the solution set X in each iteration until p facilities have been added. The steps are described below.

  1. Start with X =ø.

  2. Let Xmin = X .

  3. For each destination di in D − X:
    1. Evaluate X′ = X + di .
    2. If cost(X′) < cost(Xmin): Xmin = X′.
  4. X = Xmin and reassign destinations.

  5. If | X |< p, repeat from step 2.

The number of iterations is equal to p for greedy addition as step 3 is executed p times.
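The construction loop can be sketched as follows (function and variable names are ours):

```python
import math

def cost(X, dests, demand):
    return sum(demand[i] * min(math.dist(d, dests[j]) for j in X)
               for i, d in enumerate(dests))

def greedy(dests, demand, p):
    """Myopic construction: starting from an empty set, repeatedly add the
    destination whose inclusion lowers the weighted objective the most."""
    X = set()
    while len(X) < p:
        X.add(min(set(range(len(dests))) - X,
                  key=lambda k: cost(X | {k}, dests, demand)))
    return sorted(X)

dests = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
demand = [10, 20, 30, 40, 5]
print(greedy(dests, demand, 2))  # → [1, 2]
```

On this toy instance the myopic choice is {1, 2} (cost ≈ 85.4), while the optimal pair is {1, 3} (cost ≈ 79.1): the first greedy pick constrains the later ones, which is why the construction heuristic is often used only to seed an improvement algorithm, as in the hybrid variants evaluated later.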

3. Experiments

In order to evaluate the heuristics with varying scale and spatial distributions, we created synthetic datasets with known distributions and numbers of destinations (n). The required number of facilities p is varied over powers of two, p = 2^k, such that 2^k ≤ n/4, for k = 1, 2, …, log2(n) − 2. As p approaches n/2, the problem becomes an elimination problem instead of a selection problem; hence we restricted p to at most n/4 while still allowing large values of p for the selection problem. For each problem set, we conducted 100 runs of Maranzana, exchange, and GRIA, each initialized with a random solution. Additionally, these algorithms are compared with hybrid versions of the Maranzana, exchange, and Global/Regional Interchange algorithms seeded by the myopic or greedy addition algorithm: the output obtained from the myopic algorithm is used as the initial solution in these hybrid versions of the improvement algorithms.
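The schedule of facility counts described above can be generated as follows (a small sketch; the function name is ours):

```python
def p_values(n):
    """Powers of two from 2 up to the largest value not exceeding n/4."""
    p, out = 2, []
    while p <= n / 4:
        out.append(p)
        p *= 2
    return out

print(p_values(330))   # → [2, 4, 8, 16, 32, 64]
print(p_values(1250))  # → [2, 4, 8, 16, 32, 64, 128, 256]
```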

3.1. Spatial Distributions

We study three kinds of distributions:

  1. Random or Homogeneous Distribution: The demand points are distributed randomly in the region. (Figure 1a)

  2. Centered Distribution: The majority of the demand points are located around a single center; the rest are distributed randomly in the region (Figure 1b).

  3. Clustered Distribution: Multiple high-demand clusters exist in the region with the remaining demand points scattered throughout (Figure 1c).

Figure 1:


Spatial distribution types for demand points a) Random Distribution b) Centered Distribution c) Clustered Distribution. The performance of the classic heuristic algorithms was analyzed on datasets with different spatial distributions of demand points.

3.2. Scale

The scale of a given p-median location-allocation problem includes both the number of destinations n and the number of facilities p. In the synthetic datasets, the number of destinations was varied by changing the total demand in the region. All demand units were scattered in the region grid according to the required spatial distribution. These demand units are then combined to create destinations. Each destination is assumed to have a demand drawn from a normal distribution, N(1500, 400). These values were chosen to replicate the creation of census block groups, a census geographic unit in the USA. The centroid of the merged demand units for a destination defines the coordinates of the destination. Table 2 shows the total demand and the corresponding ranges for the number of destinations (n) and the number of facilities for the synthetic datasets used in this study.

Table 2.

Synthetic Datasets: Variation in Scale. The table shows the range of demand points and the range of facilities to be selected for the three classes of datasets. Each class has its own random, centered, and clustered distribution.

Total Demand # of Nodes # of Facilities
500,000 217–330 2–64
800,000 414–528 2–128
2,000,000 480–1,250 2–256

3.3. Synthetic Data Creation

In order to create the synthetic block groups, the total population, N , in the region is first distributed on a fine scale grid such that a cell in the grid is the smallest unit of space that an individual can occupy. These N individuals are first assigned to cells on the grid. This assignment is performed based on the desired distribution. The cells are then aggregated to create the demand points that are used in the p-median problem. These steps are described in detail in the following sections.

3.3.1. Distribution of Population

A two-dimensional grid is created such that each cell in the grid represents the smallest unit of space that an individual or demand unit can occupy. Each cell is represented by its bottom-left corner point. Figure 2 shows an example grid with these points. Individuals are assigned to grid cells depending on the desired distribution. A region with a homogeneous population distribution can be simulated by assigning each individual to a cell selected uniformly at random, so that every cell has an equal probability of being selected. The process of introducing heterogeneities (dense regions) or population clusters in the region is described below.

Figure 2:


Synthetic data grid: the smallest unit cell/point that an individual can occupy in the synthetic data. The demand units are distributed on this grid in a homogeneous or heterogeneous manner, depending on the desired distribution, to create different demand point distributions.

In order to evaluate the effect of the spatial distribution of demand points on the p-median heuristics, we create synthetic clusters of demand points in the region. This is achieved by distributing the underlying population within the region across different clusters. Let r be the proportion of the population that exists in the clusters, and c be the total number of clusters. We assume that the clustered populations occupy a fraction s of the total area A, i.e., an area of sA. Therefore, each cluster is created in a region with area sA/c and contains rN/c individuals. The population within a cluster is distributed in one of two ways: the population rN/c can either be distributed randomly in the corresponding square with area sA/c (Figure 3a), or a cluster may have a higher proportion of its population, αrN/c, near the center of the cluster in an inner square of area sA/4c, while the rest of the population, (1 − α)rN/c, is distributed randomly in the outer ring of the cluster (Figure 3b). Figure 3 shows examples of both types of population distribution within a cluster for c = 1 and r = 80%, i.e., 80% of the entire population is distributed within the cluster. The cluster occupies 6.25% (s = 1/16) of the total area in Figure 3a, and the population is distributed randomly within the cluster. In Figure 3b, the cluster occupies 25% of the total area (s = 1/4), but 62.5% of the total population is within the inner square, and the remaining clustered population (17.5% of the total population) is distributed in the outer ring. The pseudo code for a clustered population distribution is presented in Algorithm 1 in the Appendix. The synthetic datasets are categorized into three classes for the purposes of this evaluation: 1) Random, 2) Centered (c = 1), and 3) Clustered. The two types of clusters carry no special significance; they are meant to increase the diversity in the degree of clustering and are not treated as different types of distributions.
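A simplified version of this cluster construction, placing only uniformly filled clusters (Figure 3a) and omitting the denser inner square, can be sketched as follows; the function and parameter names are ours, not those of Algorithm 1 in the Appendix:

```python
import random

def clustered_population(N, side_len, c, r, s, seed=0):
    """Sketch: place N individuals on a side_len x side_len region, with a
    fraction r of them inside c square clusters that together cover a
    fraction s of the area; the rest are scattered uniformly."""
    rng = random.Random(seed)
    cluster_side = (s * side_len ** 2 / c) ** 0.5  # each cluster has area sA/c
    centers = [(rng.uniform(cluster_side / 2, side_len - cluster_side / 2),
                rng.uniform(cluster_side / 2, side_len - cluster_side / 2))
               for _ in range(c)]
    pts = []
    for _ in range(int(r * N)):        # clustered share: ~rN/c per cluster
        cx, cy = centers[rng.randrange(c)]
        pts.append((rng.uniform(cx - cluster_side / 2, cx + cluster_side / 2),
                    rng.uniform(cy - cluster_side / 2, cy + cluster_side / 2)))
    for _ in range(N - int(r * N)):    # background share: (1 - r)N
        pts.append((rng.uniform(0, side_len), rng.uniform(0, side_len)))
    return pts

pts = clustered_population(N=1000, side_len=100, c=2, r=0.8, s=1 / 16)
```

Setting r = 0 recovers the random distribution and c = 1 the centered one, so the same generator covers all three dataset classes.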

Figure 3:


Cluster creation: Types of clusters in synthetic data. The population within a cluster may be homogeneously distributed(a) or may be denser around the center(b).

3.3.2. Creation of Demand Points

After the population (or any demand unit) has been distributed to the grid cells, grid cells are combined to create the demand points that will be used in the p-median problem. This combination process begins with the initialization of the first destination or demand point. The total number of demand units to be allocated to a destination is determined at initialization by drawing from the normal distribution N(1500, 400). The first grid cell is assigned to the first destination. More grid cells are selected for the current destination based on their availability and their distance from the current centroid of the destination. A new destination is initialized once the demand at the current destination reaches or exceeds the demand determined during initialization. The algorithm is described in detail in the Appendix (Algorithm 2). This process of distributing the population or demand units first, and then aggregating the demand units to create the actual demand points, simulates the creation of census block groups or similar demographic census regions. Similar synthetic clustered and/or homogeneous demand points could be created directly by distributing demand points (homogeneously or heterogeneously) on a coarser grid. Another method to induce clusters could be to create multiple Gaussian subregions about a center (mean) with varying covariances. The process used to create the synthetic datasets is not critical for the study; the focus is the effect of clusters on the performance of p-median heuristics, which is independent of the type of clusters, as demonstrated by the real datasets.
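A simplified sketch of this aggregation (our own construction, much smaller than Algorithm 2 in the Appendix, and using an unweighted centroid) grows each demand point by absorbing the nearest unassigned cell until its drawn demand target is met:

```python
import math
import random

def aggregate(cells, mu=1500, sigma=400, seed=0):
    """Sketch of demand-point creation: grow each demand point from a seed
    cell by absorbing the nearest unassigned cell until a demand target
    drawn from N(mu, sigma) is met. cells maps (x, y) -> head count."""
    rng = random.Random(seed)
    free = dict(cells)
    dests = []
    while free:
        target = max(1, rng.gauss(mu, sigma))
        seed_cell, cnt = free.popitem()        # seed cell for a new point
        members, total = [seed_cell], cnt
        while total < target and free:
            cx = sum(p[0] for p in members) / len(members)
            cy = sum(p[1] for p in members) / len(members)
            nearest = min(free, key=lambda p: math.dist(p, (cx, cy)))
            total += free.pop(nearest)
            members.append(nearest)
        centroid = (sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members))
        dests.append((centroid, total))
    return dests

# 16 occupied cells with one person each; small targets for illustration.
cells = {(i, j): 1 for i in range(4) for j in range(4)}
points = aggregate(cells, mu=5, sigma=1)
```

Whatever the grouping rule, the total demand is preserved: the demand points partition the occupied cells.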

4. Results

The heuristic algorithms are evaluated with respect to three metrics: 1) cost function: the value of the objective function of the optimization problem, 2) time and iterations: the time taken by the algorithm to achieve a solution, and the number of iterations executed in each run (if applicable), and 3) stability: the interquartile range (IQR) of the cost function across multiple runs of an algorithm on the same dataset. The myopic algorithm is the most stable, as it always yields the same result; the exchange algorithm, GRIA, and the alternate selection and allocation heuristic, however, are not deterministic. If the improvement algorithms are initialized with a random solution, the final solution cannot be determined from previous results. While the randomization of initial solutions helps in avoiding local optima, if the variance in the cost function values across multiple executions is high, then the algorithm requires many executions to obtain a good solution. Therefore, higher stability is desirable. The sensitivity of the heuristics to scale and spatial distributions was quantified by their effect on these three metrics.
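The stability metric itself is a one-line computation; a sketch using Python's statistics module, with hypothetical cost values from eight runs of a randomized heuristic:

```python
import statistics

def stability_iqr(costs):
    """Interquartile range of cost values across repeated runs: the gap
    between the 75th and 25th percentiles (smaller = more stable)."""
    q1, _, q3 = statistics.quantiles(costs, n=4, method="inclusive")
    return q3 - q1

runs = [100.0, 101.5, 100.2, 99.8, 100.1, 103.0, 100.0, 100.3]
print(stability_iqr(runs))  # ≈ 0.6
```

Using the IQR rather than the full range keeps the metric robust to the occasional run that lands in a poor local optimum.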

4.1. Cost Function

As can be seen in Figures 4, 5 and 6, the interchange algorithms (exchange and GRIA) perform best with respect to the cost function across all distributions and for all values of n and p. The figures show the variation in the cost function relative to the best known solution for each problem. The best known cost function value was computed as the minimum of the value obtained from a Mixed Integer Linear Programming (MILP) solver1 (time limit = 100 hours) and the cost values produced by the heuristic algorithms across 100 runs. The MILP solver was able to compute the optimal solution for almost all problem sets; the optimal solution is not known for problems marked with a red asterisk on the y-axis.

Figure 4:

Synthetic Dataset I: Cost function comparison. Total population for dataset I is 500,000; n = 330 for the random distribution, n = 301 for the centered distribution, and n = 217 for the clustered distribution. The figure shows the cost function values relative to the optimal solution for the four algorithms (stochastic and hybrid) for p = {2, 4, 8, 16, 32, 64}. The hybrid results are the horizontal lines next to the corresponding box plots. The stochastic version of each improvement algorithm is initialized with a random solution, while the hybrid version is initialized with the myopic solution.

Figure 5:

Synthetic Dataset II: Cost function comparison. Total population for dataset II is 800,000; n = 528 for the random distribution, n = 509 for the centered distribution, and n = 414 for the clustered distribution. The figure shows the cost function values relative to the optimal solution for the four algorithms (stochastic and hybrid) for p = {2, 4, 8, 16, 32, 64, 128}. The hybrid results are the horizontal lines next to the corresponding box plots. The stochastic version of each improvement algorithm is initialized with a random solution, while the hybrid version is initialized with the myopic solution.

Figure 6:

Synthetic Dataset III: Cost function comparison. Total population for dataset III is 2,000,000; n = 1,250 for the random distribution, n = 1,058 for the centered distribution, and n = 480 for the clustered distribution. The figure shows the cost function values relative to the optimal solution for the four algorithms (stochastic and hybrid) for p = {2, 4, 8, 16, 32, 64, 128, 256}. The optimal solution could not be computed by the Mixed Integer Programming solver within 100 hours for the problems marked with a red asterisk. The hybrid results are the horizontal lines next to the corresponding box plots. The stochastic version of each improvement algorithm is initialized with a random solution, while the hybrid version is initialized with the myopic solution.

The exchange algorithm is known to yield close-to-optimal location selections, but these results show that GRIA performs almost as well, at least for small and mid-range values of p: the median and minimum cost function values for GRIA are comparable with those for the exchange algorithm. We performed the one-tailed Mann-Whitney U test [45] to test for significant differences between the cost function values of the four algorithms. The Mann-Whitney rank test assesses whether a randomly selected value from one distribution is likely to be larger (or smaller) than a randomly selected value from another. The p-values for the hypothesis that the distribution of method I is greater than that of method II are shown in Tables A1, A2 and A3 in the Appendix for the three sets of synthetic datasets. The results showed that the cost distributions for Maranzana were greater than those for GRIA and the fast interchange (exchange) algorithm across all datasets for all but the smaller values of p. This implies that a run of Maranzana on a given problem set is likely to yield a solution with a higher cost function value than a run of GRIA or fast interchange on the same problem set. The myopic distribution (a single value) tended to be greater than the Maranzana distribution for smaller values of p but fared better for larger values of p, specifically for clustered and centered distributions; the myopic solutions, however, were higher than those of the interchange-based algorithms for all problem sets. While the median and mean cost function values for GRIA are highly comparable with those of the exchange algorithm across all problems, significance testing showed that the GRIA distributions are greater than the fast interchange distributions, particularly for larger values of p.
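The one-tailed comparison described above can be reproduced with SciPy's implementation of the Mann-Whitney U test; the cost samples in the example are hypothetical, and the helper name is our own.

```python
from scipy.stats import mannwhitneyu

def is_stochastically_greater(costs_a, costs_b, alpha=0.05):
    """One-tailed Mann-Whitney U test.

    Returns True if a cost value drawn from method A is significantly
    likely to exceed a cost value drawn from method B at level alpha,
    i.e., method A tends to produce worse (higher-cost) solutions.
    """
    _, p_value = mannwhitneyu(costs_a, costs_b, alternative="greater")
    return p_value < alpha
```

The `alternative="greater"` argument makes the test directional, matching the "distribution of method I is greater than method II" hypothesis used for Tables A1 to A3.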

Figures 4, 5 and 6 also show the results for the hybrid version of each improvement algorithm. As discussed earlier, the hybrid version initializes the algorithm with the myopic solution as the initial source configuration instead of randomly selected destinations, which makes the improvement algorithms deterministic and stable. The hybrid results are shown by horizontal lines of the same color as the corresponding box plots. The orange horizontal lines (hybrid Maranzana) lie below the median lines of the orange box plots (stochastic Maranzana) across all datasets and all values of p, while no such relation is evident for the exchange algorithm (purple lines/box plots) or GRIA (blue lines/box plots). Further statistical testing comparing the objective function values of the stochastic and hybrid versions corroborated the results shown in Figures 4, 5 and 6. Mann-Whitney U rank tests were used to test whether the stochastic cost population is higher than the hybrid cost population. These tests were inconclusive for GRIA, as no general trend could be deduced, while for the exchange algorithm the hybrid version was larger than the stochastic version for more values of p, especially for clustered datasets. Hybrid Maranzana was significantly better than stochastic Maranzana but still significantly worse than the interchange-based algorithms (stochastic or hybrid).

Effect of scale and spatial distribution:

  1. The interchange algorithms, exchange and GRIA, perform consistently well across all spatial distributions and all values of n and p. The minimum cost function value obtained for these algorithms across all runs is equal or very close to the optimal value obtained using the MILP solver. Significance testing showed that GRIA yields poorer results than the exchange algorithm for larger values of p, while the distributions are not significantly different for smaller and mid-range values of p (Tables A1, A2 and A3). We therefore conclude that the performance of GRIA deteriorates as p increases.

  2. The myopic algorithm yields better cost function values for centered and clustered distributions than for random distributions, for all values of n and p. For any distribution, it yields better results for larger values of p. Figure 7 illustrates the problem with using the myopic algorithm for small values of p: the first facility selected by the myopic algorithm is far away from the corresponding facility in the best solution. Hence, the myopic algorithm may be used to obtain good quality solutions for clustered and centered distributions, particularly for larger values of p (p ≥ n/4).

  3. The alternate selection and allocation algorithm yields close-to-optimal solutions across multiple runs for smaller values of p; however, it is the most prone to terminating in a local optimum because of its structure. The location of a facility is always replaced by the demand-weighted centroid of the destinations assigned to it, so if the current facility is already the demand-weighted centroid, the algorithm does not explore farther neighborhoods. Facilities placed within a cluster of a clustered distribution therefore usually move only within that cluster, which leads to poorer solutions. In random distributions the centroids move more globally, so the algorithm performs better. Figure 8 shows the percentage decrease in the cost function with each iteration for all three distributions; the slope becomes almost constant for clustered and centered distributions after the first or second iteration. To summarize, the alternate selection and allocation algorithm should not be used for large-scale problems, particularly for centered or clustered distributions. It can, however, be used for problems with p < 10 to obtain solutions close to those of the exchange algorithms, particularly for random distributions.
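The alternation between allocation and selection can be sketched for the discrete case as follows. Here the selection step picks the 1-median among a cluster's own members, the discrete analogue of the demand-weighted centroid described above; the function name and interface are illustrative, not the implementation evaluated in this study.

```python
import math

def maranzana(points, weights, p, initial, max_iter=100):
    """Alternate selection and allocation sketch for the discrete p-median.

    points: list of (x, y); weights: demand at each point; initial: list of
    p point indices used as the starting facilities.
    """
    facilities = list(initial)
    for _ in range(max_iter):
        # allocation: assign every demand point to its nearest facility
        clusters = {f: [] for f in facilities}
        for i in range(len(points)):
            nearest = min(facilities,
                          key=lambda f: math.dist(points[f], points[i]))
            clusters[nearest].append(i)
        # selection: move each facility to the 1-median of its own cluster
        new_facilities = []
        for f, members in clusters.items():
            if not members:
                new_facilities.append(f)
                continue
            best = min(members, key=lambda c: sum(
                weights[i] * math.dist(points[c], points[i]) for i in members))
            new_facilities.append(best)
        if set(new_facilities) == set(facilities):
            break                  # local optimum: no facility moved
        facilities = new_facilities
    return facilities
```

Because a facility only ever moves to a member of its own cluster, the sketch makes the locality discussed above explicit: a facility trapped inside one cluster of a clustered distribution cannot jump to another cluster.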

Figure 8:

Performance of Maranzana with varying distributions. The y-axis shows the percent decrease in cost function in each iteration. It can be seen that after the initial jump, the decrease is almost zero for the clustered distribution indicating that the algorithm has found a local optimum.

4.2. Time and Iterations

To evaluate the performance of the algorithms with respect to execution time, we compared the average time taken by each heuristic to produce a solution at different scales and spatial distributions. To study the impact of spatial distribution on execution time, we created datasets with constant values of n across the different spatial distributions. Figure 9 shows the average absolute time taken by each heuristic for different values of p across three datasets with n = 217, n = 414 and n = 483, respectively. Figure 10 shows the execution time for the stochastic and hybrid versions of the heuristic algorithms on the datasets listed in Table 2. The results showed time(exchange) ≈ time(GRIA) ≈ time(myopic) >> time(Maranzana) for smaller values of p, but time(myopic) > time(exchange) > time(GRIA) >> time(Maranzana) for larger values of p in these datasets. Interestingly, seeding the improvement algorithms with the myopic solution shifted the time plot (dotted lines in Figure 10) upwards by the time needed to compute the myopic solution (green solid line). Therefore, initializing the algorithms with the myopic solution does not decrease the execution time of the improvement algorithms compared to the average time across stochastic runs.

Figure 9:

Variation in run time (in seconds) with spatial distribution, number of demand points (n), and number of facilities (p). a) Dataset I: n = 217, p = {2, 4, 8, 16, 32, 64}; b) Dataset II: n = 414, p = {2, 4, 8, 16, 32, 64, 128}; c) Dataset III: n = 483, p = {2, 4, 8, 16, 32, 64, 128}. Maranzana is the fastest algorithm, while the exchange algorithm is the slowest for small to mid-scale problems and myopic is the slowest for larger-scale problems. Execution time increases with the number of demand points and the number of facilities; variation in spatial distribution does not impact the average execution time.

Figure 10:

Variation in run time (in seconds) with spatial distribution, number of demand points (n), and number of facilities (p). a) Dataset I: n = 217, p = {2, 4, 8, 16, 32, 64}; b) Dataset II: n = 414, p = {2, 4, 8, 16, 32, 64, 128}; c) Dataset III: n = 483, p = {2, 4, 8, 16, 32, 64, 128}. The dotted lines show the execution time for the corresponding hybrid version. The hybrid versions of the improvement algorithms are not, on average, faster than the stochastic versions.

Effect of scale and spatial distribution:

  1. The average time taken by each algorithm across the different distributions, with constant n and p, is not significantly different, as can be seen in Figure 9. Therefore, the average absolute time per execution of each algorithm is independent of the spatial distribution but increases with n and p.

  2. The time or number of iterations required per run is not constant across runs. As can be seen in Figure 11, the variation in the number of iterations across runs is notably high, especially for the exchange algorithm and Maranzana. However, the time to execute a single iteration of Maranzana is so small that an increase in the number of iterations does not significantly increase the time to produce a result. The time required by the exchange algorithm to converge, in contrast, is highly dependent on the initial configuration; yet, as Figure 10 shows, using the myopic solution as the initial configuration does not, in general, yield a considerable decrease in the number of iterations or execution time.

  3. The number of iterations required for GRIA to terminate does not depend on scale; instead, the number of local and global swaps within each iteration increases with scale. Moreover, the total number of global and local swaps required across all iterations changes with the spatial distribution of destinations, as Figure 12 illustrates: the number of global swaps increases with the level of clustering, while the number of local swaps decreases significantly as the distribution changes from random to centered and clustered.

Figure 11:

Variation in number of iterations with spatial distribution and number of facilities (p). The variation in the number of iterations across runs for the exchange algorithm and Maranzana is notably high (wider box plots). The number of iterations for GRIA is not affected by the initial solution.

Figure 12:

GRIA: Average global and local swaps required across all iterations for different spatial distributions. Left: Dataset I: n = 217, p = {2, 4, 8, 16, 32, 64}. Center: Dataset II: n = 414, p = {2, 4, 8, 16, 32, 64, 128}. Right: Dataset III: n = 480, p = {2, 4, 8, 16, 32, 64, 128}. For constant values of n and p, average global swaps are higher and average local swaps are lower for the clustered distribution than for the other distributions, across all three datasets.

4.3. Stability

The final evaluation metric for these heuristic algorithms is stability. We define stability as the variation across different runs for a fixed dataset and a fixed value of p, quantified by the interquartile range (IQR) of the cost function (Figures 4, 5, 6). The myopic algorithm is the most stable, as its solution is deterministic. The other three algorithms depend on the initial solution that seeds each execution. The cost functions for the alternate selection and allocation algorithm have the widest box plots for all three datasets, especially for larger values of p; it is therefore the most unstable and requires the largest number of runs to achieve a good solution. The exchange algorithm is more stable than GRIA, so a single run or very few runs may suffice. Table 3 shows the maximum IQR observed for the three datasets across all values of p for the different spatial distributions.
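The stability metric can be reproduced with a minimal vertex-substitution exchange heuristic (in the spirit of Teitz and Bart [20]) run from repeated random starts. This is an illustrative sketch, not the fast interchange implementation evaluated here; the function names are our own.

```python
import math
import random
import statistics

def cost(points, weights, facilities):
    """Weighted p-median objective for a given facility subset."""
    return sum(w * min(math.dist(pt, points[f]) for f in facilities)
               for pt, w in zip(points, weights))

def exchange(points, weights, p, rng):
    """Start from a random p-subset; keep applying improving single swaps."""
    n = len(points)
    current = rng.sample(range(n), p)
    best_cost = cost(points, weights, current)
    improved = True
    while improved:
        improved = False
        for j in range(n):
            if j in current:
                continue
            for k in range(p):
                trial = current[:k] + [j] + current[k + 1:]
                c = cost(points, weights, trial)
                if c < best_cost - 1e-12:
                    current, best_cost, improved = trial, c, True
    return best_cost

def stability_iqr(points, weights, p, runs=20, seed=0):
    """IQR of the final cost over repeated random starts (the stability
    metric used in the text)."""
    rng = random.Random(seed)
    costs = sorted(exchange(points, weights, p, rng) for _ in range(runs))
    q1, _, q3 = statistics.quantiles(costs, n=4)
    return q3 - q1
```

On an easy two-cluster instance, every random start converges to the same optimum and the IQR collapses to zero, which mirrors the high stability of the interchange algorithms on clustered distributions reported below.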

Table 3.

Stability: Maximum Size of Inter Quartile Range (IQR) observed across all problem sets corresponding to all distributions of the three synthetic datasets and all values of p .

Algorithm Random Centered Clustered
Myopic 0% 0% 0%
Maranzana 3.99% 5.20% 34%
GRIA 0.97% 1.60% 3.82%
Exchange 0.99% 0.71% 2.61%

Effect of scale and spatial distribution:

  1. The IQR was observed to be 0% for the exchange algorithm and GRIA (Figures 4, 5, 6) for almost all datasets with p ≤ 8. We therefore conclude that the interchange algorithms are more stable for smaller values of p than for larger values of p.

  2. The median solution of the exchange algorithm was more consistently the best solution, or very close to it, for clustered distributions. The median value is more likely to exceed the best solution for random distributions because a randomly distributed destination dataset has more local optima, while for clustered distributions all starting configurations are likely to converge to the same solution. Figure 13 illustrates this difference in the cost function across runs for a random dataset versus a clustered dataset: the solution band for the clustered dataset is narrower. To summarize, the exchange algorithm needs more executions to obtain a good solution for a random distribution than for a clustered one.

  3. The IQR is, on average, smallest for centered datasets for all algorithms. Any of the algorithms produces its most stable results for a centered distribution because the difference between two solutions is likely to be smaller than in the other distributions, as most destinations are located close to each other. Hence, fewer runs may suffice for centered distributions, with little further improvement in the cost function value.

  4. Maranzana is highly unstable for clustered distributions because it is more prone to falling into a local optimum (Figure 8). It may therefore need many executions to obtain a good solution for a clustered distribution.

  5. The interchange algorithms are in general very stable for clustered distributions for mid-range and larger values of p, for which the median value is 100% and |IQR| < 0.5%; the high instability seen in Table 3 corresponds to p = 2. Hence, multiple runs of the interchange algorithms may not improve the quality of the solution for clustered distributions.

Figure 13:

Exchange Algorithm: Variation in cost function across multiple runs. The exchange algorithm was seen to be more stable (narrower cost band) for a clustered distribution versus a random distribution.

4.4. Case Study: Resource Distribution during Bio-Emergencies

To further test our conclusions about solution quality and the impact of scale and spatial distribution on real datasets, we experimented with Texas DSHS health regions [46]. Point of Dispensing (POD) locations must be identified in preparation for bio-emergencies; these service centers dispense medical supplies or prophylactics to the affected population. We selected health regions 5N, 6 and 11 because of the suitable spatial distributions of census block group centroids in these regions. Figure 14a shows these regions within the state of Texas and the distribution of census block group centroids in each. Region 5N is almost randomly distributed, with two small clusters. Region 6 is a centered distribution, with a big cluster in the center and smaller clusters scattered along the periphery. Region 11 is a clustered distribution, with sparse demand points scattered around the clusters. Table 4 shows the total demand (population), the number of destinations (census block groups), and the range of the number of facilities for each region.

Figure 14:

Real Datasets and Corresponding Results: Time and Cost Comparisons for the three Texas DSHS Regions

Table 4.

Texas health regions: Variation in scale. The table shows the number of demand points and the range of the number of facilities to be selected in the three health regions used to analyze the performance of the p-median heuristics.

Name of Region Total Population # of Block Groups (destinations) # of Facilities (PODs)
Region 5N 380,459 267 2–64
Region 11 2,229,255 1,193 2–256
Region 6 6,806,113 3,148 2–512

Figure 14b shows the distribution of cost function values relative to the best known solution and the average execution time for each experiment on the problem sets listed in Table 4. The maximum IQR for each region across all values of p is shown in Table 5. These results show the same trends in cost function values for the four heuristic algorithms as were observed for the synthetic datasets. The exchange algorithm is the best with respect to the cost function metric and is more stable for clustered and centered distributions. GRIA is a close second but shows poorer results for higher values of p. Myopic yields better results for Region 6 (centered) than for Regions 5N and 11. Maranzana is highly unstable across all the datasets and is the most unstable for Region 11 (clustered). The average execution time analysis for these datasets showed that the exchange algorithm becomes more time-prohibitive than the other algorithms as n increases; thus, for problems with larger n, the general trend is time(exchange) > time(myopic) ≈ time(GRIA) >> time(Maranzana).

Table 5.

Case study results, stability: Maximum IQR observed across the three regions for all values of p. Maranzana is the most unstable for Region 11 (clustered), while the exchange algorithm is the most stable for that region.

Algorithm Region5N Region6 Region11
Myopic 0% 0% 0%
Maranzana 10.35% 5.24% 36.49%
Exchange 1.77% 0.93% 0.55%
GRIA 1.86% 1.16% 1.43%

5. Conclusion

A thorough review and evaluation of the performance of four heuristic algorithms, the exchange algorithm, GRIA, alternate selection and allocation, and the myopic algorithm, was presented in this paper. The performance was evaluated with respect to three metrics: 1) the value of the objective function, 2) the time required to obtain a solution, and 3) the stability of the solution. Additionally, we showed, using statistical testing, the impact of the spatial distribution of demand points and the scale of the problem on the performance of these algorithms. The synthetic datasets and corresponding results can serve as a reference for evaluating other algorithms.

It was shown that GRIA is the best algorithm for obtaining reasonably good, stable solutions irrespective of scale and spatial distribution, in less time than the exchange algorithm, which is known to produce close-to-optimal solutions. GRIA and the exchange algorithm are comparable with respect to the three metrics for all datasets, especially for small and mid-range values of p. The interchange algorithms require more executions with different initial solutions for random distributions than for clustered or centered distributions because of the higher variation in local optima. The myopic algorithm is not suitable for selecting a small number of facilities but may be used for larger values of p, specifically for highly clustered distributions. The alternate selection and allocation algorithm (Maranzana) is the fastest way to obtain feasible solutions but performs poorly with respect to both cost and stability as the scale of the problem increases; it was shown to be more unstable for clustered distributions than for centered or random distributions, and hence may require more executions for clustered distributions. The synthetic datasets created for this research included clustered distributions with uniform densities across all clusters; more advanced variations of spatial distributions can be created using Gaussian Mixture Models (GMMs), for example a region with two sub-regions of randomly distributed demand points with different mean densities, or multiple clusters with different mean densities and sizes. We validated the results using real datasets as a proxy for these advanced variations. Additionally, we observed that the COIN-OR branch and cut [47] MILP solver that we used to obtain exact solutions was very effective for small to medium-scale problems; we therefore recommend using MILP when the problem size is small. Commercial MILP solvers such as GUROBI [48] and CPLEX may further improve the execution time of the MILP solutions for larger-scale problems; this study does not evaluate the performance of these commercial solvers.

To avoid executing multiple runs and to achieve stability in the improvement algorithms, we also experimented with hybrid algorithms combining the myopic and improvement algorithms. Statistical analyses showed that hybrid Maranzana is better than stochastic Maranzana with respect to the cost function values, while the results were mostly inconclusive for the interchange algorithms. The stochastic version of fast interchange performed better than the hybrid version for a larger number of problems with clustered distributions. The results of the hybrid interchange algorithms are comparable to the median solutions of the stochastic versions; the gains in terms of the cost function are therefore not compelling. Moreover, the time needed to obtain a myopic solution for large-scale problems (n > 1,000 and p > 100) is prohibitive for this approach, specifically for clustered distributions, where a single run of an interchange algorithm may be sufficient.

The results in this study show that the heuristic algorithms work well for small to mid-scale problems, and they provide a guideline for the number of executions suitable for a given algorithm and problem set. However, the average execution time required to solve problems with n > 3,000 and p > 500 exceeds 2 hours for both the exchange algorithm and GRIA. Hence, there is scope either to develop new algorithms or to modify existing ones to reduce the execution time for large-scale problems. The effect of clustering on the number of global and local swaps in GRIA suggests that the problem can be decomposed into multiple independent subproblems that are solved concurrently to reduce run time, particularly for clustered distributions. There is ongoing research on utilizing spatial distributions to solve large-scale p-median problems [49] [50].

Supplementary Material


Figure 7:

Facility allocation by the myopic algorithm for p = 2 compared with the optimal solution. The myopic algorithm identifies the facility in the center as the solution to the 1-median problem, and then adds the second facility.

HIGHLIGHTS.

  1. This paper is the first study that evaluates the performance of heuristic algorithms for solving p-median problems with respect to the spatial distribution of demand points.

  2. The algorithms are evaluated based on three metrics: cost function value, execution time, and stability. The stability/consistency of algorithms is often not discussed in the literature.

  3. The study demonstrates the utility of these evaluations by using the algorithms to select the locations of ad-hoc clinics for resource distribution during bio-emergencies. These results will help location analysts make an informed choice when selecting a heuristic for a problem and may provide insights and ideas towards improving these algorithms for large-scale problems.

  4. The algorithms are evaluated using both synthetic and real datasets. The implementation of the algorithms, input datasets, and output datasets will be made available via a public git repository (on request before acceptance). These datasets can be used to test and compare the performance of other existing or novel algorithms.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

1 Python MIP (Mixed-Integer Programming) Tools: https://pypi.org/project/mip; uses the COIN-OR Branch-and-Cut (CBC) solver.

References

  • [1] Baumol WJ, Wolfe P, A warehouse-location problem, Oper. Res. 6 (2) (1958) 252–263. doi:10.1287/opre.6.2.252.
  • [2] Dejax PJ, A methodology for warehouse location and distribution systems planning, in: Bianco L, La Bella A (Eds.), Freight Transport Planning and Logistics, Springer Berlin Heidelberg, Berlin, Heidelberg, 1988, pp. 289–318.
  • [3] Nwogugu M, Site selection in the US retailing industry, Applied Mathematics and Computation 182 (2) (2006) 1725–1734. doi:10.1016/j.amc.2005.12.050.
  • [4] Plane DR, Hendrick TE, Mathematical programming and the location of fire companies for the Denver fire department, Operations Research 25 (4) (1977) 563–578.
  • [5] Yao J, Zhang X, Murray AT, Location optimization of urban fire stations: Access and service coverage, Computers, Environment and Urban Systems 73 (2019) 184–190. doi:10.1016/j.compenvurbsys.2018.10.006.
  • [6] Courant R, Robbins H, What Is Mathematics? An Elementary Approach to Ideas and Methods, 2nd Edition, Oxford University Press, 1996.
  • [7] Weber A, Friedrich C, Alfred Weber's Theory of the Location of Industries, The University of Chicago Press, 1929.
  • [8] Simpson T, The Doctrine and Application of Fluxions, 1st Edition, London: Printed for J. Nourse, 1776.
  • [9] Chen P-C, Hansen P, Jaumard B, Tuy H, Weber's problem with attraction and repulsion, Journal of Regional Science 32 (4) (1992) 467–486. doi:10.1111/j.1467-9787.1992.tb00200.x.
  • [10] Kuhn HW, Kuenne RE, An efficient algorithm for the numerical solution of the generalized Weber problem in spatial economics, Journal of Regional Science 4 (2) (1962) 21–33. doi:10.1111/j.1467-9787.1962.tb00902.x.
  • [11] Weiszfeld E, Plastria F, On the point for which the sum of the distances to n given points is minimum, Annals of Operations Research 167 (1) (2009) 7–41. doi:10.1007/s10479-008-0352-z.
  • [12] Isard W, Location and Space-Economy, 1st Edition, The MIT Press, 1972.
  • [13] Cooper L, Location-allocation problems, Oper. Res. 11 (3) (1963) 331–343. doi:10.1287/opre.11.3.331.
  • [14] Goldman AJ, Optimal center location in simple networks, Transportation Science 5 (2) (1971) 212–221. doi:10.1287/trsc.5.2.212.
  • [15] Kariv O, Hakimi SL, An algorithmic approach to network location problems. II: The p-medians, SIAM Journal on Applied Mathematics 37 (3) (1979) 539–560.
  • [16] Tamir A, An O(pn²) algorithm for the p-median and related problems on tree graphs, Operations Research Letters 19 (2) (1996) 59–64. doi:10.1016/0167-6377(96)00021-1.
  • [17] Kariv O, Hakimi SL, An algorithmic approach to network location problems. II: The p-medians, SIAM Journal on Applied Mathematics 37 (3) (1979) 539–560. doi:10.1137/0137041.
  • [18] Daskin MS, Network and Discrete Location: Models, Algorithms, and Applications, 2nd Edition, Wiley, New York, 2013.
  • [19] Maranzana FE, On the location of supply points to minimize transport costs, OR 15 (3) (1964) 261–270. doi:10.2307/3007214.
  • [20] Teitz MB, Bart P, Heuristic methods for estimating the generalized vertex median of a weighted graph, Operations Research 16 (5) (1968) 955–961. doi:10.1287/opre.16.5.955.
  • [21] Goodchild MF, Noronha VT, Location-Allocation for Small Computers, Dept. of Geography, University of Iowa, Iowa City, 1983.
  • [22] Whitaker R, A fast algorithm for the greedy interchange for large-scale clustering and median location problems, INFOR: Information Systems and Operational Research 21 (2) (1983) 95–108. doi:10.1080/03155986.1983.11731889.
  • [23] Hansen P, Mladenović N, Variable neighborhood search for the p-median, Location Science 5 (4) (1997) 207–226. doi:10.1016/S0966-8349(98)00030-8.
  • [24] Densham PJ, Rushton G, A more efficient heuristic for solving large p-median problems, Papers in Regional Science 71 (3) (1992) 307–329. doi:10.1007/BF01434270.
  • [25] Kuehn AA, Hamburger MJ, A heuristic program for locating warehouses, Management Science 9 (4) (1963) 643–666.
  • [26] Alp O, Erkut E, Drezner Z, An efficient genetic algorithm for the p-median problem, Annals of Operations Research 122 (1) (2003) 21–42. doi:10.1023/A:1026130003508.
  • [27] Rolland E, Schilling DA, Current JR, An efficient tabu search procedure for the p-median problem, European Journal of Operational Research 96 (2) (1997) 329–342. doi:10.1016/S0377-2217(96)00141-5.
  • [28].BernÃąbe Loranca MB, GonzÃąlez VelÃązquez R, Estrada Analco M, The p-median problem: A tabu search approximation proposal applied to districts, Journal of Mathematics and System Science 5 (March 2015). doi: 10.17265/2159-5291/2015.03.002. [DOI] [Google Scholar]
  • [29].Rosing K, ReVelle C, Schilling D, A gamma heuristic for the p-median problem, European Journal of Operational Research 117 (3) (1999) 522–532. doi: 10.1016/S0377-2217(98)00268-9., URL http://www.sciencedirect.com/science/article/pii/S0377221798002689 [DOI] [Google Scholar]
  • [30].Chiyoshi F, Galvão RD, A statistical analysis of simulated annealing applied to the p-median problem, Annals of Operations Research 96 (1) (2000) 61–74. doi: 10.1023/A:1018982914742., URL 10.1023/A:1018982914742 [DOI] [Google Scholar]
  • [31].Murray AT, Church RL, Applying simulated annealing to location-planning models, Journal of Heuristics 2 (1) (1996) 31–53. doi: 10.1007/BF00226292. URL 10.1007/BF00226292 [DOI] [Google Scholar]
  • [32].Fisher ML, The lagrangian relaxation method for solving integer programming problems 50 (12) (2004) 1861–1871., URL www.jstor.org/stable/30046157 [Google Scholar]
  • [33].Simonis H, Chapter 25 - constraint applications in networks, in: Rossi F, van Beek P, Walsh T (Eds.), Handbook of Constraint Programming, Vol. 2 of Foundations of Artificial Intelligence, Elsevier, 2006, pp. 875–903. doi: 10.1016/S1574-6526(06)80029-5., URL http://www.sciencedirect.com/science/article/pii/S1574652606800295 [DOI] [Google Scholar]
  • [34].Lemaréchal Claude, Lagrangian Relaxation, Springer Berlin Heidelberg, Berlin, Heidelberg, 2001, pp. 112–156. doi: 10.1007/3-540-45586-8_4., URL 10.1007/3-540-45586-8_4 [DOI] [Google Scholar]
  • [35].Senne E, Lorena L, Chapter 6 lagrangean/surrogate heuristics for p-median problems, in: Computing Tools for Modeling, Optimization and Simulation. Operations Research/Computer Science Interfaces Series, 12th Edition, Springer, Boston, MA, 2000. doi: 10.1007/978-1-4615-4567-5_6. [DOI] [Google Scholar]
  • [36].Senne E, Lorena L, Pereira M, A branch-and-price approach to p-median location problems, Computers & Operations Research 32 (6) (2005) 1655–1664. doi: 10.1016/j.cor.2003.11.024., URL http://www.sciencedirect.com/science/article/pii/S0305054803003630 [DOI] [Google Scholar]
  • [37].GarcÃŋa S, LabbÃl M, MarÃŋn A, Solving large p-median problems with a radius formulation, INFORMS Journal on Computing 23 (4) (2011) 546–556. arXiv: 10.1287/ijoc.1100.0418, doi: 10.1287/ijoc.1100.0418., URL 10.1287/ijoc.1100.0418 [DOI] [Google Scholar]
  • [38].Resende MG, Werneck RF, A hybrid heuristic for the p-median problem, Journal of Heuristics 10 (1) (2004) 59–88. doi: 10.1023/B:HEUR.0000019986.96257.50., URL 10.1023/B:HEUR.0000019986.96257.50 [DOI] [Google Scholar]
  • [39].Esri, Algorithms used by the arcgis network analyst extension (2020)., URL https://desktop.arcgis.com/en/arcmap/latest/extensions/network-analyst/algorithms-used-by-network-analyst.htm#ESRI_SECTION1_6FFC9C48F24746E182082F5DEBDBAA92
  • [40].Daskin MS, Maass KL, Chapter 2 the p-median problem, 2017. [Google Scholar]
  • [41].Cooper L, Heuristic methods for location-allocation problems, SIAM Review 6 (1) (1964) 37–53., URL http://www.jstor.org/stable/2027512 [Google Scholar]
  • [42].Laporte G, da Gama SNFS, Location Science, 1st Edition, Physica -Verlag, 2015. [Google Scholar]
  • [43].Murray AT, Xu J, Wang Z, Church RL, Commercial gis location analytics: capabilities and performance, International Journal of Geographical Information Science 33 (5) (2019) 1106–1130. arXiv: 10.1080/13658816.2019.1572898, doi: 10.1080/13658816.2019.1572898., URL 10.1080/13658816.2019.1572898 [DOI] [Google Scholar]
  • [44].Maliszewski PJ, Kuby MJ, Horner MW, A comparison of multi-objective spatial dispersion models for managing critical assets in urban areas, Computers, Environment and Urban Systems 36 (4) (2012) 331–341. doi: 10.1016/j.compenvurbsys.2011.12.006., URL http://www.sciencedirect.com/science/article/pii/S0198971511001293 [DOI] [Google Scholar]
  • [45].Mann HB, Whitney DR, On a test of whether one of two random variables is stochastically larger than the other, The Annals of Mathematical Statistics 18 (1) (1947) 50–60., URL www.jstor.org/stable/2236101 [Google Scholar]
  • [46].Texas Department of State Health Services, Public health regions (2019)., URL https://www.dshs.texas.gov/regions/
  • [47].Wikipedia, Coin-or (2020). [Google Scholar]
  • [48].Gurobi Optimization L, Gurobi optimizer reference manual (2020)., URL https://www.gurobi.com/ [Google Scholar]
  • [49].Mu W, Tong D, A spatial-knowledge-enhanced heuristic for solving the p-median problem, Transactions in GIS 22 (2) (2018) 477–493. arXiv: 10.1111/tgis.12322, doi: 10.1111/tgis.12322., URL 10.1111/tgis.12322 [DOI] [Google Scholar]
  • [50].Gwalani H, Alshammari S, Mikler A, Tiwari C, Submitted: A distributed algorithm for solving large-scale p-median problems using expectation maximization, Transactions on Knowledge and Data Engineering [Google Scholar]
