Abstract
Geographical clusters of undervaccinated populations have emerged in various parts of the United States in recent years. Public health response involves surveillance and field work, which is very resource intensive. Given that public health resources are often limited, identifying and rank-ordering critical clusters can help prioritize and allocate scarce resources for surveillance and quick intervention. We quantify the criticality of a cluster as the additional number of infections caused if the cluster is underimmunized. We focus on finding clusters that maximize this measure and develop efficient approximation algorithms for finding critical clusters by exploiting structural properties of the problem. Our methods involve solving a more general problem of maximizing a submodular function on a graph with connectivity constraints. We apply our methods to the state of Minnesota, where we find clusters with significantly higher criticality than those obtained by heuristics used in public health.
1. INTRODUCTION
Many highly contagious childhood diseases, such as measles, can be prevented by vaccination. Thus, it is worrisome that large outbreaks of such diseases have occurred in recent years. One of the reasons for the emergence of underimmunized geographical clusters, such as in California [8] and Minnesota [4], is misperceptions about the side effects of vaccines [2]. The typical response by public health agencies is to monitor clusters where immunization rates are falling, run active information campaigns, and engage community leaders. However, implementing public health interventions in all these clusters would be costly and time-consuming, which motivates the following question: when do undervaccinated clusters pose significant risk (i.e., become critical) for the broader community? We develop a method to address this important public health policy question. Our contributions are summarized below.
Formalizing criticality. We formalize the notion of criticality of a subset S ⊆ V in a social contact network G = (V, E), as the expected number of additional infections that would occur if the immunization rate within S is “low”. Extending this notion, we introduce the MaxCrit problem: find the spatial cluster with maximum criticality in a population.
Rigorous algorithms for MaxCrit. We show that MaxCrit is NP-hard and design an algorithm, ApproxMaxCrit, which has a worst-case approximation guarantee of $\Omega(1/k^{(d-1)/(2d-1)})$ relative to the optimum, for clusters of size k. Here, d is the doubling dimension of the graph.
Improved algorithms for submodular function maximization with connectivity. We show that the criticality function is submodular, which implies that MaxCrit is a special case of maximizing a non-negative monotone submodular function over connected subgraphs of size k. Our algorithm is the first improvement over the best known bound of [7] for submodular function maximization with connectivity constraints.
Social impact. Our method for finding critical sets, applied to detailed population and contact network models, provides an operational tool for public health agencies to prioritize limited surveillance and outreach resources towards the most critical clusters.
2. PRELIMINARIES
Let V denote a population, and let G = (V, E) be a contact graph on which a disease can spread. A person or node v ∈ V can propagate the disease to its neighbors. Each person v is associated with a geographical location (their place of residence), denoted by loc(v). Let $\mathcal{R}$ denote the geographical area where the nodes V are located (e.g., Minnesota), and let $\mathcal{R} = \{r_1, \ldots, r_N\}$ be a decomposition of $\mathcal{R}$ into census block groups. For a block group $r_i \in \mathcal{R}$, we use $V(r_i)$ to denote the set of nodes associated with location $r_i$; that is, those with $loc(v) \in r_i$. Analogously, for a set of block groups or region $R \subseteq \mathcal{R}$, let $V(R) = \bigcup_{r_i \in R} V(r_i)$ be the set of nodes located within R. We consider a graph $G_{\mathcal{R}} = (\mathcal{R}, E_{\mathcal{R}})$ on the set of block groups, where two block groups are connected if they are geographically contiguous, i.e., adjacent on a map. We use $\mathrm{Conn}(\mathcal{R})$ to denote all the subsets of $\mathcal{R}$ that are spatially connected.
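The definitions above can be illustrated with a minimal Python sketch. It assumes the block-group graph is stored as an adjacency dictionary and loc as a person-to-block-group dictionary; the function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import deque


def nodes_in_region(region, loc):
    """Return V(R): every person whose home block group lies in `region`."""
    return {person for person, block in loc.items() if block in region}


def is_spatially_connected(region, adjacency):
    """Check whether `region` induces a connected subgraph of the block-group graph.

    Convention assumed here: the empty region is not considered connected.
    """
    region = set(region)
    if not region:
        return False
    start = next(iter(region))
    seen = {start}
    queue = deque([start])
    while queue:
        block = queue.popleft()
        for neighbor in adjacency.get(block, ()):
            # Only walk through block groups that belong to the region.
            if neighbor in region and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == region


# Hypothetical toy data: four block groups in a row, five residents.
adjacency = {"r1": ["r2"], "r2": ["r1", "r3"], "r3": ["r2", "r4"], "r4": ["r3"]}
loc = {"a": "r1", "b": "r1", "c": "r2", "d": "r3", "e": "r4"}
```

Under these assumptions, `{"r1", "r2"}` is a spatially connected region containing residents a, b, and c, whereas `{"r1", "r3"}` is not connected because the two block groups do not share a border.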
Disease model.
We use an SEIR model for diseases like measles [1]. Let $\gamma$ denote the average region-wide vaccination rate (around 0.97 in Minnesota). Let x be a vaccination vector: $x_i \in [0, 1]$ denotes the probability that node i is vaccinated (so $x_i = \gamma$ by default). Let $\mathrm{Src}_A$ denote the source of the infection: one or a small number of nodes from a region A of block groups that are initially infected. We use $\#\mathrm{inf}(x, \mathrm{Src}_A)$ to denote the expected number of infections given an intervention x and initial conditions $\mathrm{Src}_A$.
2.1. Criticality
For a vaccination vector x, let $x^S$ denote the corresponding intervention where a subset S ⊂ V of nodes is undervaccinated. That is, $x^S_i = x_i$ for i ∉ S and $x^S_i = \gamma'$ for i ∈ S, where $\gamma' \ll \gamma$.
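We define the criticality of a region R as the expected number of additional infections that occur if the nodes in R are undervaccinated, with respect to source $\mathrm{Src}_A$:
$$\mathrm{crit}(R, \mathrm{Src}_A) = \#\mathrm{inf}(x^{V(R)}, \mathrm{Src}_A) - \#\mathrm{inf}(x, \mathrm{Src}_A).$$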
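To make the definition concrete, the sketch below estimates criticality by Monte Carlo simulation. It is only an illustration: a simple per-contact transmission process stands in for the SEIR model used in the paper, and the transmission probability, the number of trials, and all function names are assumptions.

```python
import random


def simulate_once(graph, x, sources, p_transmit, rng):
    """One outbreak. ASSUMPTION: a per-contact transmission model, not the paper's SEIR model."""
    infected = set(sources)  # sources are infected regardless of vaccination
    immune = {v for v in graph if v not in infected and rng.random() < x[v]}
    frontier = list(infected)
    while frontier:
        u = frontier.pop()
        for v in graph[u]:
            if v not in infected and v not in immune and rng.random() < p_transmit:
                infected.add(v)
                frontier.append(v)
    return len(infected)


def expected_infections(graph, x, sources, p_transmit=0.5, trials=2000, seed=0):
    """Monte Carlo estimate of #inf(x, Src)."""
    rng = random.Random(seed)
    total = sum(simulate_once(graph, x, sources, p_transmit, rng) for _ in range(trials))
    return total / trials


def undervaccinate(x, S, gamma_low):
    """Build x^S: nodes in S get the low rate gamma_low, all others keep x_i."""
    return {i: (gamma_low if i in S else x_i) for i, x_i in x.items()}


def criticality(graph, x, S, sources, gamma_low=0.0, **sim_kwargs):
    """Estimate #inf(x^S, Src) - #inf(x, Src) for a node set S."""
    low = expected_infections(graph, undervaccinate(x, S, gamma_low), sources, **sim_kwargs)
    base = expected_infections(graph, x, sources, **sim_kwargs)
    return low - base


# Hypothetical toy contact graph: a path of five people, all vaccinated with probability 0.97.
toy_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
baseline_x = {v: 0.97 for v in toy_graph}
toy_crit = criticality(toy_graph, baseline_x, {1, 2, 3, 4}, [0])
```

In this toy setting, leaving nodes 1 through 4 unvaccinated yields a clearly positive estimate, while an empty set S yields exactly zero because both simulations use the same seed and the same vaccination vector.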
Finding critical clusters.
In practice, public health interventions involve intensive field work, which is most effective within small, localized geographical regions. Therefore, we focus on finding regions that have high criticality and small size.
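Problem 1 (MaxCrit(G, $\mathcal{R}$, k)). Given an instance (G, $\mathcal{R}$, k), where $\mathcal{R}$ is the set of block groups and $\mathrm{Conn}(\mathcal{R})$ is the family of its spatially connected subsets, find a connected region of size at most k that maximizes criticality over all choices of source:
$$R^* = \arg\max_{R' \in \mathrm{Conn}(\mathcal{R}),\ |R'| \le k} \ \max_{\mathrm{Src}_{R'}} \ \mathrm{crit}(R', \mathrm{Src}_{R'}).$$
In words, the MaxCrit problem involves maximizing over all possible choices of the sources $\mathrm{Src}_{R'}$ in the cluster $R'$.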
3. PROPOSED METHODS
Our strategy involves showing that the crit function is submodular. Intuitively, crit satisfies a diminishing returns property because an unvaccinated block group r causes more additional infections in the context of a smaller region. Thus, the MaxCrit problem reduces to maximizing a submodular function over connected subgraphs R of the block-group graph $G_{\mathcal{R}}$ with |R| ≤ k.
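The diminishing returns property can be checked by brute force on a small ground set. The sketch below is illustrative only: it uses a toy coverage function as a stand-in for crit (the paper's proof of submodularity is not reproduced here), and the `reach` data and function names are hypothetical.

```python
from itertools import combinations


def is_submodular(f, ground):
    """Brute-force check of f(A + e) - f(A) >= f(B + e) - f(B) for all A <= B and e not in B."""
    ground = list(ground)
    subsets = [frozenset(c) for n in range(len(ground) + 1) for c in combinations(ground, n)]
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            for e in ground:
                if e in B:
                    continue
                gain_small = f(A | {e}) - f(A)
                gain_large = f(B | {e}) - f(B)
                if gain_small < gain_large - 1e-12:
                    return False
    return True


# Hypothetical stand-in for crit: each block group "reaches" a set of people,
# and the value of a region is the number of distinct people reached.
reach = {"r1": {1, 2}, "r2": {2, 3}, "r3": {3, 4, 5}}


def coverage(region):
    return len(set().union(*(reach[r] for r in region)))
```

Here `is_submodular(coverage, reach)` holds, whereas a function with increasing returns such as `lambda S: len(S) ** 2` fails the check. The check enumerates all pairs of subsets, so it is only practical for very small ground sets.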
We present our algorithm for MaxCrit in Algorithm 1. For submodular maximization with connectivity constraints, Kuo et al. [7] give an $\Omega(1/\sqrt{k})$ approximation guarantee. We derive an improved algorithm by exploiting the geometric structure of the graph. As in [7], the main idea is to first solve the relaxed problem, in which we ignore the connectivity constraint, for each location r and the surrounding locations within distance ℓ, i.e., the ball B(r, ℓ) (line 11). Then, we make the solution connected via a Steiner tree (line 12). Our algorithm has an approximation guarantee of $\Omega(1/k^{(d-1)/(2d-1)})$ for a graph with doubling dimension d, which improves the bound of [7] because the exponent $(d-1)/(2d-1)$ is strictly less than 1/2 for any finite d ≥ 1. We also solve a Budgeted Steiner Tree (BST) problem [6] as a subroutine in our algorithm (line 5). BST is a relaxed version of the problem where the function is modular (instead of submodular) on the criticality of the nodes. This step is not needed for the approximation guarantee, but we find that it improves optimization power in practice.
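The ball-then-connect idea can be sketched as follows. This is a simplified illustration, not Algorithm 1: it connects the greedily chosen nodes to the ball's center along BFS shortest paths instead of computing a Steiner tree, it omits the BST step, and it carries no approximation guarantee. All names and the toy instance are hypothetical.

```python
from collections import deque


def bfs_parents(adj, root, radius):
    """BFS from `root` up to `radius` hops; returns {node: parent} over the ball B(root, radius)."""
    parent, dist = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for v in adj[u]:
            if v not in parent:
                parent[v], dist[v] = u, dist[u] + 1
                queue.append(v)
    return parent


def greedy(f, candidates, budget):
    """Standard greedy for monotone submodular f, ignoring connectivity."""
    chosen = set()
    for _ in range(budget):
        best, best_gain = None, 0.0
        for c in candidates:
            if c not in chosen:
                gain = f(chosen | {c}) - f(chosen)
                if gain > best_gain:
                    best, best_gain = c, gain
        if best is None:
            break
        chosen.add(best)
    return chosen


def connected_greedy(f, adj, k, radius):
    """For each center: greedy inside the ball, then connect picks to the center via BFS paths."""
    picks = (k - 1) // radius  # each pick adds at most `radius` nodes, so |solution| <= k
    best_set, best_val = set(), float("-inf")
    for center in adj:
        parent = bfs_parents(adj, center, radius)
        solution = {center}
        for node in greedy(f, sorted(parent), picks):
            while node is not None and node not in solution:
                solution.add(node)  # walk back toward the center
                node = parent[node]
        value = f(solution)
        if value > best_val:
            best_set, best_val = solution, value
    return best_set, best_val


# Hypothetical toy instance: six block groups on a path, with a coverage objective.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
covers = {0: {"a"}, 1: {"a"}, 2: {"a", "b", "c"}, 3: {"c"}, 4: {"d", "e", "f"}, 5: {"f"}}


def f(region):
    return len(set().union(*(covers[r] for r in region)))


best_set, best_val = connected_greedy(f, adj, k=3, radius=2)
```

Budgeting `(k - 1) // radius` picks per ball is one simple way to keep the connected solution within size k, since every pick lies within `radius` hops of the center.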
4. APPLICATION TO MINNESOTA
Experimental setup.
We use a realistic contact network model of Minnesota [5] with 5,048,920 individuals in total, aggregated into 4,082 census block groups from the 2010 U.S. census. We consider an SEIR stochastic model for measles, as described in Section 2. The criticality of a region R of block groups is assessed by leaving every individual inside R unvaccinated; everybody else in the population is vaccinated with probability 0.97, which is the statewide vaccination rate. We compare our algorithms with two heuristics used in public health. The Population heuristic finds a cluster of size k with the largest total population. The Vulnerability heuristic prioritizes individuals who are most likely to get infected when no one is vaccinated. We also compare to a Random baseline, which finds a connected cluster of size k by doing a random walk on the block-group graph $G_{\mathcal{R}}$.
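The Random baseline can be sketched as below. The details are assumptions, since the text only states that a random walk is used: the walk starts at a uniformly random block group, moves to a uniformly random neighbor at each step, and stops once k distinct block groups have been visited. The graph is assumed to be connected with at least k nodes, and `max_steps` is a hypothetical safeguard.

```python
import random


def random_walk_cluster(adj, k, seed=None, max_steps=100_000):
    """Collect k distinct block groups along a random walk; the visited set is connected."""
    if k > len(adj):
        raise ValueError("k exceeds the number of block groups")
    rng = random.Random(seed)
    current = rng.choice(sorted(adj))  # uniformly random starting block group
    cluster = {current}
    for _ in range(max_steps):
        if len(cluster) == k:
            break
        current = rng.choice(adj[current])  # step to a uniformly random neighbor
        cluster.add(current)
    return cluster


# Hypothetical toy block-group graph: six block groups on a path.
path_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
cluster = random_walk_cluster(path_adj, k=3, seed=1)
```

Because every newly visited block group is adjacent to the previously visited one, the returned cluster is always spatially connected.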

Optimization power.
In Figure 1, we show the criticality obtained by ApproxMaxCrit and the baselines as a function of cluster size k. ApproxMaxCrit exhibits notably better performance than the heuristics. Random performs poorly and results in almost no additional infections. Surprisingly, Vulnerability does not perform much better than Random. Overall, the Population heuristic performs best among the baselines, but its criticality does not grow monotonically with cluster size.
Figure 1: Comparison of algorithms for the MaxCrit problem as a function of the solution size k.
Demographics of critical clusters.
We compare the distribution of age and income in the cluster discovered by ApproxMaxCrit to that of the entire state (results not shown). The most critical cluster has a significantly larger share of low-income households (below $25,000) than the entire state: 34.9% compared to 19.6%. Similarly, minors are over-represented: 26.6% of the cluster's population is between 5 and 18 years old, compared to the average of 18.7%.
When we focus on the Minneapolis area instead of the entire state, the most critical cluster covers Brooklyn Park, where measles outbreaks occurred in 2017 and 2019.
Acknowledgements:
Work is supported by NIH grant 1R01GM109718, NSF BIG DATA IIS-1633028, NSF DIBBS ACI-1443054. Prepared by LLNL under Contract DE-AC52-07NA27344. LLNL-ABS-805897.
Contributor Information
Jose Cadena, Lawrence Livermore National Laboratory, Livermore, CA, USA.
Achla Marathe, Biocomplexity Institute, Dept. of Public Health Sc., Univ. of Virginia, Charlottesville, VA, USA.
Anil Vullikanti, Biocomplexity Institute, Dept. of Computer Science, Univ. of Virginia, Charlottesville, VA, USA.
REFERENCES
- [1] Anderson RM and May RM. 1991. Infectious Diseases of Humans. Oxford University Press, Oxford.
- [2] Atwell Jessica E., Van Otterloo Josh, Zipprich Jennifer, Winter Kathleen, Harriman Kathleen, Salmon Daniel A., Halsey Neal A., and Omer Saad B. 2013. Nonmedical Vaccine Exemptions and Pertussis in California, 2010. Pediatrics (2013).
- [3] Cadena Jose, Chen Feng, and Vullikanti Anil. 2017. Near-Optimal and Practical Algorithms for Graph Scan Statistics. In SIAM Data Mining (SDM).
- [4] Cadena Jose, Falcone David, Marathe Achla, and Vullikanti Anil. 2019. Discovery of under immunized spatial clusters using network scan statistics. BMC Medical Informatics and Decision Making 19, 1 (2019), 28.
- [5] Eubank S, Guclu H, Anil Kumar VS, Marathe M, Srinivasan A, Toroczkai Z, and Wang N. 2004. Modelling disease outbreaks in realistic urban social networks. Nature 429, 6988 (2004), 180–184.
- [6] Johnson D, Minkoff M, and Phillips S. 2000. The Prize Collecting Steiner Tree Problem: Theory and Practice. In ACM SODA.
- [7] Kuo Tung-Wei, Lin Kate Ching-Ju, and Tsai Ming-Jer. 2015. Maximizing Submodular Set Function With Connectivity Constraint: Theory and Application to Networks. IEEE/ACM Transactions on Networking 23, 2 (2015), 533–546.
- [8] Lieu Tracy A, Ray G Thomas, Klein Nicola P, Chung Cindy, and Kulldorff Martin. 2015. Geographic clusters in underimmunization and vaccine refusal. Pediatrics 135, 2 (2015), 280–289.
- [9] Nemhauser GL, Wolsey LA, and Fisher ML. 1978. An analysis of approximations for maximizing submodular set functions. Mathematical Programming 14, 1 (1978), 265–294.
