Author manuscript; available in PMC 2022 Oct 6.
Published in final edited form as: Array (N Y). 2022 Jul 15;15:100218. doi: 10.1016/j.array.2022.100218

Bayesian optimization of distributed neurodynamical controller models for spatial navigation

Armin Hadzic a,*, Grace M Hwang a,b, Kechen Zhang c, Kevin M Schultz a, Joseph D Monaco c

Abstract

Dynamical systems models for controlling multi-agent swarms have demonstrated advances toward resilient, decentralized navigation algorithms. We previously introduced the NeuroSwarms controller, in which agent-based interactions were modeled by analogy to neuronal network interactions, including attractor dynamics and phase synchrony, that have been theorized to operate within hippocampal place-cell circuits in navigating rodents. This complexity precludes linear analyses of stability, controllability, and performance typically used to study conventional swarm models. Further, tuning dynamical controllers by manual or grid-based search is often inadequate due to the complexity of objectives, dimensionality of model parameters, and computational costs of simulation-based sampling. Here, we present a framework for tuning dynamical controller models of autonomous multi-agent systems with Bayesian optimization. Our approach utilizes a task-dependent objective function to train Gaussian process surrogate models to achieve adaptive and efficient exploration of a dynamical controller model’s parameter space. We demonstrate this approach by studying an objective function selecting for NeuroSwarms behaviors that cooperatively localize and capture spatially distributed rewards under time pressure. We generalized task performance across environments by combining scores for simulations in multiple mazes with distinct geometries. To validate search performance, we compared high-dimensional clustering for high- vs. low-likelihood parameter points by visualizing sample trajectories in 2-dimensional embeddings. Our findings show that adaptive, sample-efficient evaluation of the self-organizing behavioral capacities of complex systems, including dynamical swarm controllers, can accelerate the translation of neuroscientific theory to applied domains.

Keywords: Bayesian optimization, Multi-agent control, Swarming, Dynamical systems models, Spatial navigation, UMAP

1. Introduction

Collective biological behaviors of animal groups, including swarming, flocking, and schooling [1–6], have long inspired robotics and computer science research into problems of decentralized control and coordination for autonomous groups of artificial agents [7–12]. In particular, advancing the autonomous spatial capabilities of multi-agent swarm control has been a key objective of simulation studies and analyses of artificial swarms based on dynamical systems models [13]. Complementarily, the impressive recent progress of artificial intelligence based on deep learning [14] has demonstrated the importance of adopting key biological inspirations from neuroscience and the brain. However, it has been unclear how to integrate complex temporal features of brain dynamics thought to support crucial mechanisms of neural computation [15]. Thus, addressing critical questions in autonomous robotics and artificial intelligence may depend on efficient exploration and optimization of dynamical systems models with complex interactions among many units. In both domains, major gaps in state-of-the-art capabilities are highlighted by tasks involving autonomous spatial navigation and foraging [16–19] in complex, novel, or changing environments.

Bayesian optimization provides a probabilistic framework for adaptive, sample-efficient optimization of ‘black box’ models with moderate dimensionality (up to ~20 parameters) and expensive sample evaluations. In this framework, a task-dependent objective function quantifies the output performance of the complex underlying model, and the optimizer traces parameter-space trajectories of candidate points from acquisition functions operating on a simpler surrogate model. The typical surrogate model is a Gaussian process, which populates the parameter space of interest with multivariate normal distributions and serves as a prior distribution for candidate-point updates [20,21]. Bayesian optimization with Gaussian process surrogate models has enabled applications including hyperparameter tuning and the optimization of evolutionary algorithms, multi-modal functions, robotic controllers, and other complex systems [22–27].
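
To make this loop concrete, the following minimal Python sketch pairs a Gaussian process surrogate (scikit-learn’s GaussianProcessRegressor) with a noise-free expected-improvement acquisition over a random candidate pool. The toy quadratic `simulate` stands in for an expensive simulation; all names here are illustrative assumptions, not the authors’ code.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(gp, X_cand, y_best):
    """Noise-free EI for minimization: E[max(y_best - f(x), 0)]."""
    mu, sd = gp.predict(X_cand, return_std=True)
    sd = np.maximum(sd, 1e-9)
    z = (y_best - mu) / sd
    return (y_best - mu) * norm.cdf(z) + sd * norm.pdf(z)

def simulate(x):
    """Placeholder for an expensive simulation-based objective (toy quadratic)."""
    return float(np.sum((x - 0.3) ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(8, 2))            # random initialization
y = np.array([simulate(x) for x in X])
for _ in range(30):                           # outer optimization loop
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    cand = rng.uniform(0, 1, size=(512, 2))   # candidate pool
    x_new = cand[np.argmax(expected_improvement(gp, cand, y.min()))]
    X = np.vstack([X, x_new])                 # append sample to training data
    y = np.append(y, simulate(x_new))
print("best point:", X[np.argmin(y)], "loss:", y.min())
```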

The collective behavioral states of some swarming models are tractable to linear analysis of stability, density, and clustering properties [28–32]. However, for dynamical systems that preclude such analysis due to nonlinearity, nonstationarity, stochasticity, or other complications, the computational budget for parameter exploration or optimization with simulation-based samples is a limiting factor for translation to engineered designs. Indeed, standard methods based on gradient descent have two main drawbacks in this context: they can discover local optima, but resist exploration of system behaviors for other purposes; and their basic operation is massively sample-inefficient, which can be prohibitive for expensive simulation-based sample evaluations. Moreover, emergent collective behaviors like swarming outstrip conventional agent-based learning methods based on the restrictive action and policy spaces of reinforcement learning, particularly for uncertain, changing, or open-ended tasks.

We previously introduced the NeuroSwarms framework for modeling emergent high-level navigation and foraging in a brain-inspired multi-agent metacontroller [33–35]. NeuroSwarms addressed decentralized, distributed control by analogy to neural circuit dynamics, including oscillations [36–39], attractors [40–42], and associative synaptic plasticity [43] related to rodent spatial cognition; the resulting collective behaviors of NeuroSwarms models included swarming, patrolling, and goal-finding in simulated maze environments with complex, irregular, or fragmented geometry [34]. These behaviors enabled NeuroSwarms to complete cooperative multiple reward-capture tasks without pretraining across distinct environments [34]. However, the nonlinearities inherent in NeuroSwarms’ oscillatory phase-coupled self-organization precluded analytic approaches to global identification, exploration, or optimization of system behaviors. Thus, this class of dynamical systems model can provide insights into key aspects of brain structure and function that may inspire theoretical advances as well as new directions for systems engineering designs. This insight depends crucially on devising a task-dependent objective function that can guide the efficient discovery of system behaviors and optimal performance. In this paper, we demonstrate that Bayesian optimization can utilize such an objective function to efficiently and usefully find paths through otherwise prohibitive model spaces. In particular, we show that a neurodynamical controller model with emergent properties can be characterized and tuned using Bayesian optimization with Gaussian process surrogate models.

2. Models and methods

2.1. NeuroSwarms model

Monaco et al. (2020) [34] introduced the NeuroSwarms framework and described a model implementation with 300 agents; baseline wall-avoiding, momentum-carrying motion-vector updates; maze environments whose geometry occluded agents’ line-of-sight; interagent communication between mutually visible agents; cosine-coupling of internal phase variables driving interagent attraction and repulsion; and 9 key dynamical parameters (Table 1) that had required intensive manual fine-tuning to balance swarming and reward capture.

Table 1.

Tunable parameters that governed the spatiotemporal dynamics of the example NeuroSwarms model implementation [34]. ‘Range’ indicates the limits of the parameter subspace made available for Bayesian optimization. All other NeuroSwarms parameter values and constants were fixed at the defaults in Table 1 of Monaco et al. (2020) [34].

Name   Range       Description

σ      [10⁻³, 4]   Normalized interagent spatial scale
κ      [10⁻³, 4]   Normalized reward-approach spatial scale
η_s    [10⁻³, 4]   Recurrent interagent learning rate
η_r    [10⁻³, 4]   Feedforward reward-approach learning rate
ω_0    [0, 1]      Baseline agent oscillation frequency
ω_I    [0, 1]      Max. activation-based frequency increase
τ_q    [0, 1]      Recurrent interagent time-constant
τ_r    [0, 1]      Feedforward reward time-constant
τ_c    [0, 1]      Sensory input time-constant

2.2. Bayesian optimization

Bayesian optimization constructs and performs sequential optimization on a surrogate model that represents the objective performance of a more complex model [4446]. Learning surrogate models can be beneficial if directly optimizing a complex model is not computationally tractable given resource constraints. These surrogate models can then be deployed to predict the performance of the underlying model at untested parameter points without requiring a full model simulation of those parameter values (Fig. 1).

Fig. 1.

Computation flow for optimization and simulation-based sampling. A, Step 1: The posterior distribution is computed from the Gaussian process surrogate model (GP Model) based on the training data 𝒟. Step 2: The acquisition function’s quasi-Monte Carlo sampling process uses the posterior distribution to select new candidate parameters X̂ (Step 3) based on the acquisition function’s estimated objective function value Ŷ (Step 4). Step 5: The NeuroSwarms model [33,34] is simulated with candidate parameter points X̂ to generate the observed objective value Y (see B). Step 6: The initial Gaussian process model’s marginal log-likelihood (MLL) is then calculated and used to optimize the Gaussian process using the L-BFGS-B algorithm [47]. Step 7: The resulting 𝒟 (from Step 5) and MLL (from Step 6) update the Gaussian process model for the next iteration of the outer loop. B, Flow diagram of simulation-based candidate-point evaluation. For each sample (see Step 5 in A), the optimizer executes play-throughs in both the Hairpin (top) and Tunnel (bottom) maze environments. The sample’s objective value Y is computed as the average of the respective loss values L_H and L_T (Eq. (3)).

We implemented Bayesian optimization with surrogate models defined as Gaussian processes [20,48,49]. Gaussian processes are nonparametric models that iteratively learn a probabilistic mapping f : 𝒳 → ℝ such that the density estimate p(y_i ∣ x_i) = f(x_i, y_i), where 𝒳 ⊂ ℝ^p is the bounded parameter subspace being optimized, x_i ∈ 𝒳 is a parameter point, and y_i is an objective function output value [21,50,51]; e.g., p = 9 NeuroSwarms parameters in this paper. Thus, the underlying ‘black box’ objective function f_true is assumed to be distributed according to a Gaussian process,

f_true ~ 𝒢𝒫(μ(X), k(X)),

where μ(·) and k(·) are the mean and covariance kernels applied to an input parameter set X ⊆ 𝒳. The posterior distribution of a q-sized batch of candidate points X̂ = {x̂_1, …, x̂_q}, conditioned on the observed training data 𝒟 = {(x_i, y_i)}_{i=1}^{n}, takes the form of a multivariate normal distribution, i.e., P(𝒢𝒫(X) ∣ 𝒟) ~ 𝒩(μ(X), k(X)).
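
As an illustration of this conditioning step, the sketch below constructs a Gaussian process surrogate and queries its posterior using BoTorch [51] (the library used in Section 2.5). The random training data are placeholders, and the model settings are assumptions for illustration rather than the authors’ configuration.

```python
import torch
from botorch.models import SingleTaskGP

# n = 24 observed points in the p = 9 parameter space; losses lie in [-1, 0]
train_X = torch.rand(24, 9, dtype=torch.double)
train_Y = -torch.rand(24, 1, dtype=torch.double)

gp = SingleTaskGP(train_X, train_Y)      # GP prior with mean mu and kernel k

# Posterior P(GP(X) | D) at a batch of q = 3 candidate points
X_hat = torch.rand(3, 9, dtype=torch.double)
posterior = gp.posterior(X_hat)          # multivariate normal over the batch
print(posterior.mean.squeeze(-1), posterior.variance.squeeze(-1))
```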

2.3. Acquisition functions

Bayesian optimization relies on acquisition functions to provide the candidate parameter points that navigate the underlying model space. Acquisition functions define a strategy to manage the trade-off between exploring the parameter space and exploiting regions that yielded improvement for previous samples [52]. An acquisition function can be evaluated on the Gaussian process posterior P(𝒢𝒫(X) ∣ 𝒟) by averaging a set of Monte Carlo (MC) samples, e.g.,

α̂_n(X; 𝒟) = (1/n) Σ_{i=1}^{n} a(ε_𝒟^i(X)), (1)

where n is the sample count and a(·) is the net utility function providing objective function output. Thus, α̂_n is an expectation of posterior samples ε_𝒟 ~ P(𝒢𝒫(X) ∣ 𝒟). We study a pair of MC-based acquisition functions: q-Expected Improvement (qEI) [53] and Noisy q-Expected Improvement (qNoisyEI) [54]. We compare qEI and qNoisyEI to random sampling of candidate parameters. First, similar to α̂_n (Eq. (1)), qEI calculates an expectation over posterior samples,

qEI(X) ≈ (1/n) Σ_{i=1}^{n} max_{j=1,…,q} [ε_j^i − Y*]_+,

where [·]_+ indicates linear rectification and Y* is the best observed objective function value. Thus, qEI estimates a noise-free expected improvement of the posterior with respect to the best value. Second, qNoisyEI approximates improvement relative to the expected best objective value conditioned on the observed MC sampling history ε_obs within each batch [55]; simplistically, the constrained batch-sampling performed by qNoisyEI [54,56] approximates

qNoisyEI(X; 𝒟) ≈ (1/n) Σ_{i=1}^{n} max_{j=1,…,q} [ε_j^i − max ε_obs]_+,

but more detailed treatments of this complex optimization problem provide critical analyses and caveats (cf. [5456]).
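
The sketch below shows how the two MC estimators above can be computed from raw posterior draws; it is a didactic rendering of the formulas with toy inputs, not the batched implementations in BoTorch [53,54].

```python
import torch

def mc_qei(samples: torch.Tensor, y_best: float) -> torch.Tensor:
    """samples: (n, q) posterior draws eps_j^i for a q-batch of candidates."""
    improvement = (samples - y_best).clamp_min(0.0)  # rectification [eps_j^i - Y*]_+
    return improvement.max(dim=-1).values.mean()     # mean over i of max over j

def mc_qnoisyei(samples: torch.Tensor, obs_samples: torch.Tensor) -> torch.Tensor:
    """Improvement relative to the best of the observed sampling history."""
    return mc_qei(samples, obs_samples.max().item())

n, q = 512, 3
samples = -0.5 + 0.1 * torch.randn(n, q)   # toy posterior draws
print(mc_qei(samples, y_best=-0.45))
```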

Throughout our study, Bayesian optimization with any of the three acquisition functions employed 512 MC samples, 30 training epochs (with a batch size of 3), and 8 random training samples to initialize the Gaussian process surrogate model.

2.4. Objective function

We constructed an objective function to evaluate the performance of the example NeuroSwarms model [34] in a time-pressured cooperative foraging task. The objective function quantifies how quickly the swarm of agents collectively captures several spatially distributed rewards in a given maze. Let n_cap(t) be the cumulative number of cooperatively captured rewards by time t. A reward is captured if, at any timestep, at least n_s/n_r agents were simultaneously colocated within a defined radius of the reward, where n_s = 300 agents and n_r = 3 and 5 rewards in the Tunnel and Hairpin mazes, respectively. For a given simulated play-through, this objective function can be expressed as a loss that is updated at every timestep until all rewards are captured,

L = −t / (n_t (n_cap(t) + 1)), (2)

where n_t is the total number of timesteps. The agent group’s behavior is time-pressured because t grows continuously until all rewards are captured. If the swarm cannot capture all the rewards in the environment, t is set to the maximum number of timesteps allowed for the simulation, n_t, and the loss reflects the number of missed rewards. Loss values lie in the range [−1, 0], with values closer to zero indicating better task performance.

To account for the generalizability of spatial task performance across distinct environmental geometries, each simulation-based sample comprises play-throughs of both the Hairpin and Tunnel mazes, respectively providing loss values L_H and L_T as calculated in Eq. (2) (see Fig. 1B). Thus, the generalized performance at a given parameter point x_i is indicated by the objective value Y, computed as the average

y_i(x_i) ≡ Y = (L_H + L_T) / 2. (3)
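
A minimal sketch of Eqs. (2) and (3), assuming Eq. (2) carries the negative sign implied by the stated loss range of [−1, 0]; `capture_times`, which lists the timestep of each successful capture, is a hypothetical simulation output used here for illustration.

```python
def playthrough_loss(capture_times, n_rewards, n_t):
    """Eq. (2): L = -t / (n_t * (n_cap + 1)), where t is the timestep of the
    final capture if all rewards are captured, and t = n_t otherwise."""
    n_cap = len(capture_times)
    t = max(capture_times) if n_cap == n_rewards else n_t
    return -t / (n_t * (n_cap + 1))

def objective(loss_hairpin, loss_tunnel):
    """Eq. (3): generalized objective Y averaged over the two mazes."""
    return 0.5 * (loss_hairpin + loss_tunnel)

L_H = playthrough_loss([120, 340, 600, 810, 990], n_rewards=5, n_t=2000)
L_T = playthrough_loss([210, 450], n_rewards=3, n_t=2000)  # one reward missed
print(objective(L_H, L_T))  # in [-1, 0]; closer to 0 is better
```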

2.5. Gaussian process training

The means and variances of the Gaussian process surrogate model are updated with each sample evaluation to reflect the expected values and uncertainty, respectively, of the underlying model’s performance. We use the Bayesian optimization library BoTorch [51] to implement the outer loop of surrogate model training, based on iteratively updating a Gaussian process following initialization with sample data 𝒟. The posterior distribution P(𝒢𝒫(X) ∣ 𝒟) is then sampled by a batched MC process in which an acquisition function determines the candidate parameter points X̂ from the subspace bounded by the ranges listed in Table 1. The candidate points are selected based on predictive estimates of the utility value Ŷ (Fig. 1A) and evaluated by simulating the NeuroSwarms model to generate loss values (Eq. (2)) and the objective function output Y (Eq. (3)) (Fig. 1B). Lastly, the resulting (X̂, Y) tuple is appended to the training data 𝒟 to update the Gaussian process for the next iteration.

The surrogate model hyperparameters were tuned by first computing the marginal log-likelihood (MLL) of the Gaussian process applied to the observed parameters X and then fitting the hyperparameters with the limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm with simple bounds (L-BFGS-B) [47]. The fitting process provides an updated MLL for the next optimization step.
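
The sketch below outlines one plausible rendering of this outer loop (Fig. 1A) in BoTorch [51]: fit the GP hyperparameters by MLL (BoTorch dispatches to SciPy’s L-BFGS-B by default), select a q = 3 candidate batch with qEI, evaluate by simulation, and append to 𝒟. The function `run_neuroswarms` is a hypothetical stand-in for the simulation in Fig. 1B, and the call names reflect recent BoTorch releases; this is an assumed reconstruction, not the authors’ code.

```python
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll       # older BoTorch: fit_gpytorch_model
from gpytorch.mlls import ExactMarginalLogLikelihood
from botorch.acquisition import qExpectedImprovement
from botorch.optim import optimize_acqf

def run_neuroswarms(X):
    """Hypothetical stand-in for the simulation-based evaluation (Fig. 1B)."""
    return -((X - 0.5) ** 2).mean(dim=-1, keepdim=True)

bounds = torch.stack([torch.zeros(9), torch.ones(9)]).double()  # normalized ranges
train_X = torch.rand(24, 9, dtype=torch.double)                 # initial data D
train_Y = run_neuroswarms(train_X)

for epoch in range(30):                                         # training epochs
    gp = SingleTaskGP(train_X, train_Y)
    mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
    fit_gpytorch_mll(mll)                                       # L-BFGS-B fit of MLL
    acqf = qExpectedImprovement(gp, best_f=train_Y.max())
    X_hat, _ = optimize_acqf(acqf, bounds=bounds, q=3,          # batch size q = 3
                             num_restarts=8, raw_samples=512)
    Y = run_neuroswarms(X_hat)                                  # simulate candidates
    train_X = torch.cat([train_X, X_hat])                       # append to D
    train_Y = torch.cat([train_Y, Y])
```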

2.5.1. Convergence metrics

The hyperparameter tuning process described above was repeated until convergence according to two metrics: maximum posterior variance and minimum candidate dissimilarity. First, the maximum posterior variance for training epoch M was computed as

max Var(P(𝒢𝒫(x_M) ∣ 𝒟_M))

to indicate whether the Gaussian process’ posterior variance was no longer increasing and that training should cease. Second, minimum candidate dissimilarity measures the stabilization of candidate selection as an inverse cosine similarity; i.e., we calculated the metric following

min_{i=1,…,M−1} [1 − (x_i · x_M) / (‖x_i‖ ‖x_M‖)]

to confirm whether epoch M selected similar neighborhoods of parameter points to those in previous training epochs. These metrics determined the convergence of hyperparameter tuning and enabled the resulting Gaussian process surrogate model to adapt efficiently to the NeuroSwarms parameter space.
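
A short sketch of the two convergence checks, assuming a BoTorch-style model with a `posterior` method; the tensors are placeholders for the epoch-wise candidate history.

```python
import torch

def max_posterior_variance(gp, X_epoch):
    """Largest posterior variance over the epoch's candidate points."""
    return gp.posterior(X_epoch).variance.max().item()

def min_candidate_dissimilarity(X_prev, x_M):
    """Minimum (1 - cosine similarity) between x_M and prior candidates x_i."""
    cos = torch.nn.functional.cosine_similarity(X_prev, x_M.unsqueeze(0), dim=-1)
    return (1.0 - cos).min().item()

X_prev = torch.rand(29, 9)   # candidates from epochs 1..M-1
x_M = torch.rand(9)          # epoch M candidate
print(min_candidate_dissimilarity(X_prev, x_M))
```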

2.6. Parameter visualization

The low-dimensional representations produced by uniform manifold approximation and projection (UMAP) [57] result from a locality-preserving embedding that spatially clusters higher-dimensional vectors such as p-dimensional parameter points. A 2D UMAP projection allows these point clusters to be visualized simply as images or scatter plots, in which the x-axis and y-axis constitute an arbitrary coordinate frame. For UMAP scatter plots, as in Figs. 3 and 6, the marker for each point can be colored for convenient visual inspection of associated values, including vector elements or computed output. We use this visual clustering to qualitatively inspect the parameter-dependence and structure of the Gaussian process surrogate model by selecting a UMAP data point with, e.g., high performance indicated by its objective value y_i (Eq. (3)), and assessing that point’s other values in the context of its location and neighborhood relative to UMAP-based clusters.
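
The following sketch reproduces the layout of Figs. 3 and 6 using the umap-learn package; the parameter points and posterior means are random placeholders for the qEI-sampled data.

```python
import numpy as np
import umap                       # umap-learn package
import matplotlib.pyplot as plt

params = np.random.rand(500, 9)   # placeholder qEI-sampled parameter points
post_mean = -np.random.rand(500)  # placeholder surrogate posterior means

# Locality-preserving 2D embedding of the 9-dimensional parameter points
emb = umap.UMAP(n_components=2, random_state=0).fit_transform(params)

fig, axes = plt.subplots(2, 5, figsize=(16, 6))
panels = [(post_mean, "posterior mean")] + \
         [(params[:, j], f"parameter {j + 1}") for j in range(9)]
for ax, (vals, label) in zip(axes.flat, panels):
    sc = ax.scatter(emb[:, 0], emb[:, 1], c=vals, s=5)  # color by value
    ax.set_title(label)
    fig.colorbar(sc, ax=ax)
plt.show()
```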

Fig. 3.

UMAP-clustered parameter points selected by the noise-free qEI acquisition function. The dimensionality reduction computed by the UMAP transformation (Section 2.6) preserves the locality of neighboring parameter points. As a result, high-dimensional clusters can be revealed by scatter plots of 2D UMAP data. Each of the 10 scatter plots shows the same UMAP projection of qEI-sampled parameter points, using the same (arbitrary) 2D coordinate frame. In the first plot (top, left), the color of each point indicates the expected posterior mean of the trained Gaussian process surrogate model according to the colorbar legend to the right of the plot; e.g., a group of adjacent blue points reflects a high-performing cluster of NeuroSwarms parameters. The top-left colorbar additionally serves as a reference for how colors are mapped to the respective value ranges (i.e., [min, max]) specified in the labels above the remaining p = 9 plots. These 9 plots show the individually sampled parameter values (cf. Table 1) associated with each UMAP point.

Fig. 6.

Anticipated future qEI-sampled parameter points. As in Fig. 3, a UMAP projection is shown across a series of plots: the top-left scatter plot assigns colors to each 2D UMAP point based on the colorbar to the right of the plot as indexed by the surrogate model’s expected posterior mean for each associated parameter point; the remaining p = 9 plots depict the same UMAP transformation except that the color of each point is mapped to the specified range (i.e., [min, max]) of the given NeuroSwarms parameter (cf. Table 1). A large batch of 500 qEI-based parameter samples is shown to facilitate visual inspection of the local structure of the trained surrogate model. For instance, these plots show that posterior sample means (top, left) have converged to similar high-performing values, and that most of the discovered system behaviors rely on short time-constants in the neural controller’s dynamics (viz., the prevalence of red data points in the three τ* plots).

3. Results and discussion

3.1. Overview

We demonstrate Bayesian optimization methods (see Section 2.2) for tuning the parameters of a neuroscience-inspired swarming model, NeuroSwarms [33,34,39] (see Section 2.1), to find cooperative foraging behaviors for capturing multiple rewards in distinct maze environments under time pressure (see Section 2.4). We train Gaussian process surrogate models (see Section 2.5) to characterize the NeuroSwarms parameter space using noise-free (i.e., qEI) and observed sampling history-dependent (i.e., qNoisyEI) acquisition functions (see Section 2.3). Then we show how the locality-preserving dimensionality reduction provided by UMAP embeddings (see Section 2.6) can be used to evaluate the surrogate model and identify system behaviors.

3.2. Training the surrogate model for swarming performance

Small variations in the p = 9 dynamical NeuroSwarms parameters (Table 1) can substantially impact collective behaviors. Optimal parameters that allow NeuroSwarms models to accomplish generalized cooperative foraging may not be limited to a single set of parameters due to the complexity and potential degeneracy of emergent collective behaviors in a distributed multi-agent system. Thus, we constructed a simple time-pressured objective function to measure the progress of reward-capture (Section 2.4) and guide Bayesian optimization using Gaussian process surrogate models (Fig. 1A). We utilized acquisition functions to sample candidate parameter points and optimize the Gaussian process’ predictive performance compared to observed NeuroSwarms simulations (Section 2.5). We evaluated the surrogate models in two environments for each sample: a Hairpin maze and a Tunnel maze (Fig. 1B). By simultaneously assessing mazes with distinct geometries, the surrogate model optimization was allowed to find swarming and navigational dynamics resulting in time-efficient cooperative foraging that may generalize across environments.

We started training with an initial set of 24 randomly selected parameter points and their corresponding simulation results. Each Gaussian process was trained with one of three candidate-selection strategies: q-batched Expected Improvement (qEI), q-batched Noisy Expected Improvement (qNoisyEI), or random parameter sampling (Section 2.3). Gaussian process modeling and training were implemented using BoTorch [51] and optimized with 512 MC samples over 30 training epochs (Section 2.5). We verified that the EI-based acquisition functions converged based on metrics of minimum candidate dissimilarity and maximum posterior variance (Section 2.5.1). The EI-based acquisition functions approached zero dissimilarity during training (Fig. 2A). Similarly, the maximum posterior variance for each surrogate model had converged by the end of training (Fig. 2B).

Fig. 2.

Convergence metrics and objective function values for acquisition functions across training. A+B, Training convergence metrics: minimum candidate dissimilarity (A) and maximum posterior variance (B). C+D, The training performance of Gaussian process models based on the qEI and qNoisyEI acquisition functions, compared to a baseline of random sampling, was quantified by objective function values shown as histograms of losses for the sampled parameter trajectories (C) and as the improvement in best observed values (D), where values closer to 0 indicate better performance (Eq. (2)) in the time-pressured cooperative foraging task.

We evaluated how effective each acquisition function was at finding regions of the parameter space that optimize the NeuroSwarms objective function (Eqs. (2) and (3)). Both qEI and qNoisyEI discovered more parameter points with high-performance values than random sampling (Fig. 2C). Both random sampling and the default parameters from Monaco et al. (2020) [34] were outperformed by the EI-based acquisition functions. Thus, qEI and qNoisyEI demonstrated the strongest utility improvement of best observed values during training as the NeuroSwarms parameter space was learned by the corresponding surrogate models (Fig. 2D).

3.3. Evaluating UMAP-clustering of selected parameters

Understanding the results of the above Bayesian optimization process requires a visual representation of the parameter space, yet it can be challenging to represent data with >3 dimensions. We considered that visualizing parameter points in lower dimensions could facilitate the discovery of critical surrogate model structures, including clusters of high-performing parameters that potentially yield distinct behavioral solutions to the cooperative foraging task. Thus, we used UMAP (Section 2.6) to reduce sets of 9-dimensional NeuroSwarms parameters (Table 1) into locality-preserving 2D representations. For qEI-selected parameters, we assigned colors to the resulting 2D UMAP-clustered data points according to posterior mean estimates of objective values (top, left plot) or individual parameter values (Fig. 3). The resulting visual representation in Fig. 3 shows where the highest utility (i.e., best posterior mean estimate of objective value) data points cluster into groups based on the values of NeuroSwarms parameters.

Given that qEI demonstrated the largest utility improvement (Fig. 2D) and consistently identified high-performing parameters (Fig. 2C), we consider its UMAP representation for further analysis. The qEI-based parameter samples formed two clusters of data points with the highest utility (Fig. 3). In the (top, left) posterior mean plot, we selected one of these points from the lower, left cluster and matched it with the numerical values of its associated parameters, which we subsequently evaluated in NeuroSwarms simulations.

We simulated the qEI-optimized NeuroSwarms model in both the Hairpin and Tunnel mazes (see Fig. 1B). Trajectory-trace plots for the Hairpin (Fig. 4, blue traces) depict the movement of each agent that contributed to reward capture throughout the simulation, up to the timestep at which cooperative capture of each reward goal was achieved. Likewise, trajectory traces in orange (Fig. 4) reflect the behavior of the reward-capturing agents after the reward had been captured. For example, the transition from swarming and goal-directed dynamics to post-capture exploration is depicted by the capture of Reward 3 (R3) in the third row of Fig. 4, in which a subset of agents converged on and captured R3 and immediately dispersed, thus permitting the search for and capture of subsequent reward goals. Agents recommenced exploration following reward capture because NeuroSwarms relies on local, line-of-sight communication between agents, meaning that agent motion may not be influenced by nearby rewards if they are occluded by the walls of the maze. The qEI-tuned swarms quickly captured all five rewards in the Hairpin environment (t = 25.38 s), as shown in Fig. 4, whereas the original default parameters of NeuroSwarms—determined by hand-tuning as described in our previous work [34]—produced relatively slow reward capture (t = 41.02 s). The default parameters’ slow reward capture was further exacerbated in the Tunnel maze (t = 175.42 s). In contrast, the qEI-tuned swarm captured all three rewards (Fig. 5) faster than the default swarm captured two rewards (t = 34.88 s). We attribute the worse performance of the hand-tuned default parameters to longer dynamical time-constants and thus slower behavioral responsivity. Thus, compared to manual parameter tuning for each maze environment, our Bayesian batch-optimization process (Section 2.3; Fig. 1A) with joint objective sampling (Section 2.4; Fig. 1B) was able to jointly and efficiently discover distinct high-performing dynamical parameters for multiple mazes.

Fig. 4.

NeuroSwarms trajectories depicting reward capture in the Hairpin maze. The Hairpin maze presents a large, fragmented arena to assess the swarm’s foraging performance given the uncertain localization inherent in environments with symmetrically repeating geometric patterns. Five reward goals are spatially distributed at maze locations indicated by gold stars (R1–R5, top-left maze plot). The 10 maze plots show segments of spatial trajectories traced out by NeuroSwarms agents during a sample simulation. Maze plots on the left show agent paths (blue traces) from either the beginning of the simulation or the most recent previous reward capture to the time of the reward capture indicated by the text label to the left of the plot. Traces are shown for only those agents that contributed to cooperative capture of the given reward (see Section 2.4). Conversely, maze plots on the right show agent paths (orange traces) from the time of reward capture until the end of the simulation. From top to bottom, each row presents a pre-capture and post-capture pair of swarm trace plots in the order in which rewards were captured in the simulation. Individual traces are translucent; thus, the degree to which the trajectories of multiple agents superposed upon the same observed paths is indicated by the relative saturation of the trace color. As a result, visual inspection yields information about the swarming and reward-approach dynamics with respect to the spatial convergence and divergence of agents over time.

Fig. 5.

NeuroSwarms trajectories depicting reward capture in the Tunnel maze. The Tunnel maze presents an irregular arena to assess the swarm’s foraging performance given a loop-like environment with substantial geometric occlusion of visibility and passageways with large vs. constrictive (e.g., the eponymous ‘tunnel’ connecting the Southwest to the Southeast quadrants) apertures. Three reward goals are spatially distributed at maze locations indicated by gold stars (R1–R3, top-left maze plot). The 6 maze plots show agent paths before (left, blue traces) and after (right, orange traces) the cooperative reward capture (see Section 2.4) indicated by the label to the left of the plots. Additional details are as described in the caption for Fig. 4.

A key feature of our Bayesian optimizer is that the objective indirectly quantifies (i.e., as a ‘black box’ model) cooperative foraging without directly modifying NeuroSwarms’ underlying mechanisms. In general, this feature allows a task-dependent objective to evaluate multi-agent performance in collective tasks involving, e.g., social coordination or distributed consensus. In contrast to the regular but fragmented geometry of the Hairpin maze (Fig. 4), the Tunnel maze required the swarm to distribute through an irregular geometry to complete the foraging task (Fig. 5). Additionally, whereas agents were initialized at uniform random locations in the Hairpin maze, all agents in the Tunnel maze were initialized to points inside a small disc circumscribed within its Southwest quadrant. As a result, the agents rapidly capture R2 (Fig. 5, top row) and then split into subgroups to capture the remaining two rewards (Fig. 5, lower two rows). An additional challenge of the Tunnel maze is that R3 is initially visible to all agents and closer than R1, yet the tunnel constricts access to it. Conversely, R1 is initially visible and accessible, yet further away and partially occluded once agents have converged onto R2’s location. The fast capture of R1 (t = 5.46 s) vs. R3 (t = 31.78 s) reflects the characteristic time-scale differences between coordinated reward-approach trajectories and exploratory swarming trajectories, respectively. Comparing the pre-capture (blue, left) and post-capture (orange, right) trajectories for each reward (Fig. 5), the agents began using the large opening in the center of the map only once R2 and R1 were both captured. This behavioral transition suggests that exploration traded off with goal-directed exploitation by adaptively forming and regrouping subgroups of agents. Thus, distinct challenges presented by the Tunnel maze, in concert with our optimizer’s objective function definition (Section 2.4), may have induced collective behaviors that can flexibly adapt to diverse foraging problems.

3.4. Exploring the future parameter space

The trained surrogate model and its acquisition function can be used to predict the performance of unobserved regions of the parameter space. To test predictive selection, we generated 500 samples from the qEI acquisition function and the posterior distribution of its trained Gaussian process surrogate model. The qEI sample means from the posterior (Fig. 6, top-left plot) were similar across most data points because qEI had adapted to parameter regions with the highest likelihood of utility improvement. As in Section 3.3, we selected candidate points from these anticipated future qEI parameters to simulate in the Hairpin and Tunnel mazes, but we chose points that featured mid-range parameter values, i.e., whose vector elements were not at or near the range limits of the respective parameter (Table 1). In particular, we selected parameters whose time-constants were greater than the minimum of their ranges (1 ms), constituting a parameter regime distinct from the clusters of qEI samples that minimized their respective time-constants in response to the time pressure imposed by our objective function (Eq. (2)). We chose these points, with corresponding simulations shown in Fig. 7, to demonstrate the distinct behavioral solutions to the foraging task that can be discovered by the same acquisition function and associated surrogate model. Trajectory-trace plots of reward-capturing agents before and after rewards were cooperatively captured in the Hairpin and Tunnel mazes show that the selected parameters resulted in slower reward capture in the Hairpin (t = 47.44 s; Fig. 7A) and Tunnel (t = 66.96 s; Fig. 7B) mazes compared with the optimized parameters in Fig. 4 (Hairpin, t = 25.38 s) and Fig. 5 (Tunnel, t = 31.78 s). Additionally, the default parameters from Monaco et al. (2020) [34] entailed strong reward-approach exploitation (e.g., κ = 6.6) but weak swarming-based exploration (e.g., σ = 2.0); this combination of behavioral forces increased the time-to-capture for all five rewards. Thus, we attribute slow reward capture to a combination of longer dynamical time-constants and exploration–exploitation mismatches. Moreover, if the energy budget of agent locomotion (e.g., speed, turning, etc.) were taken into account by the objective function, the slower behavioral repertoire enabled by these parameter regimes could help to minimize energetically costly or inefficient navigational patterns.

Fig. 7.

Example reward-capture trajectories from selected future qEI-sampled NeuroSwarms parameters. Pre-capture (left, blue traces) and post-capture (right, orange traces) pairs of trajectory-trace plots are shown relative to example reward-capture events from qEI-selected simulations in the Hairpin (A; cf. Fig. 4) and Tunnel (B; cf. Fig. 5) mazes. Parameters were selected for mid-range values (i.e., away from parameter range limits) from predictive (anticipated future) samples generated by the trained qEI-based surrogate model. Our Bayesian batch-optimizer naturally produces diverse output parameters that allow for the selection of distinct high-performing solutions and system behaviors, all of which have been equivalently constrained and guided by the high-dimensional shape of its task-dependent objective function.

4. Concluding remarks

Neuroscience-inspired learning and control methods have seen increasing interest from the robotics, artificial intelligence, and multi-agent control communities. Here, we presented a demonstration of exploring and visualizing the parameter space of a multi-agent model with complex dynamical behaviors using sample-efficient Bayesian optimization with Gaussian process surrogate models. We introduced an objective function for a spatial cooperative foraging task in NeuroSwarms simulations [34] to predict reward-capture performance across two distinct maze environments. Training the surrogate model was facilitated by the qEI and qNoisyEI acquisition functions. In particular, qEI was shown to guide optimizer trajectories toward parameter regions with high utility improvement, outperforming random sampling and manual tuning.

By learning UMAP embeddings [57], we demonstrated visualization of 9-dimensional parameter points to identify and select high-performing clusters of parameters. We illustrated the identification of parameters that generalized across environments by jointly evaluating the NeuroSwarms metacontroller in two distinct maze environments. Overall, our study serves as an example application of Bayesian optimization of complex multi-agent models to explore and select for complex behaviors like goal-directed spatial navigation in a system with distributed neural control.

As the number of model parameters grows, the computational cost of the matrix inversions required to update the Gaussian process increases steeply and eventually outweighs the gains in adaptive search efficiency provided by computing the acquisition function over the surrogate model to advance the sample trajectory [20]. This limitation on model dimensionality does not, in general, prohibit analysis of complex dynamics, particularly in systems of homogeneous particles, but it would reasonably detract from the feasibility of Bayesian optimization for modeling systems with nontrivial heterogeneity in agent/particle behaviors. Within that moderate limit on model complexity—e.g., for p up to ~20—Bayesian optimization may facilitate adaptive and efficient computational exploration of dynamical parameter spaces, resulting in the identification of distinct and complex system behaviors.

Future work is needed to develop new controller models and critical spatial tasks to explore the capabilities of multi-agent objective functions that adapt efficiently to the characteristics of diverse environments (e.g., occlusive geometry, dynamic change, reward distribution, cue richness, etc.). We theorize that heterogeneous variation of swarm spatial structure and intertemporal coordination dynamics will be able to support a form of swarm metacognition that allows adjustment to the available goals in an environment, without initial knowledge of the goals or their locations. This approach could extend the flexibility of Bayesian optimization to operate in diverse environments and adapt efficiently to tasks with difficult or uncertain goals.

Acknowledgments

Funding for this work was provided by the National Science Foundation (NCS/FO Award No. 1835279 to GMH, KZ, KMS, and JDM), the NIH National Institute for Neurological Disorders and Stroke (NINDS R03NS109923 to KZ and JDM), and Johns Hopkins University Applied Physics Laboratory (JHUAPL) internal research and development programs (AH, GMH, and KMS). Additional support was provided to GMH by the Johns Hopkins University Kavli Neuroscience Discovery Institute and the JHUAPL Innovation and Collaboration Janney Program.

Biographies

Mr. Armin Hadzic is a computer vision researcher at the Johns Hopkins University Applied Physics Laboratory. He has a B.S. in Computer Engineering, a B.S. in Electrical Engineering, and an M.S. in Computer Science from the University of Kentucky. His research interests include developing deep learning methods to address challenges in latent information representation from multiple sources, as well as computer vision, reinforcement learning, remote sensing, and brain-inspired robotics.

Dr. Grace M. Hwang is a senior staff engineer at the Johns Hopkins University Applied Physics Laboratory in Laurel, Maryland, and faculty member of the Johns Hopkins University Kavli Neuroscience Discovery Institute. Dr. Hwang currently serves as a Program Director at the National Science Foundation in Alexandria, Virginia. She received a B.S. in Civil and Environmental Engineering from Northeastern University, an M.S. from the Massachusetts Institute of Technology, and an M.S. and Ph.D. in Biophysics and Structural Biology from Brandeis University. Her areas of expertise include biophysics, biosensors, biophotonics, brain–computer interface technologies, brain-derived artificial intelligence, computational neuroscience, and disability/rehabilitation engineering.

Dr. Kechen Zhang is an Associate Professor in the Department of Biomedical Engineering at the Johns Hopkins University School of Medicine in Baltimore, Maryland. He has a B.S. in Biophysics and Physiology, an M.S. in Neurobiology from Peking University, and a Ph.D. in Cognitive Science from the University of California San Diego. He completed a postdoctoral fellowship in computational neurobiology at the Salk Institute for Biological Studies in La Jolla, California. His research interests span the breadth of theoretical and computational neuroscience, with a particular focus on sensory coding and spatial representations related to the hippocampal system.

Dr. Kevin M. Schultz is an applied mathematician, senior scientist, and project manager in the experimental and computational physics group at the Johns Hopkins University Applied Physics Laboratory in Laurel, Maryland. He has a B.S. in Mathematics, and a B.S., M.S., and Ph.D. in Electrical and Computer Engineering from Ohio State University. His research interests include quantum characterization, control, sensing, and the application of signal processing and statistics to the domain of quantum information. Dr. Schultz’ research interests additionally span distributed control and signal processing for applications including UAV swarming, sensor networks, critical infrastructure resilience, and neuroscience.

Dr. Joseph D. Monaco is formerly a Research Associate faculty member of the Department of Biomedical Engineering at the Johns Hopkins University School of Medicine in Baltimore, Maryland. He received B.A. degrees in Cognitive Science and Mathematics from the University of Virginia, and an M.A. and Ph.D. in Neurobiology & Behavior from the Columbia University Center for Theoretical Neuroscience. His research has examined the neural computations of spatial cognition by modeling the cellular and network dynamics of the hippocampal complex. Dr. Monaco is currently conducting independent research toward the theoretical integration of dynamical neuroscience and embodied cognition to broadly advance the science of intelligence.

Footnotes

1

This material is based on work supported by (while serving at) the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

CRediT authorship contribution statement

Armin Hadzic: Methodology, Software, Validation, Investigation, Data curation, Writing – original draft, Visualization. Grace M. Hwang: Conceptualization, Methodology, Resources, Writing – review & editing, Supervision, Project administration, Funding acquisition. Kechen Zhang: Writing – review & editing, Project administration, Funding acquisition. Kevin M. Schultz: Conceptualization, Methodology, Formal analysis, Resources, Writing – review & editing, Supervision, Project administration, Funding acquisition. Joseph D. Monaco: Conceptualization, Methodology, Software, Validation, Formal analysis, Writing – review & editing, Supervision, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] Passino KM. Biomimicry for optimization, control, and automation. Springer Science & Business Media; 2005.
[2] Seeley TD, Morse RA, Visscher PK. The natural history of the flight of honey bee swarms. Psyche 1979;86(2–3):103–13.
[3] Boinski S, Garber PA. On the move: How and why animals travel in groups. University of Chicago Press; 2000.
[4] Couzin ID. Collective cognition in animal groups. Trends Cogn Sci 2009;13(1):36–43. doi:10.1016/j.tics.2008.10.002.
[5] Sumpter DJ. Collective animal behavior. Princeton University Press; 2010.
[6] Herbert-Read JE, Perna A, Mann RP, Schaerf TM, Sumpter DJ, Ward AJ. Inferring the rules of interaction of shoaling fish. Proc Natl Acad Sci USA 2011;108(46):18726–31. doi:10.1073/pnas.1109355108.
[7] Beni G. From swarm intelligence to swarm robotics. In: International workshop on swarm robotics. Springer; 2004, p. 1–9. doi:10.1007/978-3-540-30552-1_1.
[8] Şahin E. Swarm robotics: From sources of inspiration to domains of application. In: International workshop on swarm robotics. Springer; 2004, p. 10–20. doi:10.1007/978-3-540-30552-1_2.
[9] Brambilla M, Ferrante E, Birattari M, Dorigo M. Swarm robotics: A review from the swarm engineering perspective. Swarm Intell 2013;7(1):1–41. doi:10.1007/s11721-012-0075-2.
[10] Bayındır L. A review of swarm robotics tasks. Neurocomputing 2016;172:292–321. doi:10.1016/j.neucom.2015.05.116.
[11] Hasselmann K, Robert F, Birattari M. Automatic design of communication-based behaviors for robot swarms. In: International conference on swarm intelligence. Springer; 2018, p. 16–29. doi:10.1007/978-3-030-00533-7_2.
[12] Brown DS, Turner R, Hennigh O, Loscalzo S. Discovery and exploration of novel swarm behaviors given limited robot capabilities. In: Distributed autonomous robotic systems. Springer; 2018, p. 447–60. doi:10.1007/978-3-319-73008-0_31.
[13] Coppola M, de Croon GC. Optimization of swarm behavior assisted by an automatic local proof for a pattern formation task. In: International conference on swarm intelligence. Springer; 2018, p. 123–34. doi:10.1007/978-3-030-00533-7_10.
[14] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44. doi:10.1038/nature14539.
[15] Monaco JD, Rajan K, Hwang GM. A brain basis of dynamical intelligence for AI and computational neuroscience. ArXiv preprint; 2021. doi:10.48550/arXiv.2105.07284.
[16] Price IC, Lamont GB. GA directed self-organized search and attack UAV swarms. In: Winter simulation conference. IEEE; 2006, p. 1307–15. doi:10.1109/WSC.2006.323229.
[17] Quijano N, Passino KM. Honey bee social foraging algorithms for resource allocation: Theory and application. Eng Appl Artif Intell 2010;23(6):845–61. doi:10.1016/j.engappai.2010.05.004.
[18] Lu Q, Hecker JP, Moses ME. Multiple-place swarm foraging with dynamic depots. Auton Robot 2018;42(4):909–26. doi:10.1007/s10514-017-9693-2.
[19] Talamali MS, Bose T, Haire M, Xu X, Marshall JA, Reina A. Sophisticated collective foraging with minimalist agents: A swarm robotics test. Swarm Intell 2020;14(1):25–56. doi:10.1007/s11721-019-00176-9.
[20] Rasmussen CE. Gaussian processes in machine learning. In: Summer school on machine learning. Springer; 2003, p. 63–71.
[21] Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In: Pereira F, Burges C, Bottou L, Weinberger K, editors. Advances in neural information processing systems 25. Curran Associates, Inc.; 2012, p. 2951–9.
[22] Roman I, Ceberio J, Mendiburu A, Lozano JA. Bayesian optimization for parameter tuning in evolutionary algorithms. In: IEEE congress on evolutionary computation. IEEE; 2016, p. 4839–45. doi:10.1109/CEC.2016.7744410.
[23] Nguyen V. Bayesian optimization for accelerating hyper-parameter tuning. In: IEEE second international conference on artificial intelligence and knowledge engineering. IEEE; 2019, p. 302–5. doi:10.1109/AIKE.2019.00060.
[24] Roman I, Mendiburu A, Santana R, Lozano JA. Bayesian optimization approaches for massively multi-modal problems. In: International conference on learning and intelligent optimization. Springer; 2019, p. 383–97. doi:10.1007/978-3-030-38629-0_31.
[25] Kieffer E, Rosalie M, Danoy G, Bouvry P. Bayesian optimization to enhance coverage performance of a swarm of UAV with chaotic dynamics. In: International workshop on optimization and learning; 2018. http://hdl.handle.net/10993/35500.
[26] Rai A, Antonova R, Meier F, Atkeson CG. Using simulation to improve sample-efficiency of Bayesian optimization for bipedal robots. J Mach Learn Res 2019;20(1):1844–67.
[27] Berkenkamp F, Krause A, Schoellig AP. Bayesian optimization with safety constraints: Safe and automatic parameter tuning in robotics. Mach Learn 2021;20(1):1–35. doi:10.1007/s10994-021-06019-1.
[28] Iwasa M, Iida K, Tanaka D. Hierarchical cluster structures in a one-dimensional swarm oscillator model. Phys Rev E 2010;81(4):046220. doi:10.1103/PhysRevE.81.046220.
[29] Iwasa M, Tanaka D. Dimensionality of clusters in a swarm oscillator model. Phys Rev E 2010;81(6):066214. doi:10.1103/PhysRevE.81.066214.
[30] O’Keeffe KP, Hong H, Strogatz SH. Oscillators that sync and swarm. Nature Commun 2017;8(1):1504. doi:10.1038/s41467-017-01190-3.
[31] O’Keeffe K, Bettstetter C. A review of swarmalators and their potential in bioinspired computing. In: Proceedings of the International Society for Optics and Photonics (SPIE): Micro- and nanotechnology sensors, systems, and applications XI, Vol. 10982; 2019, p. 383–94. doi:10.1117/12.2518682.
[32] O’Keeffe K, Ceron S, Petersen K. Collective behavior of swarmalators on a ring. Phys Rev E 2022;105(1):014211. doi:10.1103/PhysRevE.105.014211.
[33] Monaco JD, Hwang GM, Schultz KM, Zhang K. Cognitive swarming: An approach from the theoretical neuroscience of hippocampal function. In: Proceedings of the International Society for Optics and Photonics (SPIE): Micro- and nanotechnology sensors, systems, and applications XI, Vol. 10982; 2019, p. 373–82. doi:10.1117/12.2518966.
[34] Monaco JD, Hwang GM, Schultz KM, Zhang K. Cognitive swarming in complex environments with attractor dynamics and oscillatory computing. Biol Cybern 2020;114(2):269–84. doi:10.1007/s00422-020-00823-z.
[35] Hwang GM, Schultz KM, Chalmers RW, Monaco JD, Zhang K. Autonomous navigation technology. U.S. Patent 11,378,975; 2022.
[36] Buzsáki G. Theta rhythm of navigation: Link between path integration and landmark navigation, episodic and semantic memory. Hippocampus 2005;15(7):827–40. doi:10.1002/hipo.20113.
[37] Monaco JD, Knierim JJ, Zhang K. Sensory feedback, error correction, and remapping in a multiple oscillator model of place-cell activity. Front Comput Neurosci 2011;5:39. doi:10.3389/fncom.2011.00039.
[38] Blair HT, Wu A, Cong J. Oscillatory neurocomputing with ring attractors: A network architecture for mapping locations in space onto patterns of neural synchrony. Philos Trans R Soc Lond B Biol Sci 2014;369(1635):20120526. doi:10.1098/rstb.2012.0526.
[39] Monaco JD, De Guzman RM, Blair HT, Zhang K. Spatial synchronization codes from coupled rate-phase neurons. PLOS Comput Biol 2019;15(1):e1006741. doi:10.1371/journal.pcbi.1006741.
[40] Samsonovich A, McNaughton BL. Path integration and cognitive mapping in a continuous attractor neural network model. J Neurosci 1997;17(15):5900–20. doi:10.1523/JNEUROSCI.17-15-05900.1997.
[41] Zhang K. Representation of spatial orientation by the intrinsic dynamics of the head-direction cell ensemble: A theory. J Neurosci 1996;16(6):2112–26. doi:10.1523/JNEUROSCI.16-06-02112.1996.
[42] Knierim JJ, Zhang K. Attractor dynamics of spatially correlated neural activity in the limbic system. Annu Rev Neurosci 2012;35:267–85. doi:10.1146/annurev-neuro-062111-150351.
[43] Lansner A. Associative memory models: From the cell-assembly theory to biophysically detailed cortex simulations. Trends Neurosci 2009;32(3):178–86. doi:10.1016/j.tins.2008.12.002.
[44] O’Hagan A. Curve fitting and optimal design for prediction. J R Stat Soc Ser B Methodol 1978;40(1):1–24. doi:10.1111/j.2517-6161.1978.tb01643.x.
[45] Jones DR, Schonlau M, Welch WJ. Efficient global optimization of expensive black-box functions. J Global Optim 1998;13(4):455–92. doi:10.1023/A:1008306431147.
[46] Osborne MA. Bayesian Gaussian processes for sequential prediction, optimisation and quadrature (Ph.D. thesis). UK: Oxford University; 2010.
[47] Zhu C, Byrd RH, Lu P, Nocedal J. Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Trans Math Software 1997;23(4):550–60. doi:10.1145/279232.279236.
[48] Williams CK. Prediction with Gaussian processes: From linear regression to linear prediction and beyond. In: Learning in graphical models. Springer; 1998, p. 599–621. doi:10.1007/978-94-011-5014-9_23.
[49] MacKay DJC. Gaussian processes—a replacement for supervised neural networks? Lecture notes from NeurIPS; 1997.
[50] Krauth K, Bonilla EV, Cutajar K, Filippone M. AutoGP: Exploring the capabilities and limitations of Gaussian process models. ArXiv preprint; 2016. doi:10.48550/arXiv.1610.05392.
[51] Balandat M, Karrer B, Jiang DR, Daulton S, Letham B, Wilson AG, Bakshy E. BoTorch: A framework for efficient Monte-Carlo Bayesian optimization. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, editors. Advances in neural information processing systems 33. Curran Associates, Inc.; 2020, p. 21524–38.
[52] Shahriari B, Swersky K, Wang Z, Adams RP, De Freitas N. Taking the human out of the loop: A review of Bayesian optimization. Proc IEEE 2015;104(1):148–75. doi:10.1109/JPROC.2015.2494218.
[53] Wilson JT, Hutter F, Deisenroth MP. Maximizing acquisition functions for Bayesian optimization. ArXiv preprint; 2018. doi:10.48550/arXiv.1805.10196.
[54] Letham B, Karrer B, Ottoni G, Bakshy E. Constrained Bayesian optimization with noisy experiments. Bayesian Anal 2019;14(2):495–519. doi:10.1214/18-BA1110.
[55] Scott W, Frazier P, Powell W. The correlated knowledge gradient for simulation optimization of continuous parameters using Gaussian process regression. SIAM J Optim 2011;21(3):996–1026. doi:10.1137/100801275.
[56] Frazier PI. A tutorial on Bayesian optimization. ArXiv preprint; 2018. doi:10.48550/arXiv.1807.02811.
[57] McInnes L, Healy J, Melville J. UMAP: Uniform manifold approximation and projection for dimension reduction. ArXiv preprint; 2018. doi:10.48550/arXiv.1802.03426.
