Scientific Reports. 2025 Dec 20;16:3174. doi: 10.1038/s41598-025-33076-6

Mixing of a binary passive particle system using smart active particles

Thomas Jacob 1,2, Siddhant Mohapatra 1, Rajalingam A 1, Sam Mathew 3, Pallab Sinha Mahapatra 1
PMCID: PMC12830705  PMID: 41422311

Abstract

The controlled activity of active entities interacting with a passive environment can generate emergent system-level phenomena, making such systems promising platforms for applications in targeted drug delivery, adaptive and reconfigurable materials, microfluidic transport, and related fields. The present work aims to realise optimal mixing of two segregated species of passive particles by introducing a small fraction of active particles (Inline graphic by composition) with adaptive and intelligent behaviour, directed by a trained Artificial Neural Network-based agent. While conventional run-and-tumble particles can induce mixing in the system, the smart active particles achieve faster and more efficient mixing. Interestingly, the optimal mixing strategy does not involve a uniform dispersion of active particles in the domain; instead, their motion is limited to an eccentrically placed zone of activity, which induces a global rotational motion of the passive particles about the system centre. A transition in the directionality of the passive particles’ motion is observed along the radius towards the centre, likening the active particles’ motion to an ellipse-shaped void with a defined surface speed. Situated at the intersection of active matter and machine learning, this work highlights the potential of integrating adaptive learning frameworks into traditional models of active matter.

Keywords: Mixing, Active matter, Reinforcement learning

Subject terms: Engineering, Mathematics and computing, Physics

Introduction

Active matter encompasses a broad class of intrinsically non-equilibrium systems, forming a ubiquitous part of natural as well as synthetic systems across various size scales. These systems consist of units which constantly dissipate energy and, through local interactions, give rise to emergent system-spanning behavioural patterns, otherwise termed collective behaviour. Such system behaviour has been reported in several experimental studies on natural1–3 and artificial4,5 systems. The 1990s marked the development of the mathematical machinery to understand and interpret active systems, with pioneering works by Vicsek et al.6 and Toner and Tu7. These seminal works led to an increased interest in the numerical modelling of active systems and their applications to a wide range of fields, encompassing the life sciences, physical sciences, safety science, and econometrics. Reviews by Ramaswamy et al.8, Vicsek and Zafeiris9, Marchetti et al.10, and more recently, by Shaebani et al.11 and Gompper et al.12, provide a detailed account of theoretical paradigms and numerical approaches that have shaped the current research in active matter. In the biological world, evolutionary pressures driven by basic functional needs are considered central to the emergence of social behaviour in organisms. Some instances of this social behaviour among macroscale organisms include flocking/swarming for predator confusion13–16, navigation of complex environments17, schooling for hydrodynamic efficiency18, and herding for coordinated escape19. In the microscopic world, collective behaviour is often driven by physico-chemical processes or biological cues, some examples of which are the chemotaxis of Escherichia coli causing swarming20, the mechanotaxis of Pseudomonas aeruginosa through twitching motility leading to the formation of rafts21, and the chemical gradient-driven active nematic motion of rod-like cells resulting in the formation of lanes22.
Irrespective of the size scales, the collective behaviour observed in active systems stems purely from localised interactions involving proximate entities. In such dynamic systems, natural or artificial, the active entities often interact with passive structures/entities23–27. The scope and versatility of the phenomena deriving from such interactions, most notably clustering, homogeneous mixing, phase separation, and active transport, have prompted extensive studies on active-passive mixtures of varying size ratios28,29, activity30–32, and particle proportions33. Experiments introducing a minute fraction (Inline graphic by area) of active particles in a dense aggregation of passive colloids (varying between Inline graphic and Inline graphic by area) demonstrated that even a highly limited active component can significantly alter the structure and dynamics of the system34. Microscopic parameters, such as particle activity and interaction strengths, as well as macroscopic properties, such as particle concentration, have been found to affect the emergent behaviour in active and active-passive systems, raising pertinent questions about the parameter space in which such systems operate.

Recent advancements in Machine Learning (ML) have provided a bottom-up approach to understanding active systems. Supervised as well as unsupervised learning techniques have been deployed in various applications involving active matter, such as pattern recognition and classification35,36, predictive modelling37,38, optimal navigation strategies39, and swarm optimisation40. Several studies demonstrate the efficacy of reinforcement learning (RL) in discerning the optimal parameter set of active systems driven towards a specific objective. In the purview of control on a particle scale, a widely studied problem is that of optimal navigation of the particle towards a target location under different environmental conditions such as complex flow fields39,41–43, spatially varying motility landscapes44, physical or potential barriers45, and stochasticity in the surroundings46. Some studies have also focused on controlling the particle motion through selective activation using, among other methods, attraction/repulsion forces47 and optical exposure40,48. Therefore, a system of randomly interacting particles could be trained to be more efficient in achieving the desired goals by controlling one or more parameters of the system, usually speed and/or direction.

In the literature, Q-learning, a reinforcement learning algorithm, stands out as a prominent tool for relatively simple problems such as grid world navigation49, maze solving50, and path planning39,45,51, often using Q-tables for state-action mapping. However, for problems requiring a larger state-action space (see the Methods section for technical details), increased dimensionality, and more extensive exploration, the learning process becomes progressively complex and cannot be handled by Q-tables. Artificial Neural Networks (ANNs) must be employed to handle the increased complexity. The Deep Q-Network (DQN) is one of the most successful initial implementations using a Deep Neural Network (DNN), with a value-based off-policy algorithm. Since the advent of DQN, several algorithms have emerged to train deep neural networks effectively. These algorithms can be classified into three categories: value-based (e.g., DQN, DDQN), policy-based (e.g., REINFORCE), and actor-critic-based (e.g., PPO, SAC). Each of these algorithms, while being successful for certain problems, also presents its own set of challenges. Because the agent plays a decision-making role when integrated with active matter systems (such as propulsion/directional control), the choice of algorithm depends on the characteristics of the action space and the nature of the control problem. Value-based algorithms, such as DQN, are primarily applied to discrete action scenarios. Policy-based as well as hybrid (combining policy and value) algorithms can generally handle both continuous and discrete action spaces. It is noteworthy that value-based algorithms have been predominantly applied to single-particle manoeuvring. However, in systems with multiple particles, it is preferable to switch to hybrid algorithms which use an actor-critic framework.

In the current study, a small number of active particles are introduced into a two-dimensional binary athermal bath, consisting of two initially segregated passive species, inside a circular confinement. The objective of the active particles is to agitate the passive particles and achieve an efficient mixture of the species. The motion of the active particles is controlled by an agent that has been trained through reinforcement learning to maximise an objective function (defined as a mixing index of the passive system). The present work involves passive particles driven by direct impact with active particles, in contrast to the existing literature40,47,48, which employed various methods to induce self-propulsion in the targeted particles. Additionally, the physical properties of both the passive species are assumed to be similar, for the sake of simplicity; therefore, they are differentiated only by colour. The complexity of the problem arises during the training stage, due to the large number of possible actions in any given system state, especially when the objective function is a macroscopic quantity that encompasses the entire system. Considering the highly non-linear nature of mapping the system states to probable actions, the method demands a non-conventional approach involving Artificial Neural Networks (ANN). Using Reinforcement Learning (RL) concepts, in which an agent learns to achieve an objective through repeated experiences, the current work demonstrates that a minute fraction of active particles, trained to perform simple discrete actions, suffices to efficiently mix a binary passive system. The next section outlines the numerical methodology, including the equations of motion governing both active and passive particles, as well as the reinforcement learning framework. It also describes the method of quantifying mixing among passive particles, which is further used to define the objective function of the optimisation problem. 
The subsequent sections pertain to the mixing performance of active particles following run-and-tumble dynamics, before discussing and comparing it to the mixing efficacy when employing active particles controlled by a trained RL agent.

Numerical methodology

Simulation environment

Monodisperse passive disc-shaped particles are uniformly distributed inside a confined two-dimensional circular domain as represented in Fig. 1(a). All particles have the same radius r, and the ratio of the domain radius R to the particle radius is defined as the radius ratio Inline graphic. A circular confinement is selected to avoid entrapment of passive particles in the corners of standard rectangular/square domains. The wall of the bounded system consists of one layer of immovable circular discs of the same size as the interior particles. The packing in the system is denoted by an area fraction Inline graphic, where N is the number of interior particles. It is to be noted that the packing fraction takes into account the passive particles only, while the active particles are represented in numbers, Inline graphic.

Fig. 1.


Panel (a) showcases the initial distribution of particles in the confined circular domain. The active particles are coloured red, while the two passive particle species are coloured blue and orange. The wall particles are demarcated by their circumference in black. Ratio of the radius of the domain to that of a passive particle: Inline graphic; packing fraction of passive particles: Inline graphic; number of active particles: Inline graphic. Panel (b) displays the self-propulsion drive of any active particle i in the direction Inline graphic at speed v, while panel (c) illustrates the inter-particle repulsion drives Inline graphic and Inline graphic acting on particles i and j, respectively, on overlap. The strength of this drive scales linearly with the extent of overlap. Panel (d) illustrates the neighbourhood selection scheme for any particle i in the calculation of the mixing index. A metric-based neighbourhood selection is used, in which any particle within a distance of Inline graphic from the centre of i is considered a neighbour of particle i. In the example in panel (d), particle i has seven neighbours (Inline graphic), four of which are of the same type (blue) as i, while the remaining three are of the opposite type (orange; Inline graphic). Hence the number fraction of opposite species for particle i is 3/7.

The governing equations of motion of the active and the passive particles are delineated in Eqs. 1 and 2, adopted from the model used by Henkes et al.52. The current model assumes non-inertial and athermal particles and is valid for particles moving slowly or in highly viscous environments (such that inertial effects are negligible in comparison to viscous damping). The active particles are acted upon by two drives: the self-propulsion drive and the inter-particle repulsion drive (see Eq. 1).

$$\frac{d\mathbf{r}_i}{dt} = v\,\hat{\mathbf{n}}_i + \mu \sum_{j \neq i} \mathbf{F}_{ij}, \qquad \mathbf{F}_{ij} = \begin{cases} k\,(r_i + r_j - d_{ij})\,\hat{\mathbf{d}}_{ij}, & d_{ij} < r_i + r_j, \\ \mathbf{0}, & \text{otherwise} \end{cases} \tag{1}$$

Here, $\mathbf{r}_i$ is the position of active particle i with respect to the origin, v is the self-propulsion speed in the direction $\hat{\mathbf{n}}_i$, $\mathbf{F}_{ij}$ is the repulsive force exerted on particle i due to overlap with any particle j, $\mu$ is the translational mobility, k is the coefficient of the repulsive force, and $r_i$ and $r_j$ are the radii of the particles i and j, respectively. $d_{ij}$ is the distance between the centres of particles i and j, and $\hat{\mathbf{d}}_{ij}$ is the unit vector along the line of centres, pointing from j towards i, so that the force is repulsive. The direction of motion of the active particle i is controlled by the term $\hat{\mathbf{n}}_i = (\cos\theta_i, \sin\theta_i)$, where $\theta_i$ is the angle subtended by the desired direction of propulsion of the particle i with the x-axis (see Fig. 1(b)). $\theta_i$ can change continuously over the range $[0, 2\pi)$, or assume discrete angles based on prescribed movement criteria. By modulating the direction of propulsion $\hat{\mathbf{n}}_i$, different types of motion can be observed in the active particles, such as run-and-tumble (RT)53,54, run-reverse55,56, and directed migration57,58.

$$\frac{d\mathbf{r}_i}{dt} = \mu \sum_{j \neq i} \mathbf{F}_{ij} \tag{2}$$

The equation for passive particles (Eq. 2) differs from their active counterpart due to their inability to self-propel. Therefore, the passive particles are subjected only to the inter-particle repulsion drive. In the current work, all particles are assumed to be athermal (assuming negligible thermal diffusivity). The interaction of the interior particles (irrespective of activity) with the wall particles occurs through a repulsion drive similar to the one mentioned in Eqs. 1 and 2. When an interior particle i overlaps with a wall particle w, the former experiences a body force Inline graphic if Inline graphic, where Inline graphic and Inline graphic are the radii of the particles, and Inline graphic is the Euclidean distance between their centres.
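The overdamped dynamics described above (self-propulsion for active particles, plus a mobility-weighted sum of linear overlap repulsions for every particle, including contact with immovable wall discs) can be sketched as a single time step. This is a minimal illustration rather than the authors' implementation: the function names `repulsion` and `euler_step`, the explicit Euler integrator, and the time step `dt` are assumptions, since the text does not state how the equations are integrated.

```python
import math


def repulsion(pos_i, pos_j, rad_i, rad_j, k):
    """Linear repulsive force on disc i from disc j (zero unless they overlap)."""
    dx = pos_i[0] - pos_j[0]
    dy = pos_i[1] - pos_j[1]
    dist = math.hypot(dx, dy)
    overlap = rad_i + rad_j - dist
    if overlap <= 0.0 or dist == 0.0:
        return 0.0, 0.0
    # Magnitude grows linearly with overlap; direction points from j to i.
    return k * overlap * dx / dist, k * overlap * dy / dist


def euler_step(positions, radii, headings, v, mu, k, dt, walls=()):
    """Advance all interior particles by one explicit Euler step (assumed scheme).

    positions: list of (x, y) for interior particles
    headings:  dict {particle index: theta in radians} for active particles only;
               indices absent from the dict are treated as passive
    walls:     iterable of ((x, y), radius) for immovable wall discs
    """
    new_positions = []
    for i, p in enumerate(positions):
        fx = fy = 0.0
        for j, q in enumerate(positions):
            if j != i:
                gx, gy = repulsion(p, q, radii[i], radii[j], k)
                fx += gx
                fy += gy
        for w_pos, w_rad in walls:
            gx, gy = repulsion(p, w_pos, radii[i], w_rad, k)
            fx += gx
            fy += gy
        vx, vy = mu * fx, mu * fy          # passive contribution (Eq. 2)
        theta = headings.get(i)
        if theta is not None:              # active particles add self-propulsion (Eq. 1)
            vx += v * math.cos(theta)
            vy += v * math.sin(theta)
        new_positions.append((p[0] + dt * vx, p[1] + dt * vy))
    return new_positions
```

Positions are updated from a snapshot of the previous step, so the result does not depend on the order in which particles are visited.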

To observe mixing, the passive particles are initially segregated along the diameter of the circular domain, while the active particles are uniformly distributed (see Fig. 1(a)). In the present study, unless otherwise specified, three active particles (Inline graphic) are used to agitate a relatively dense binary passive aggregation (Inline graphic). All particles are assumed to be of unit radius, and any length dimensions presented are scaled against it. To keep the system concise and manageable, the radius ratio Inline graphic is fixed at 15, Inline graphic, and Inline graphic. The results presented in the next section pertain to Inline graphic time steps or longer (to observe typical long-time behaviour). The upcoming subsection explains the dynamics of run-and-tumble particles (RTPs) and their efficacy in mixing the two passive species.

Run-and-tumble particles

Run-and-tumble (RT) is one of the prevalent mechanisms of bacterial locomotion, often observed in species such as Escherichia coli, Bacillus subtilis, and Salmonella enterica. RT motion is characterised by ballistic “runs”, interspersed with sudden directional changes (“tumbles”). Empirical evidence of the locomotion of these microbes suggests an exponential distribution of run duration, with tumble angles uniformly distributed within a certain range. However, for artificial systems, the tumbling range can be Inline graphic, which induces complete randomness. The active particles, having been modelled as slow-moving robots, can be thought to behave as RT particles, with run durations sampled from an exponential distribution and tumbling angles sampled from a uniform distribution. The tumbling is also assumed to be instantaneous (the time scale of the tumbling event is much smaller than that of the run event and can be neglected). The synergy between persistent runs and random tumbles enhances the probability of exploring the entire domain. The dynamics of these particles are governed by Eqs. 1 and 2, and the run duration is governed by Eq. 3, where $\tau$ is the sampled run duration and $\bar{\tau}$ is the mean run duration.

$$P(\tau) = \frac{1}{\bar{\tau}} \exp\left(-\frac{\tau}{\bar{\tau}}\right) \tag{3}$$

In the current work, run-and-tumble particles are employed to behave as a randomised mixer of the passive species, with the mean run duration serving as a primary control parameter. Such an analysis provides a base case for defining programmed mixing functionality with the help of Reinforcement Learning (RL) later on.
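As an illustration of the run-and-tumble scheme described in this subsection, the sketch below draws run durations from an exponential distribution with a prescribed mean and tumble angles from a uniform distribution, and builds a schedule of headings. The function names, the default tumble range of plus or minus pi, and the use of continuous time are assumptions made for the example; they are not taken from the article.

```python
import math
import random


def sample_run_duration(mean_run, rng):
    """Draw one run duration from an exponential distribution with the given mean."""
    return rng.expovariate(1.0 / mean_run)


def sample_tumble_angle(rng, max_turn=math.pi):
    """Draw a tumble (change of heading) uniformly from [-max_turn, max_turn]."""
    return rng.uniform(-max_turn, max_turn)


def run_and_tumble_schedule(total_time, mean_run, rng, theta0=0.0):
    """Return a list of (start_time, heading) pairs; tumbles are instantaneous."""
    t, theta, segments = 0.0, theta0, []
    while t < total_time:
        segments.append((t, theta))            # ballistic run at fixed heading
        t += sample_run_duration(mean_run, rng)
        theta = (theta + sample_tumble_angle(rng)) % (2.0 * math.pi)
    return segments
```

Passing an explicit `random.Random` instance keeps individual realisations reproducible, which is convenient when averaging mixing curves over several runs.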

Mixing index (Inline graphic)

As previously discussed, the active particles serve to agitate the passive species in the system, thereby promoting their mixing. The extent of mixing therefore has to be properly quantified. Various methods exist in the literature, particularly on granular mixing, to quantify the mixing of a binary particle system59,60, and the chosen method must be both simple and effective. Grid-based methods were ruled out because of the large fluctuations inherent in the mixing values they produce. Mixing can also be computed using Principal Component Analysis (PCA)59; however, this is computationally expensive. In the current work, a relatively straightforward and computationally efficient method has been used to quantify the mixing index Inline graphic, as elucidated in Eq. 4.

$$\mathrm{MI} = \frac{1}{N} \sum_{i=1}^{N} \frac{f_i}{1/2} \tag{4}$$

Here, $f_i = n_o/n$, where $n_o$ is the number of passive particles of the opposite species and n is the total number of passive particles surrounding the particle i (both $n_o$ and n are counted exclusive of the particle i), within a fixed radius Inline graphic from the centre of i (see Fig. 1(d)). Only passive particle species are considered when computing the mixing index, so the sum in Eq. 4 runs over the N passive particles. In an ideal homogeneous mixture, each passive particle is surrounded by an equal number of neighbours of the same and the opposite species, resulting in $f_i = 1/2$ for each particle. Therefore, a normalisation factor of 1/2 has been applied such that the mixing index $\mathrm{MI} = 1$ for an ideal homogeneous mixture. Accordingly, the value of $\mathrm{MI}$ can vary from 0 for a completely unmixed system (where a considerable space separates the two species) to a value close to 1 in a well-mixed system. In the current work, the initial positional configuration includes certain particles at the interface of the two species with non-zero $f_i$, resulting in a non-zero $\mathrm{MI}$ at t = 0. Due to the inherent dynamic nature of the system, fluctuations of the mixing index can occur, even in well-mixed systems. In the current study, the passive particles form a relatively dense system (Inline graphic), and Eq. 4 is applied within a neighbourhood defined by a radius of Inline graphic. Choosing a large neighbourhood radius could lead to erroneous reporting of a well-mixed state, even in the presence of small particle clusters of the same species. Conversely, a small neighbourhood radius could result in too few or no neighbours in low-density regions, weakening the statistical accuracy of the mixing analysis.
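The neighbourhood-based mixing index described above can be sketched as follows: for each passive particle, count the passive neighbours within a fixed radius, take the fraction belonging to the opposite species, normalise by 1/2, and average over all passive particles. The function name, the brute-force neighbour search, and the choice to assign a fraction of zero to a particle with no neighbours are assumptions made for this example; the article does not specify how isolated particles are handled.

```python
import math


def mixing_index(positions, species, radius):
    """Average opposite-species neighbour fraction, normalised by 1/2.

    positions: list of (x, y) for passive particles only
    species:   list of labels (same length), e.g. "blue" / "orange"
    radius:    neighbourhood radius measured from each particle centre
    """
    fractions = []
    for i, (xi, yi) in enumerate(positions):
        n_total = 0      # neighbours of particle i (excluding i itself)
        n_opposite = 0   # neighbours of the other species
        for j, (xj, yj) in enumerate(positions):
            if j == i:
                continue
            if math.hypot(xi - xj, yi - yj) <= radius:
                n_total += 1
                if species[j] != species[i]:
                    n_opposite += 1
        # Assumption: a particle with no neighbours contributes zero.
        fractions.append(n_opposite / n_total if n_total else 0.0)
    return 2.0 * sum(fractions) / len(fractions)
```

For the dense systems considered here, a cell list or k-d tree would replace the quadratic neighbour search, but the brute-force loop keeps the definition easy to check against Fig. 1(d).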

Reinforcement learning framework

The Reinforcement Learning (RL) framework is employed to find an optimal strategy by which the active particles are guided to mix the passive system efficiently. Here, optimality refers to the shortest path (lowest simulation time) to a high value of mixing index Inline graphic. As discussed earlier, active particles can be controlled by adjusting their run duration Inline graphic and direction of motion Inline graphic in steps of Inline graphic. Figure 2(a) illustrates the RL setup consisting of two key components: the agent and the environment, interacting through three quantities: the state (observation), the action, and the reward. The agent is the decision maker, and the environment is the system whose state the agent attempts to modulate. This modulation is made possible by communicating action variables Inline graphic to the environment, based on the current state of the environment at any time t (also called the observation Inline graphic). Due to the implementation of the action, the environment transitions to a new state Inline graphic. This transition to the new state concurrently results in a reward value Inline graphic, which is usually based on Inline graphic and Inline graphic. The reward value is essentially a quantification of the effect of the action Inline graphic on the state Inline graphic. The result is the formation of a tuple Inline graphic. In the current study, the mixing index Inline graphic (refer to the previous subsection) constitutes the reward function. The coordinates of the active as well as the passive particles form the observation space in the current RL framework; the input is in the form of a flattened 1D array {Inline graphic}. Cyclical interaction between the agent and environment gives rise to a collection of state-action-reward tuples.
A large set of possible actions at any state corresponds to a large number of state-action combinations for the active particles, increasing the difficulty of training the agent to guide the environment to an optimal state.

Fig. 2.


Panel (a) illustrates the interaction between the reinforcement learning agent and the environment. Here, the agent refers to an Artificial Neural Network (ANN). Action Inline graphic transforms the state of the environment from Inline graphic to Inline graphic along with a feedback in the form of a reward Inline graphic. Panel (b) demonstrates the different steps Inline graphic by which the RL agent can modulate the direction of motion for the active particles. The orientation of the active particles can be modulated in steps of (i) Inline graphic (4 possible directions of motion), (ii) Inline graphic (8 possible directions of motion), and (iii) Inline graphic (16 possible directions of motion).

In the current RL training module, the orientations of the active particles (Inline graphic) are set to be the action variables communicated by the agent (see Fig. 2(a)). Although the directional orientations can ideally be set as continuous variables in the range Inline graphic, a discrete action space is chosen for the ease of implementation (significantly faster training due to a smaller action space and negligible difference in the final state of the environment). As a result, when a certain action input is provided to an active particle, the particle continues to move in that direction until it receives another action input from the RL agent. In the purview of the current work, an RL agent-controlled active particle is termed a Smart Active Particle (SAP), and the terms “environment” and “system” are used interchangeably. To allow adequate time for the SAPs to interact and mix the passive particles, the agent transmits action variables to the SAPs every Inline graphic time steps (also known as the run duration for the SAPs, inspired by the RT dynamics). The runs are assumed to be ballistic, without any rotational diffusivity, similar to the run-and-tumble particles. After every run duration, each SAP tumbles instantaneously to a new orientation. The tumbled (new) orientations of the active particles are the action variables transmitted from the RL agent, assuming no randomness, for simplicity and ease of training the neural network. As the action space is discrete, the tumbling can occur in steps of Inline graphic, bringing the number of possible actions for an SAP at any state to Inline graphic. Therefore, the number of combinations of the possible actions for Inline graphic SAPs in any state amounts to Inline graphic. The lower the value of Inline graphic or the higher the number of SAPs, the larger the action space, with Inline graphic being the greater influence of the two. 
Figure 2(b) illustrates the three values of Inline graphic tested in the current work, corresponding to 4, 8, and 16 possible action directions for each SAP, respectively. Additionally, taking into account a fairly dense passive aggregation, the observation space turns out to be large enough to warrant the use of an ANN with multiple hidden layers for representing the RL agent (with a shared network for both policy and value functions). The parameters of the ANN are randomly initialised and are updated throughout the training process. A detailed description of the RL implementation is discussed in Sec. SI-1 and SI-2 of Supplementary Information, and visualised in Fig. S1 of Supplementary Information.
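The growth of the discrete action space with finer tumble steps and more SAPs can be made concrete with a short sketch. The step sizes used below (90, 45, and 22.5 degrees) are inferred from the 4, 8, and 16 directions stated for Fig. 2(b), assuming the directions span the full circle; the function names and the base-n encoding of a joint action as a single integer are illustrative assumptions, not the encoding used in the article.

```python
import math


def n_directions(step_deg):
    """Number of discrete headings when the full circle is divided into equal steps."""
    return round(360.0 / step_deg)


def joint_action_count(step_deg, n_active):
    """Number of joint actions when each of n_active particles picks one heading."""
    return n_directions(step_deg) ** n_active


def decode_joint_action(index, step_deg, n_active):
    """Map an integer in [0, joint_action_count) to one heading (radians) per particle."""
    n_dir = n_directions(step_deg)
    headings = []
    for _ in range(n_active):
        index, digit = divmod(index, n_dir)   # read off one base-n_dir digit per particle
        headings.append(math.radians(digit * step_deg))
    return headings


# With three active particles, as in the baseline setup described in the text:
for step in (90.0, 45.0, 22.5):
    print(step, n_directions(step), joint_action_count(step, 3))
```

Halving the tumble step multiplies the joint action count by 2 raised to the number of SAPs, which illustrates why fine angular resolution becomes costly to train as more active particles are added.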

A MultiLayer Perceptron (MLP) policy with a ReLU activation function is selected, from a wide variety of ANNs, to represent the agent in the RL implementation. ReLU provides nonlinearity to the neural network, enabling it to learn complex mappings between state inputs and action probabilities. Proximal Policy Optimisation (PPO) is used to optimise the parameters of the MLP policy due to its stability in updating network parameters through a clipped surrogate objective function, its capability to manage discrete action spaces, its sample efficiency61, and its effectiveness in addressing physical problems related to active matter and optimal navigation. The policy update is executed using the PPO algorithm with the primary aim of maximising the cumulative reward (or minimising a loss function). During experience collection, a sequence of tuples is aggregated prior to each policy update. The whole reinforcement learning framework is constructed within the OpenAI Gym interface with the stable-baselines3 package in Python. Among the several hyperparameters in PPO, the learning rate is one of the most crucial in influencing the training efficacy in terms of convergence and speed. It controls the extent to which the policy’s weights and parameters are adjusted in response to the computed policy gradient. Preliminary simulations suggest using a learning rate lower than the default value of Inline graphic to prevent unstable parameter updates in our policy (see Sec. SI-3, Fig. S2, and Tables S1 and S2 of the Supplementary Information for details on the preliminary simulations and the selection of hyperparameters). Following a detailed investigation of the impact of Inline graphic and Inline graphic in conjunction with the chosen learning rate, and taking into account several hidden layer configurations for the MLP policy, a neural network with hidden layer sizes of (512, 256, 64) has been selected to represent the agent (refer to Sec. SI-3 and Figs. S3 through S5 of Supplementary Information for details).
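For reference, the clipped surrogate objective mentioned above is commonly written in the following standard form. The notation is the conventional PPO notation rather than anything taken from this article: $\phi$ denotes the policy parameters (chosen here to avoid a clash with the heading angle), $\hat{A}_t$ is an advantage estimate, and $\epsilon$ is the clipping range.

```latex
L^{\mathrm{CLIP}}(\phi) =
  \hat{\mathbb{E}}_t\!\left[
    \min\!\left(
      \rho_t(\phi)\,\hat{A}_t,\;
      \operatorname{clip}\!\left(\rho_t(\phi),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t
    \right)
  \right],
\qquad
\rho_t(\phi) = \frac{\pi_{\phi}(a_t \mid s_t)}{\pi_{\phi_{\mathrm{old}}}(a_t \mid s_t)}
```

The clipping removes the incentive to move the probability ratio $\rho_t$ outside the interval $[1-\epsilon, 1+\epsilon]$ in a single update, which is the source of the update stability referred to in the text.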

Results

Prior to examining the dynamics of SAPs and their efficacy in mixing the segregated passive system, it is pertinent to consider a baseline case without learning, where mixing arises exclusively from the stochastic driving of the active particles. Run-and-tumble particles serve as a useful model for examining self-propelled motion, particularly in the context of translational applications employing microrobots. Therefore, the RT dynamics serve as a paradigm for the development of SAPs, which can be programmed to perform certain tasks in designated environments. The upcoming subsection analyses the influence of active particles following RT dynamics on the passive species in a confined circular domain, which is subsequently contrasted against SAPs.

Mixing by run-and-tumble particles

Inspired by several microscopic organisms, run-and-tumble (RT) dynamics is one of the most widely accepted models describing the motion of active particles and can serve as a benchmark for assessing the mixing performance of active particles. Figure 3(a) provides a visual depiction of a typical mixing scenario in which active particles following RT dynamics interact with the stratified binary passive system over a period of time (Inline graphic time steps). The presented case involves a mean run duration of Inline graphic time steps and a tumble angle uniformly distributed in Inline graphic. Gradual mixing can be observed in the series of time-progression snapshots of the system, arising from interactions among the active particles and the two passive species. Figure 3(b) showcases the spatial mapping of the locations of the RT particles, coloured by time stamp. It is clear that all the active particles explore the entire domain, except the region immediately adjacent to the wall. With random tumbles, as is the case with RT particles, the active particles barely perturb the passive particles along the wall, which explains the absence of tumbling events adjacent to the wall. Although Fig. 3 demonstrates the features of a representative case, qualitatively similar behaviour is observed across multiple realisations and different simulation parameters. To understand the effect of the RT particles on the behaviour of the passive particles, the trajectories of three passive particles from different locations (centre, off-centre, and next to the wall) are showcased in Figs. 4(a(i–iii)) over a sufficiently long simulation time (Inline graphic time steps). Following the time signature in the form of the colour gradient, it is observed that the passive particles are driven through the domain in a random fashion. To bolster these observations, Fig. 4(b) illustrates the kernel density estimation (KDE) plot, which uses a Gaussian kernel to estimate a smooth function of the probability of finding passive particles at any location over a long time window (Inline graphic time steps). The plot takes into consideration the positions of the passive particles sampled over a long timescale; a smooth probability density is then estimated from the discrete particle data by placing Gaussian kernels centred at each sample point. The resulting value at any location is proportional to the probability of finding any passive particle in that region over the given time window. The highest probability is observed close to the domain boundary, primarily due to the increased residence time of particles near the wall (see Sec. SI-4 and Fig. S6 of the Supplementary Information for more details), as any passive particles next to the boundary require a strong inward push to re-enter the bulk. The only condition that permits an inward push is when active particles are wedged between peripheral passive particles and the wall particles, an event that is rarely observed in the current work. For the most part, active particles are observed to move these passive particles along the wall. The probability data also reaffirm an overall stochastic motion for the passive particles in the majority of the domain (similar probability values pointing towards a uniform distribution). The displacement of the passive particles is more pronounced away from the boundaries; hence, the ones travelling along the confinement exhibit minimal radial shifts. An extensive quantitative measurement of the mixing performance of the RT particles is reported for a range of mean run durations in the upcoming subsection.
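The kernel density estimate described above, in which a Gaussian kernel is centred on every sampled particle position and the kernels are averaged, can be sketched in a few lines. The function name, the isotropic kernel, and the fixed bandwidth are assumptions for this example; the article does not state the bandwidth or the software used to produce Fig. 4(b).

```python
import math


def gaussian_kde_2d(samples, bandwidth):
    """Return density(x, y): the mean of isotropic 2D Gaussians centred on the samples.

    samples:   list of (x, y) positions pooled over the sampling time window
    bandwidth: kernel standard deviation (an assumed, user-chosen value)
    """
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(samples))

    def density(x, y):
        total = 0.0
        for sx, sy in samples:
            d2 = (x - sx) ** 2 + (y - sy) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        return norm * total

    return density
```

Evaluating `density` on a grid covering the circular domain gives a map of the kind shown in Fig. 4(b); regions where passive particles spend more time, such as the layer next to the wall, appear as higher values.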

Fig. 3.


Panel (a) displays snapshots of the system at different time instances Inline graphic. The active particles are coloured red, while the two passive species are coloured orange and blue, respectively. Panel (b) showcases the locations (coloured by time stamp) of the active particles following run-and-tumble (RT) dynamics. (Note: Both panels depict a representative simulation, and qualitatively similar features are observed across multiple realisations.)

Fig. 4.


The trajectories of three representative passive particles are displayed to showcase their long-term behaviour, on interaction with run-and-tumble active particles (time evolution is represented through a colour gradient). The passive particles are chosen based on their initial positions in the domain: (a-i) located somewhat off-centre, (a-ii) centrally located, and (a-iii) located next to the wall. The starting point of these passive particles is marked with red disks, while the final position (i.e., Inline graphic time steps) is marked with cyan disks. The arrows represent the direction of motion of the particles at any point. Panel (b) presents the kernel density estimate (KDE) plot of the spatial probability distribution of positions occupied by the passive particles over a long time window (Inline graphic), demonstrating a higher residence time near the wall (see Sec. SI-4 and Fig. S6 of the Supplementary Information).

Mixing using trained active particles

Although conventional RT-based simulations can achieve mixing in the segregated passive system, there is scope for improvement, particularly because the directional changes of the RT particles are driven by stochastic inputs. Therefore, an RL framework, as explained in the Methods section, is employed to train an agent (ANN) to make informed orientational decisions for the SAPs’ movements, thereby promoting mixing.

The RL training architecture (refer to the Methods section and Sec. SI-1 and SI-2 of Supplementary Information) requires two primary inputs for the smart active particles (SAPs): the tumble step Inline graphic, and the run duration Inline graphic. In the current section, the training of the agent is carried out with Inline graphic and Inline graphic time steps. The reasoning behind the selection of these specific values is described in Sec. SI-3, and Figs. S3 through S5 of Supplementary Information. A SAP controlled by a trained agent is referred to as a trained SAP (TSAP), and a SAP controlled by a non-trained agent is referred to as a non-trained SAP (NTSAP). Figure 5 qualitatively compares the progression in mixing among the passive particles on interaction with the NTSAPs (top panel) and the TSAPs (bottom panel). In the case of the NTSAPs, the mixing is random and does not follow any pattern. However, in the case of TSAPs, the snapshots reveal the presence of a global clockwise swirl among the passive particles (more details are provided in the upcoming sections). Another point to note is the enhanced positional shift of passive particles near the wall in the presence of TSAPs, indicating improved mixing capabilities of TSAPs compared to those of NTSAPs. Irrespective of training, the mixing also results in the formation of empty spaces (voids) in the system (see Inline graphic of Fig. 5).

Fig. 5.


Time progression of the mixing of the two passive species enabled by smart active particles (SAPs) controlled by a non-trained agent (top panel) and a trained agent (bottom panel) is presented through snapshots of the domain. The SAPs (coloured red) can only move in the four cardinal directions (Inline graphic), and are set to tumble instantaneously every Inline graphic time steps. (Note: For the bottom panel, the RL agent has been trained for Inline graphic agent-environment interactions. Both panels are representative of fifty test episodes.)

To glean further insights into the mixing phenomena, the locations of the tumbling events of the SAPs have been mapped as shown in Fig. 6(a). It is evident that a non-trained RL agent prescribes actions which tend to move the active particles randomly across the domain. On the contrary, TSAPs move strategically, focusing on a part of the domain to enhance mixing. The trajectories for the TSAPs are presented over a short time window (Inline graphic time steps) in Fig. 6(b), which supports the hypothesis about the constrained motion of the SAPs, characterised by sharp turns (Inline graphic). The perturbations on the path arise due to collisions with the passive particles and other TSAPs. However, once they reach the periphery, they follow the wall for a duration owing to the confinement and the prevailing values of Inline graphic and Inline graphic. A quantitative measure of mixing performance is computed using the mixing index defined in the Methods section. Figure 6(c) compares the temporal variation in the mixing index Inline graphic (ensemble average over fifty test episodes), when mixing is carried out using NTSAPs and TSAPs for a tumble step Inline graphic and a run duration Inline graphic. In both cases, the mixing index increases with time until it reaches a steady-state value. However, a trained agent demonstrates an enhanced mixing of the binary passive system, based on the absolute mixing index value and the time taken to reach it. The steady-state mixing index for the trained RL agent attains a value around Inline graphic, whereas that for the non-trained case stands at Inline graphic. The inset of Fig. 6(c) illustrates the temporal variation in Inline graphic for mixing using RT particles with different mean run durations Inline graphic, each curve averaged over a hundred realisations. 
The inset clearly depicts a saturation value close to Inline graphic for the RT particles with mean run durations beyond Inline graphic, thereby defining an upper limit to the ability of the RT particles to induce mixing in the binary passive system. The use of a trained RL agent can augment the mixing of the passive system beyond these RTP-based limits, with a substantially simplified approach (restricting the motion of the SAPs to discrete steps in the four cardinal directions).

Fig. 6.


Panel (a) illustrates the locations of the tumbling events of the SAPs in the case of (i) a non-trained SAP (NTSAP), and (ii) a trained SAP (TSAP) for a representative test episode. Panel (b) demonstrates the trajectories of three TSAPs for a period of Inline graphic time steps, to further highlight the motion of the TSAPs being constrained to only a section of the domain, unlike the NTSAPs. Panel (c) delineates the temporal evolution of the mixing index Inline graphic averaged over fifty test episodes, comparing mixing induced by NTSAPs and that by TSAPs. The inset in panel (c) represents the change in Inline graphic with time Inline graphic for RT particles with different mean run durations Inline graphic (refer to Eq. 3). All the panels are depicted for parameters Inline graphic and Inline graphic. (Note: NTSAP refers to a SAP receiving inputs from a non-trained RL agent, while TSAP is a SAP controlled by a trained RL agent.)

Probability of finding dissimilar neighbours

The previous sections have been concerned with the “microscopic” behaviour of the passive particles; however, it is equally important to understand the “macroscopic” implications of the actions undertaken by the TSAPs. A well-mixed system, in the context of this passive system, can be postulated to have an equal number of particles of similar and opposite species/type surrounding each passive particle (within the radius Inline graphic). To test this hypothesis, the probability distribution of Inline graphic, represented as Inline graphic, is computed where the SAPs are not taken into account. In a perfectly mixed dense system with a large number of particles, the histogram should peak at Inline graphic with Inline graphic, and Inline graphic assuming a value of 0.
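The neighbour ratio described above can be sketched as follows. This is a minimal brute-force illustration, assuming each passive particle is given by a position and a species label; the function name `dissimilar_fraction`, the positions, the labels, and the radius are hypothetical and do not come from the authors' code. SAPs are simply left out of the input, in line with the text.

```python
import math


def dissimilar_fraction(positions, species, radius):
    """Return, for each passive particle, the ratio of dissimilar passive
    neighbours to all passive neighbours within `radius`.

    Particles with no neighbour inside the radius yield None.
    """
    fractions = []
    for i, (xi, yi) in enumerate(positions):
        n_total = 0
        n_dissimilar = 0
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue  # a particle is not its own neighbour
            if math.hypot(xi - xj, yi - yj) <= radius:
                n_total += 1
                if species[j] != species[i]:
                    n_dissimilar += 1
        fractions.append(n_dissimilar / n_total if n_total else None)
    return fractions


# Hypothetical one-dimensional arrangement of four passive particles.
positions = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]

# Segregated state: only the two particles at the interface see a
# dissimilar neighbour.
segregated = dissimilar_fraction(positions, ["A", "A", "B", "B"], radius=1.1)

# Alternating state: every neighbour belongs to the other species.
alternating = dissimilar_fraction(positions, ["A", "B", "A", "B"], radius=1.1)
```

A histogram of these per-particle ratios over all passive particles gives the kind of distribution discussed in this subsection; an equal split of similar and dissimilar neighbours corresponds to a ratio of 0.5.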

Figure 7 compares the temporal evolution in the probability density function of Inline graphic, starting from the same initial distribution (Fig. 7(a)), when using NTSAPs (Fig. 7(b)) and TSAPs (Fig. 7(c)) for mixing the passive system. Due to the initially stratified state of the system, all passive particles apart from those at the interface of the two passive species have neighbours of a similar type. Therefore, Inline graphic has a peak (Inline graphic) at Inline graphic at Inline graphic. As time progresses, the SAPs start agitating and mixing the system, causing the peak of Inline graphic to shift towards higher Inline graphic values. However, a clear distinction can be made between the performance of the NTSAPs and the TSAPs, at Inline graphic time steps. Figure 7(c), illustrating mixing effected by TSAPs, shows a higher probability of finding more dissimilar neighbours around each passive particle, compared to a system where mixing is actuated by NTSAPs. At Inline graphic time steps, when the mixing index Inline graphic has saturated (see Fig. 6(c)), the system involving TSAPs outperforms its counterpart, as evidenced by a higher probability of finding more dissimilar particles (the peaks in Figs. 7(b-ii) and 7(c-ii) occur at Inline graphic and Inline graphic, respectively). The distribution of Inline graphic in the system with the TSAPs closely resembles a Gaussian distribution with a mean at Inline graphic. Furthermore, on fitting the observed histograms at Inline graphic to a Gaussian distribution, the kurtosis points at platykurtic distributions (negative excess kurtosis). A significantly higher negative excess kurtosis is observed in the case of the system with NTSAPs (excess kurtosis of Inline graphic) compared to that of TSAPs (excess kurtosis of Inline graphic), indicating a flatter distribution in the former.
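For reference, the excess kurtosis used above to compare the two distributions can be computed from the second and fourth central moments. The sketch below uses population (biased) moments, which is an assumption; the estimator used by the authors is not stated here, and the sample values are invented for illustration.

```python
def excess_kurtosis(values):
    """Excess kurtosis m4 / m2**2 - 3 from population central moments.

    Negative values indicate a platykurtic (flatter than Gaussian) sample.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0.0:
        raise ValueError("all values are identical; kurtosis is undefined")
    return m4 / (m2 ** 2) - 3.0


# Hypothetical samples: one spread evenly over [0, 1], one concentrated
# at 0.5 with two outliers.
flat_sample = [0.0, 0.25, 0.5, 0.75, 1.0]
peaked_sample = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.0, 1.0]

flat_kurtosis = excess_kurtosis(flat_sample)
peaked_kurtosis = excess_kurtosis(peaked_sample)
```

The evenly spread sample gives a negative excess kurtosis, while the concentrated sample gives a positive one, which is the sense in which a more negative value signals a flatter distribution.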

Fig. 7.


The temporal evolution of the probability distribution of Inline graphic (i.e., the ratio of the passive neighbours of dissimilar type/species Inline graphic to the total of passive neighbours n surrounding any passive particle within the radius Inline graphic) with (a) initial probability distribution at Inline graphic is compared for (b) a system involving NTSAPs, and (c) a system involving TSAPs. The time stamps during evolution are at Inline graphic, Inline graphic, and Inline graphic, respectively. The systems using TSAPs for mixing perform better than their counterpart, as shown by the higher probability of finding more dissimilar neighbours at Inline graphic and Inline graphic, and by the sharper nature of the curve for the bottom panel at Inline graphic.

Trajectories for the passive particles with TSAPs

Visualisation and critical analysis of the dynamics of the passive particles are crucial to understanding the mixing performance of SAPs. Figure 5 provides some insights into the motion of the passive particles as a whole. Visual inspection of the passive system reveals a clockwise swirl about the domain centre in all passive particles except those near the wall (which move in the counter-clockwise direction). To highlight the aforementioned rotational motion, three passive particles are chosen from three different locations (similar to the selection used in Fig. 4) to study their trajectories. Episodic simulations with long time scales are carried out employing TSAPs for mixing until Inline graphic time steps. The trajectories of the three representative passive particles are plotted in Fig. 8(a). These particles are observed to move in roughly circular trajectories around the domain centre, until they reach the area where the TSAPs are active (see Fig. 6(a-ii)). In the episode presented in Fig. 8, the passive particles adjacent to the wall move in a counter-clockwise (CCW) fashion, whereas those in the interior regions of the domain perform clockwise (CW) motion. It is also observed that the particles in the path adjacent to the outermost path (band 2 in Fig. 8(b)) have a propensity to execute motion in either of the two directions (CW or CCW). Such behaviour can be explained by envisioning the motion of these particles as mimicking that of particles trapped between shear layers (moving in opposite directions). To corroborate our findings, basic computational fluid dynamics simulations are carried out to recreate a similar mixing behaviour. The Eulerian simulations utilise two identical fluids, differentiated only by colour, with similar initial conditions to the Lagrangian system. 
Fluid motion (and thereby, mixing) was induced by imposing a constant surface speed on the periphery of an elliptical disk with dimensions close to the operating region of the SAPs. The computational results exhibit analogous mixing dynamics to those observed in the particle system and are further discussed in Sec. SI-5 and Fig. S7 of Supplementary Information.

Fig. 8.


Panel (a) illustrates the motion trajectories for three representative passive particles selected based on their initial positions: (a-i) off-centre, (a-ii) centre, and (a-iii) adjacent to the wall. The colour bar indicates the time until a maximum of Inline graphic time steps. The red and the cyan disks represent the initial and final positions of the passive particle in each panel. (Note: Each trajectory is expressed from the positions of the passive particle in steps of Inline graphic time steps.) Panel (b) features the trajectory (for Inline graphic time steps) for a passive particle originating from the periphery (red disk). The entire domain is divided into six concentric regions (excluding the central area), numbered Inline graphic, starting from the outermost band, with each concentric band having a radial width of 2r. Panel (c) showcases the distribution of average angular velocity Inline graphic of the passive particles in each of the bands presented in panel (b) in a system mixed using TSAPs. Each simulation is carried out until Inline graphic time steps, and the distribution takes into account data from fifty test episodes, considering all passive particles. The black dashed line represents zero angular velocity (Inline graphic).

The behaviour of the passive particles reported in Fig. 8(a) is representative of the passive particles in all the episodes. This indicates an optimal mixing strategy where the active particles induce a circular motion around the domain centre to promote mixing among initially segregated passive particles. Moreover, the passive particles also have a transverse (radial) component of motion as they move along the circular path. Switching between the concentric paths is predominant in the region where the TSAPs are active (see Fig. 6(a-ii)) due to frequent collisions with the moving SAPs. To analyse the directionality of the circular motion of the passive particles over a complete trajectory, the domain has been divided into six concentric bands (see Fig. 8(b); bands are numbered from 1 to 6, 1 being the outermost). Following the trajectory data from numerous passive particles over multiple episodes (seven distinct concentric trajectories are observed on superposing all the passive positional data), each band is assumed to have a radial width of 2r. Figure 8(b) illustrates the trajectory of a typical particle starting from the periphery (marked by the red disk). The density distribution of the angular velocities within these bands is shown in Fig. 8(c). In the inner bands (bands 3–6), the particle velocities are predominantly in the CW direction (assumed to be negative angular velocity Inline graphic; the majority of the distribution has Inline graphic). In band 2, the passive particles are almost equally likely to have a CW or CCW bias in their motion, while in band 1, there is a clear bias towards CCW motion (Inline graphic). The combined CW and CCW motion in the different regions of the domain culminates in an efficient mixing of the two passive species. 
Altering the initial distribution of particles produces similar clockwise (CW) and counter-clockwise (CCW) motion patterns, although the dominant direction of motion (CW or CCW) for particles in the interior and exterior regions can change depending on these initial conditions. Additionally, changes in initial conditions can influence whether a band favours CW or CCW motion.
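The band-wise analysis above can be sketched as two small helper functions: one assigning a position to a concentric band of radial width 2r counted inward from the wall, and one estimating a signed angular velocity about the domain centre from two successive positions. The sketch assumes a circular domain centred at the origin with the outermost band starting at the wall; the function names, the domain radius, and the particle radius are hypothetical. The sign convention follows the text (CCW positive, CW negative).

```python
import math


def band_index(x, y, domain_radius, particle_radius, n_bands=6):
    """Return the band number (1 = outermost) for a position, or None if the
    position falls in the central area inside the innermost band."""
    depth = domain_radius - math.hypot(x, y)  # distance inward from the wall
    if depth < 0.0:
        raise ValueError("position lies outside the domain")
    index = int(depth // (2.0 * particle_radius)) + 1
    return index if index <= n_bands else None


def angular_velocity(p_prev, p_next, dt):
    """Signed angular velocity about the origin between two positions.

    Positive values correspond to CCW motion, negative values to CW motion.
    """
    cross = p_prev[0] * p_next[1] - p_prev[1] * p_next[0]
    dot = p_prev[0] * p_next[0] + p_prev[1] * p_next[1]
    return math.atan2(cross, dot) / dt


# Hypothetical geometry: domain radius 10, particle radius 0.5.
outer_band = band_index(9.5, 0.0, domain_radius=10.0, particle_radius=0.5)
inner_band = band_index(7.5, 0.0, domain_radius=10.0, particle_radius=0.5)
central = band_index(0.0, 0.0, domain_radius=10.0, particle_radius=0.5)

omega_ccw = angular_velocity((1.0, 0.0), (0.0, 1.0), dt=1.0)
omega_cw = angular_velocity((0.0, 1.0), (1.0, 0.0), dt=1.0)
```

Collecting such angular velocities per band over many particles and episodes would give distributions of the kind shown in Fig. 8(c).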

Effect of Inline graphic and Inline graphic on mixing performance

The run duration Inline graphic and the angle of the tumble Inline graphic (Inline graphic) are deemed to be important input parameters governing the active particle dynamics and closely associated with the policy updates during the training of the RL agent. Figure 9 outlines the effect of Inline graphic and Inline graphic in the temporal variation of the mixing index Inline graphic, when the binary passive system is mixed using trained SAPs. The training of the RL agent for all the combinations of Inline graphic and Inline graphic is carried out using a neural network (NN) architecture of hidden layer size (512, 256, 64) and a learning rate of Inline graphic. Other hyperparameters used in the PPO algorithm are set to their default values as defined in the stable-baselines3 package (see Table S1 of Supplementary Information). It is evident from any of the panels (each panel represents a different Inline graphic) that Inline graphic, despite allowing for minimal directional options for the movement of the active particles, is sufficient to induce mixing. Considering a lower Inline graphic, such as Inline graphic or Inline graphic, increases the angular action space for each active particle, thereby increasing the complexity of the training stage of the RL agent. Meanwhile, the corresponding improvement in the mixing index is nominal in most cases, with adverse effects being observed in certain cases (see Fig. 9(a)).
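The growth of the angular action space with a finer tumble step can be made concrete with a short sketch. It assumes that the tumble step divides 360° evenly, that a 90° step corresponds to the four cardinal directions mentioned in the text, and that each SAP picks its direction independently so that the joint action count is a simple power. The function names and the 45° example are illustrative only and are not taken from the authors' code.

```python
import math


def action_directions(tumble_step_deg):
    """Unit vectors for the discrete headings allowed by a tumble step
    given in whole degrees (assumed to divide 360 evenly)."""
    if tumble_step_deg <= 0 or 360 % tumble_step_deg != 0:
        raise ValueError("tumble step must be a positive divisor of 360")
    n_actions = 360 // tumble_step_deg
    directions = []
    for k in range(n_actions):
        angle = math.radians(k * tumble_step_deg)
        # Rounding removes floating-point noise such as cos(pi/2) ~ 6e-17.
        directions.append((round(math.cos(angle), 12),
                           round(math.sin(angle), 12)))
    return directions


def joint_action_count(tumble_step_deg, n_particles):
    """Number of joint heading choices when every particle chooses
    independently (an assumption about how actions are combined)."""
    return len(action_directions(tumble_step_deg)) ** n_particles


cardinal = action_directions(90)
finer = action_directions(45)
```

Halving the tumble step doubles the number of headings per particle, and the number of joint choices grows as a power of the particle count, which is one way to see why finer steps make training harder.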

Fig. 9.


The effect of the tumble step Inline graphic and run duration Inline graphic of the SAPs is demonstrated with respect to mixing actuated by a trained RL agent. The temporal variation in the mixing index Inline graphic is illustrated at different parametric combinations of Inline graphic and Inline graphic. Panels (a) through (d) present the mixing index variation for different run durations Inline graphic time steps, Inline graphic time steps, Inline graphic time steps, and Inline graphic time steps, respectively. In each panel, Inline graphic is varied in steps of Inline graphic, Inline graphic, and Inline graphic (blue, orange, and green curves, respectively). (Note: Each curve is averaged over fifty test episodes. The learning rate for the training of the RL agent is set to Inline graphic, and other hyperparameters for the RL algorithm are set to default values defined in the stable-baselines3 library (refer to Table S1 of Supplementary Information).)

On the other hand, the Inline graphic values used in the training of the RL agent are selected on the basis of the simulations involving the RT particles (see inset of Fig. 6(c)). It is apparent from Fig. 9(a) that too low a run duration can lead to sub-optimal mixing even using trained SAPs, if finer Inline graphic values are chosen. However, with Inline graphic, the mixing index Inline graphic saturates to similar peak values (Inline graphic), irrespective of the run duration of the SAPs, except at Inline graphic for which Inline graphic (value at Inline graphic). At Inline graphic and Inline graphic, the influence of Inline graphic is trivial, as all the curves saturate to a similar value following a similar trend (see Fig. 9(b–c)). However, the variation in the mixing index is smoother in the latter one, and a higher mixing index is obtained at an early stage (hence, a marginal improvement in mixing). Further increase in Inline graphic is found to be detrimental to the mixing performance of the SAPs.

Discussion

The current study demonstrates the use of Reinforcement Learning (RL) to train and manage a collection of smart active particles (SAPs) in a high-dimensional state-action space to achieve an optimal mixing between two initially segregated passive species. The forces exerted by the active particles on collision drive the passive particles. The mixing among the two passive species is quantified through a mixing index Inline graphic, which is a function of the number fraction of passive particles of opposite species, calculated locally. Extensive simulations show that a discrete action space with SAP movement limited to only four directions is sufficient for efficiently mixing the passive species. Even with the intrinsic nonlinearity resulting from inter-particle collisions, the MLP policy, employed to capture the best state-action pairs, along with the PPO algorithm (which optimises the RL agent parameters), effectively mimics the coordination among the SAPs. The current work primarily highlights the motion of the SAPs leading to an optimally mixed passive mixture. The operating area of the SAPs in such cases is observed to be fairly restricted to a small area offset from the domain centre, rather than dispersing across the domain, promoting a circular motion among the passive particles about the domain centre. An analogous Eulerian model involving an elliptical-shaped mixer (an ellipse-shaped void with a constant surface speed) positioned eccentrically in the domain yields a similar area fraction distribution with two immiscible fluids as that observed among the passive particles (refer to Sec. SI-5 and Fig. S7 of the Supplementary Information). To demonstrate the efficacy of the active particles controlled by a trained agent, the mixing induced by a set of run-and-tumble (RT) particles with similar tumble angles and run durations has been analysed as a baseline study. 
From the analysis, the peak Inline graphic increased from 0.9 for an RT-based system to 0.96 for an RL-based system in a much shorter time frame than the former (Inline graphic reduction in time to reach Inline graphic), highlighting an improved mixing of the binary passive system. It is also noted that the dynamics of the RT particles provide findings identical to those generated by NTSAPs (see Sec. SI-6 and Fig. S8 of Supplementary Information). Apart from the mixing index, the mixing performance of the active particles has also been quantified through the probability distribution Inline graphic, Inline graphic being the ratio of the number of particles of opposite species to the total number of particles around each passive particle in a specified radius Inline graphic. Using trained SAPs to mix the system yields a Gaussian probability distribution with a peak value close to Inline graphic, which corresponds to an equal number of particles of opposite and similar types. At the same time, the distribution in the case of non-trained SAPs exhibits a flatter peak with more negative excess kurtosis, exhibiting an inability to optimise the mixing in the system. To generalise the findings of the work, training and testing of multiple systems with different initial positional distributions for the active and passive particles have been conducted. The obtained results strongly support the applicability of the same RL framework, regardless of the initial particle positions (refer to Sec. SI-7 and Fig. S9 of the Supplementary Information).

As the reward system defined in the current work focuses on maximising the mixing index, the RL agent finds an optimum at Inline graphic. However, a more intricate reward system with additional or different goals can also be tried to assess the effectiveness of integration. A multi-objective reward with suitable weights that balances complementary objectives, such as spatial dispersion, rate of mixing, and penalties for same-species clustering, is a worthwhile direction for future work. The SAPs used in the current work have constant self-propulsion speeds; the speed can be added as another parameter to be controlled through continuous or discrete inputs from the agent. It is important to note, however, that any such endeavour will require major improvements to the neural network architecture used in the current work. This is also true when a larger number of SAPs are used in tandem, which would otherwise lead to subpar improvement or inferior performance compared to non-trained SAPs. Moreover, all the particles currently interact through inter-particle collisions. The current approach of using SAPs can be integrated with existing macroscale techniques involving electric or magnetic fields62–64. In such cases, attractive and repulsive forces can also be incorporated in the particle dynamics, and the strength of the field can be controlled in tandem with the motion of the SAPs to hasten the mixing process. Such a setup can also be used to segregate a mixed system.

The system described in the current work can be experimentally realised by substituting the SAPs with either light-activated Janus particles (externally controlled) or micro-robots (with either on-board or off-board actuation mechanisms), which exhibit comparable motion characteristics. Fluorescent labelling can be used for real-time tagging/tracking of the motion of the passive species and to distinguish between two or more species. An advantage of the current RL framework is its ease of adaptation to the intended active-passive experimental realisations, owing to its modular design. However, experimental realisations of such RL frameworks often struggle with issues related to real-time processing of large volumes of data, which can result in delayed feedback and inaccurate state inferences. Additionally, data may be noisy or partial, which makes it challenging for RL algorithms to accurately determine the true state of the environment required to make the best decisions. Leveraging a trained RL model from a minimalist model, as implemented here, can serve as a robust initialiser, significantly accelerating optimisation in the real environment compared to training a policy entirely from scratch.

Furthermore, the findings reported in this work have significant implications for the study of controllable active matter systems. Due to the generality of the micro-robotic system, the same system can be applied to various fields by simply redefining the agent’s input parameters, reward function, and governing equations, while selecting the appropriate neural networks and hyperparameters. By adjusting the objective function, these smart active particles can be used in a variety of fields, such as targeted drug delivery65, microswimmer-based mixing66, smart navigation in colloidal and complex environments67, and granular mixing and segregation68, where active particles interact and regulate passive entities. To enhance the realism of the current work, a logical next step could be the incorporation of polydispersity in the particle properties. In this scenario, the RL agent must be aware of both the particle positions and their sizes, as well as their identities. Such modification, accompanied by the refinement of the reward mechanism and the agent architecture, can further the functionality and practicality of the SAPs. Augmenting the system to three dimensions could enhance its versatility; however, this would come at the cost of substantially increasing mobility, interactions, and trajectories, necessitating three-dimensional spatial information for training the agent, which in turn translates to much higher computational costs. Finally, by offering an adaptable framework that can be adjusted for active particles involved in multi-body interactions, the current work promotes the integration of conventional active matter theory with powerful reinforcement learning techniques.


Acknowledgements

PSM acknowledges the V. Ganesan Faculty Fellowship received from IIT Madras.

Author contributions

TJ, SiM and PSM contributed to conceiving and developing the idea. TJ, SiM and SaM developed the code. TJ ran the simulations and wrote the initial draft of the manuscript. RA developed the analogical model. TJ and SiM post-processed the data. All authors reviewed the manuscript. PSM funded the work.

Funding

This research is partially supported by the Indian Institute of Technology Madras [Sanction No. SB22231233MEETWO008509, RF22230093MERFIR008846].

Data availability

The simulation code and the data for making the plots can be found at https://github.com/s-m-sys/mix_with_SAPs.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-33076-6.

References

  • 1.Ballerini, M. et al. Interaction ruling animal collective behavior depends on topological rather than metric distance: Evidence from a field study. Proc. Natl. Acad. Sci.105, 1232–1237 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Cook, C. N. et al. Individual learning phenotypes drive collective behavior. Proc. Natl. Acad. Sci.117, 17949–17956 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Mattingly, H. H. & Emonet, T. Collective behavior and nongenetic inheritance allow bacterial populations to adapt to changing environments. Proc. Natl. Acad. Sci.119, e2117377119 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Duarte, M. et al. Evolution of collective behaviors for a real swarm of aquatic surface robots. PloS One11, e0151834 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Mezey, D. et al. Purely vision-based collective movement of robots. npj Robotics3, 11 (2025).
  • 6.Vicsek, T., Czirók, A., Ben-Jacob, E., Cohen, I. & Shochet, O. Novel type of phase transition in a system of self-driven particles. Phys. Rev. Lett.75, 1226 (1995). [DOI] [PubMed] [Google Scholar]
  • 7.Toner, J. & Tu, Y. Long-range order in a two-dimensional dynamical xy model: how birds fly together. Phys. Rev. Lett.75, 4326 (1995). [DOI] [PubMed] [Google Scholar]
  • 8.Ramaswamy, S. The mechanics and statistics of active matter. Annu. Rev. Condens. Matter Phys.1, 323–345 (2010). [Google Scholar]
  • 9.Vicsek, T. & Zafeiris, A. Collective motion. Phys. Reports517, 71–140 (2012). [Google Scholar]
  • 10.Marchetti, M. C. et al. Hydrodynamics of soft active matter. Rev. Mod. Phys.85, 1143–1189 (2013). [Google Scholar]
  • 11.Shaebani, M. R., Wysocki, A., Winkler, R. G., Gompper, G. & Rieger, H. Computational models for active matter. Nat. Rev. Phys.2, 181–199 (2020). [Google Scholar]
  • 12.Gompper, G. et al. The 2025 motile active matter roadmap. J. Physics: Condens. Matter37, 143501 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Hogan, B. G., Hildenbrandt, H., Scott-Samuel, N. E., Cuthill, I. C. & Hemelrijk, C. K. The confusion effect when attacking simulated three-dimensional starling flocks. Royal Soc. Open Sci. 4, 160564 (2017).
  • 14. Mohapatra, S. & Mahapatra, P. S. Confined system analysis of a predator-prey minimalistic model. Sci. Reports 9, 11258 (2019).
  • 15. Olson, R. S., Hintze, A., Dyer, F. C., Knoester, D. B. & Adami, C. Predator confusion is sufficient to evolve swarming behaviour. J. The Royal Soc. Interface 10, 20130305 (2013).
  • 16. Mohapatra, S. & Sinha Mahapatra, P. Behavioural response of prey to repeated attacks by non-coordinating predators. Sci. Reports 15, 22977 (2025).
  • 17. Cavagna, A., Queirós, S. D., Giardina, I., Stefanini, F. & Viale, M. Diffusion of individual birds in starling flocks. Proc. Royal Soc. B: Biol. Sci. 280, 20122484 (2013).
  • 18. Saadat, M. et al. Hydrodynamic advantages of in-line schooling. Bioinspiration & Biomimetics 16, 046002 (2021).
  • 19. Cressman, R. & Garay, J. The effects of opportunistic and intentional predators on the herding behavior of prey. Ecology 92, 432–440 (2011).
  • 20. Colin, R., Drescher, K. & Sourjik, V. Chemotactic behaviour of Escherichia coli at high cell density. Nat. Commun. 10, 5329 (2019).
  • 21. Kühn, M. J. et al. Mechanotaxis directs Pseudomonas aeruginosa twitching motility. Proc. Natl. Acad. Sci. 118, e2101759118 (2021).
  • 22. Memarian, F. L. et al. Active nematic order and dynamic lane formation of microtubules driven by membrane-bound diffusing motors. Proc. Natl. Acad. Sci. 118, e2117107118 (2021).
  • 23. Wu, X.-L. & Libchaber, A. Particle diffusion in a quasi-two-dimensional bacterial bath. Phys. Rev. Lett. 84, 3017 (2000).
  • 24. Volpe, G., Buttinoni, I., Vogt, D., Kümmerer, H.-J. & Bechinger, C. Microswimmers in patterned environments. Soft Matter 7, 8810–8815 (2011).
  • 25. Angelani, L., Maggi, C., Bernardini, M., Rizzo, A. & Di Leonardo, R. Effective interactions between colloidal particles suspended in a bath of swimming cells. Phys. Rev. Lett. 107, 138302 (2011).
  • 26. Valeriani, C., Li, M., Novosel, J., Arlt, J. & Marenduzzo, D. Colloids in a bacterial bath: simulations and experiments. Soft Matter 7, 5228–5238 (2011).
  • 27. Schwarz-Linek, J. et al. Phase separation and rotor self-assembly in active particle suspensions. Proc. Natl. Acad. Sci. 109, 4052–4057 (2012).
  • 28. Dolai, P., Simha, A. & Mishra, S. Phase separation in binary mixtures of active and passive particles. Soft Matter 14, 6137–6145 (2018).
  • 29. Semwal, V., Kumar, A., Singh, J. P. & Mishra, S. Dynamics of active run and tumble and passive particles in binary mixture. The Eur. Phys. J. Special Top. 233, 3185–3192 (2024).
  • 30. McCandlish, S. R., Baskaran, A. & Hagan, M. F. Spontaneous segregation of self-propelled particles with different motilities. Soft Matter 8, 2527–2534 (2012).
  • 31. Gokhale, S., Li, J., Solon, A., Gore, J. & Fakhri, N. Dynamic clustering of passive colloids in dense suspensions of motile bacteria. Phys. Rev. E 105, 054605 (2022).
  • 32. Dhar, T. & Saintillan, D. Active transport of a passive colloid in a bath of run-and-tumble particles. Sci. Reports 14, 11844 (2024).
  • 33. Hrishikesh, B. & Mani, E. Collective dynamics of active circle-swimming Lennard-Jones particles. Phys. Chem. Chem. Phys. 24, 19792–19798 (2022).
  • 34. Kümmel, F., Shabestari, P., Lozano, C., Volpe, G. & Bechinger, C. Formation, compression and surface melting of colloidal clusters by active particles. Soft Matter 11, 6187–6191 (2015).
  • 35. Mototake, Y.-i., Ishida, S., Maruyama, N. & Ikegami, T. Topological data analysis of large swarming dynamics. In 37th Conference on Neural Information Processing Systems (NeurIPS 2023) (2023).
  • 36. Nishida, K. & Hotta, K. Robust cell particle detection to dense regions and subjective training samples based on prediction of particle center using convolutional neural network. PLoS One 13, e0203646 (2018).
  • 37. Dulaney, A. R. & Brady, J. F. Machine learning for phase behavior in active matter systems. Soft Matter 17, 6808–6816 (2021).
  • 38. Colen, J. et al. Machine learning active-nematic hydrodynamics. Proc. Natl. Acad. Sci. 118, e2016708118 (2021).
  • 39. Colabrese, S., Gustavsson, K., Celani, A. & Biferale, L. Flow navigation by smart microswimmers via reinforcement learning. Phys. Rev. Lett. 118, 158004 (2017).
  • 40. Löffler, R. C., Panizon, E. & Bechinger, C. Collective foraging of active particles trained by reinforcement learning. Sci. Reports 13, 17055 (2023).
  • 41. Gunnarson, P., Mandralis, I., Novati, G., Koumoutsakos, P. & Dabiri, J. O. Learning efficient navigation in vortical flow fields. Nat. Commun. 12, 7143 (2021).
  • 42. Liebchen, B. & Löwen, H. Optimal navigation strategies for active particles. Europhys. Lett. 127, 34003 (2019).
  • 43. Nasiri, M. & Liebchen, B. Reinforcement learning of optimal active particle navigation. New J. Phys. 24, 073042 (2022).
  • 44. Monderkamp, P. A., Schwarzendahl, F. J., Klatt, M. A. & Löwen, H. Active particles using reinforcement learning to navigate in complex motility landscapes. Mach. Learn. Sci. Technol. 3, 045024 (2022).
  • 45. Schneider, E. & Stark, H. Optimal steering of a smart active particle. Europhys. Lett. 127, 64003 (2019).
  • 46. Nasiri, M., Loran, E. & Liebchen, B. Smart active particles learn and transcend bacterial foraging strategies. Proc. Natl. Acad. Sci. 121, e2317618121 (2024).
  • 47. Schildknecht, D., Popova, A. N., Stellwagen, J. & Thomson, M. Reinforcement learning reveals fundamental limits on the mixing of active particles. Soft Matter 18, 617–625 (2022).
  • 48. Falk, M. J., Alizadehyazdi, V., Jaeger, H. & Murugan, A. Learning to control active matter. Phys. Rev. Res. 3, 033291 (2021).
  • 49. Antony, S., Roy, R. & Bi, Y. Q-learning: Solutions for grid world problem. In Artificial Intelligence XL: 43rd SGAI International Conference on Artificial Intelligence, AI 2023, Cambridge, UK, December 12–14, 2023, Proceedings, 14381, 266 (Springer Nature, 2023).
  • 50. Tijsma, A. D., Drugan, M. M. & Wiering, M. A. Comparing exploration strategies for Q-learning in random stochastic mazes. In 2016 IEEE Symposium Series on Computational Intelligence (SSCI), 1–8 (IEEE, 2016).
  • 51. Mirzakhanloo, M., Esmaeilzadeh, S. & Alam, M.-R. Active cloaking in Stokes flows via reinforcement learning. J. Fluid Mech. 903, A34 (2020).
  • 52. Henkes, S., Fily, Y. & Marchetti, M. C. Active jamming: Self-propelled soft particles at high density. Phys. Rev. E 84, 040301 (2011).
  • 53. Lee, M., Szuttor, K. & Holm, C. A computational model for bacterial run-and-tumble motion. The J. Chem. Phys. 150 (2019).
  • 54. Junot, G. et al. Run-to-tumble variability controls the surface residence times of E. coli bacteria. Phys. Rev. Lett. 128, 248101 (2022).
  • 55. Großmann, R., Peruani, F. & Bär, M. Diffusion properties of active particles with directional reversal. New J. Phys. 18, 043009 (2016).
  • 56. Guseva, K. & Feudel, U. Advantages of run-reverse motility pattern of bacteria for tracking light and small food sources in dynamic fluid environments. J. Royal Soc. Interface 22, 20250037 (2025).
  • 57. Schakenraad, K. et al. Topotaxis of active Brownian particles. Phys. Rev. E 101, 032602 (2020).
  • 58. Bhattacharjee, T., Amchin, D. B., Alert, R., Ott, J. A. & Datta, S. S. Chemotactic smoothing of collective migration. Elife 11, e71226 (2022).
  • 59. Doucet, J., Bertrand, F. & Chaouki, J. A measure of mixing from Lagrangian tracking and its application to granular and fluid flow systems. Chem. Eng. Res. Des. 86, 1313–1321 (2008).
  • 60. Bhalode, P. & Ierapetritou, M. A review of existing mixing indices in solid-based continuous blending operations. Powder Technol. 373, 195–209 (2020).
  • 61. Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017).
  • 62. Diwakar, N. M., Kunti, G., Miloh, T., Yossifon, G. & Velev, O. D. AC electrohydrodynamic propulsion and rotation of active particles of engineered shape and asymmetry. Curr. Opin. Colloid & Interface Sci. 59, 101586 (2022).
  • 63. Shields, C. W. & Velev, O. D. The evolution of active particles: toward externally powered self-propelling and self-reconfiguring particle systems. Chem 3, 539–559 (2017).
  • 64. Harraq, A. A., Choudhury, B. D. & Bharti, B. Field-induced assembly and propulsion of colloids. Langmuir 38, 3001–3016 (2022).
  • 65. Park, B.-W., Zhuang, J., Yasa, O. & Sitti, M. Multifunctional bacteria-driven microswimmers for targeted active drug delivery. ACS Nano 11, 8910–8923 (2017).
  • 66. Bailey, M. R. et al. Low efficiency of Janus microswimmers as hydrodynamic mixers. Phys. Rev. E 110, 044601 (2024).
  • 67. Yang, Y., Bevan, M. A. & Li, B. Efficient navigation of colloidal robots in an unknown environment via deep reinforcement learning. Adv. Intell. Syst. 2, 1900106 (2020).
  • 68. Agrawal, N. K. & Mahapatra, P. S. Alignment-mediated segregation in an active-passive mixture. Phys. Rev. E 104, 044610 (2021).

Data Availability Statement

The simulation code and the data for making the plots can be found at https://github.com/s-m-sys/mix_with_SAPs.


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
