Efficient collective swimming by harnessing vortices through deep reinforcement learning

Siddhartha Verma; Guido Novati; Petros Koumoutsakos

doi:10.1073/pnas.1800923115

Proc Natl Acad Sci USA. 2018 May 21;115(23):5849–5854. doi: 10.1073/pnas.1800923115

PMCID: PMC6003313 PMID: 29784820

Significance

Can fish reduce their energy expenditure by schooling? We answer this longstanding question affirmatively by combining state-of-the-art direct numerical simulations of the 3D Navier–Stokes equations with reinforcement learning, using recurrent neural networks with long short-term memory cells to account for the unsteadiness of the flow field. Surprisingly, we find that swimming behind a leader is not always associated with energetic benefits for the follower. Instead, we demonstrate that fish can improve their sustained propulsive efficiency by placing themselves at appropriate locations in the wake of other swimmers and judiciously intercepting their wake vortices. The results show that autonomous, “smart” swimmers may exploit unsteady flow fields to reap substantial energetic benefits and have promising implications for robotic swarms.

Keywords: fish schooling, deep reinforcement learning, autonomous navigation, energy harvesting, recurrent neural networks

Abstract

Fish in schooling formations navigate complex flow fields replete with mechanical energy in the vortex wakes of their companions. Their schooling behavior has been associated with evolutionary advantages including energy savings, yet the underlying physical mechanisms remain unknown. We show that fish can improve their sustained propulsive efficiency by placing themselves in appropriate locations in the wake of other swimmers and judiciously intercepting their shed vortices. This swimming strategy leads to collective energy savings and is revealed through a combination of high-fidelity flow simulations with a deep reinforcement learning (RL) algorithm. The RL algorithm relies on a policy defined by deep, recurrent neural nets with long short-term memory cells, which are essential for capturing the unsteadiness of the two-way interactions between the fish and the vortical flow field. Surprisingly, we find that swimming in-line with a leader is not associated with energetic benefits for the follower. Instead, “smart swimmer(s)” place themselves at off-center positions with respect to the axis of the leader(s) and deform their body to synchronize with the momentum of the oncoming vortices, thus enhancing their swimming efficiency at no cost to the leader(s). The results confirm that fish may harvest energy deposited in vortices and support the conjecture that swimming in formation is energetically advantageous. Moreover, this study demonstrates that deep RL can produce navigation algorithms for complex unsteady and vortical flow fields, with promising implications for energy savings in autonomous robotic swarms.

There is a long-standing interest in understanding and exploiting the physical mechanisms used by active swimmers in nature (nektons) (1–4). Fish schooling, in particular, one of the most striking patterns of collective behavior and complex decision-making in nature, has been the subject of intense investigation (5–9). A key issue in understanding fish-schooling behavior, and its potential for engineering applications (10), is the clarification of the role of the flow environment. Fish sense and navigate in complex flow fields full of mechanical energy that is distributed across multiple scales by vortices generated by obstacles and other swimming organisms (11, 12). There is evidence that their swimming behavior adapts to flow gradients (rheotaxis), and, in certain cases, it reflects energy-harvesting from such environments (13, 14). Hydrodynamic interactions have also been implicated in the fish-schooling patterns that form when individual fish adapt their motion to that of their peers, while compensating for flow-induced displacements. Recent experimental studies have argued that fish may interact beneficially with each other (9, 15, 16), but in ways that challenge (17) the earlier proposed mechanisms (5, 6) governing fish schooling. However, the role of hydrodynamics in fish schooling is not embraced universally (8, 18, 19), and there is limited quantitative information regarding the physical mechanisms that would explain such energetic benefits. Experimental (15, 16) and computational (20) studies of collective swimming have been hampered by the presence of multiple deforming bodies and their interactions with the flow field. Moreover, numerical simulations have demonstrated that a coherent swimming group cannot be sustained without exerting some form of control strategy on the swimmers (21, 22). Here, we use deep reinforcement learning [deep RL (23)] to discover such strategies for two autonomous and self-propelled swimmers and elucidate the physical mechanisms that enable efficient and sustained coordinated swimming.

During fish propulsion, body undulations and the sideways displacement of the caudal fin generate and inject a series of vortex rings in its wake (24–26). When fish swim in formation, these vortices may assist the locomotion of fish that intercept them judiciously, which in turn can reduce the collective swimming effort. Such vortex-induced benefits have been observed in trout, which curtail muscle use by capitalizing on energy injected in the flow by obstacles present in streams (13, 27). Here, we examine configurations of two and three self-propelled swimmers in a leader(s)–follower(s) arrangement and investigate the physical mechanisms that lead to energetically beneficial interactions by considering four distinct scenarios. Two of these involve smart followers that can make autonomous decisions when interacting with a leader’s wake and are referred to as interacting swimmers ( $I S$ ) (e.g., the follower in Fig. 1). Additionally, we consider two distinct solitary swimmers ( $S S$ ) that swim in isolation in an unbounded domain. In the case of interacting swimmers, ${I S}_{η}$ denotes swimmers that learn the most efficient way of swimming in the leader’s wake (without any positional constraints) and acquire a policy $π_{η}$ in the process. In turn, swimmer ${I S}_{d}$ attempts to minimize lateral deviations from the leader’s path, resulting in a locally optimal policy $π_{d}$ . These autonomous swimmers make decisions by means of deep RL, using visual cues from their environment (Fig. 2A). The solitary swimmers ${S S}_{η}$ and ${S S}_{d}$ execute actions identical to ${I S}_{η}$ and ${I S}_{d}$ , respectively, and serve as “control” configurations to assess how the absence of a leader’s wake impacts swimming energetics.

Fig. 1. — Efficient coordinated swimming of two and three swimmers. (A) DNS of two swimmers, in which the leader swims steadily and the follower maintains a specified relative position such that it increases its efficiency by interacting with one row of the vortex rings shed by the leader. The flow is visualized by isosurfaces of the Q criterion (28). (B) DNS of three swimmers, where the two followers maintain specified positions that increase their efficiency by interacting with both rows of the vortex rings shed by the leader. (C) DNS of three swimmers with the follower benefitting from one row of wake vortices generated by each leader. Animations of the 3D simulations are provided in Movies S1–S3.

Fig. 2. — Learning efficient swimming strategies: Differences between 2D and 3D flow fields. (A) The smart swimmer relies on a predefined set of variables to identify its “observed state” (such as range and bearing relative to the leader that are depicted). Additional observed-state parameters are described in *Methods*. (B) Comparison of vorticity field in the wake of 2D (*Upper*) and cross-section of the 3D (*Lower*) swimmers (red, positive; blue, negative). In 2D, the leader’s wake vortices are aligned with its centerline. In contrast, in 3D flows, the wake vortices are diverging, leaving a quiescent region behind the leader. In 2D, smart followers must align with the leader’s centerline. In 3D, they must orient themselves at an angle to harness the wake vortex rings (WRs). Every half a tail-beat period, the smart follower in 2D simulations ( ${I S}_{η}$ ) autonomously selects the most appropriate action encoded in policy $π_{η}$ learned during training simulations, which allows it to maximize long-term swimming efficiency (Movie S4). The smart follower is capable of adapting to deviations in the leader’s trajectory (Movie S5), as these situations are encountered when performing random actions during training. (C) Relative horizontal displacement of the smart followers with respect to the leader, over a duration of 50 tail-beat periods starting from rest (solid blue line, ${I S}_{η}$ ; dash-dot red line, ${I S}_{d}$ ). (D) Lateral displacement of the smart followers. (E) Histogram showing the probability density function (PDF; left vertical axis) of swimmer ${I S}_{η}$ ’s preferred center-of-mass location during training. In the early stages of training (first 10,000 transitions; green bars), the swimmer does not show a strong preference for maintaining any particular separation distance. 
Toward the end of training (last 10,000 transitions; lilac bars), the swimmer displays a strong preference for maintaining a separation distance of either $Δ x = 1.5 L$ or $2.2 L$ . The solid black line depicts the correlation coefficient, with peaks in the black curve signifying locations where the smart follower’s head movement would be synchronized with the flow velocity in an undisturbed wake (see *SI Appendix* for relevant details). (F) Comparison of body deformation for swimmers ${I S}_{η}$ (*Upper*) and ${I S}_{d}$ (*Lower*), from $t = 27$ to $t = 29$ . Their respective trajectories are shown with the dash-dot lines, whereas the dashed gray line represents the trajectory of the leader. A quantitative comparison of body curvature for the two swimmers may be found in *SI Appendix*, Fig. S1.
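The range and bearing named in Fig. 2A can be illustrated with a short sketch. It is only a geometric illustration under assumed conventions: the function name, the 2D coordinates, and the heading convention (radians, counter-clockwise from the +x axis) are hypothetical, and the full observed state used in the paper is described in *Methods* and *SI Appendix*.

```python
import math

def range_and_bearing(leader_xy, follower_xy, follower_heading):
    """Return (distance, bearing) of the leader as seen by the follower.

    follower_heading is the follower's swimming direction in radians,
    measured counter-clockwise from the +x axis (an assumed convention).
    """
    dx = leader_xy[0] - follower_xy[0]
    dy = leader_xy[1] - follower_xy[1]
    distance = math.hypot(dx, dy)
    # Angle to the leader relative to the follower's heading,
    # wrapped back into the interval [-pi, pi].
    bearing = math.atan2(dy, dx) - follower_heading
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))
    return distance, bearing

# Hypothetical example: follower 2.2 body lengths behind a leader,
# both swimming in the -x direction (heading = pi).
print(range_and_bearing((0.0, 0.0), (2.2, 0.0), math.pi))
```

A bearing of zero means the leader lies straight ahead of the follower; a nonzero bearing indicates an off-axis position.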

Deep RL for Swimmers

RL (29) has been introduced to identify navigation policies in several model systems of vortex dipoles, soaring birds and microswimmers (30–32). These studies often rely on simplified representations of organisms interacting with their environment, which allows them to model animal locomotion with reduced physical complexity and manageable computational cost. However, the simplifying assumptions inherent in such models often do not account for feedback of the animals’ motion on the environment. High-fidelity numerical simulations, although significantly more computationally demanding, can account for such important considerations to a greater extent, for instance, by allowing flapping or swimming motions that closely mimic the interaction of real animals with their environment. This makes them invaluable for investigating concepts that may be carried over readily to bioinspired robotic applications, with minimal modification. This consideration has motivated our present study, where we expand on our earlier work (33), combining RL with direct numerical simulations (DNSs) of the Navier–Stokes (NS) equations for self-propelled autonomous swimmers. We first investigate 2D swimmers in a tandem configuration to scrutinize the strategy adopted by the RL algorithm for attaining the specified goals. Based on the observed behavior and the physical intuition we gain from examining these smart swimmers, we formulate simplified rules for implementing active control in significantly more complex 3D systems. This reverse-engineering approach allows us to determine simple and effective control rules from a data-driven perspective, without having to rely on simplistic models which may introduce errors owing to underlying assumptions.

Efficient Autonomous Swimmers

We first analyze the kinematics of swimmers ${I S}_{η}$ and ${I S}_{d}$ (Fig. 2), which were described previously, and were trained to attain specific high-level objectives via deep RL (see Methods for details). In both cases, the swimmer trails a leader representing an adult zebrafish of length L, swimming steadily at a velocity U, with tail-beat period T [Reynolds number $R e = L^{2} / (T ν) \approx 5000$ ]. After training, we observe that ${I S}_{d}$ is able to maintain its position behind the leader quite effectively ( $Δ y \approx 0$ ; Fig. 2D), in accordance with its reward ( $R_{d} = 1 - | Δ y | / L$ ). Surprisingly, ${I S}_{η}$ , with a reward function proportional to swimming efficiency ( $R_{η} = η$ ), also settles close to the center of the leader’s wake (Fig. 2D and Movie S4), although it receives no reward related to its relative position. This decision to interact actively with the unsteady wake has significant energetic implications, as described later in the text. Both ${I S}_{d}$ and ${I S}_{η}$ maintain a distance of $Δ x \approx 2.2 L$ from their respective leaders (Fig. 2C). ${I S}_{η}$ shows a greater proclivity to maintain this separation and intercepts the periodically shed wake vortices just after they have been fully formed and detach from the leader’s tail. In addition to $Δ x = 2.2 L$ , there is a second point of stability at $Δ x = 1.5 L$ (Fig. 2E). The difference $0.7 L$ matches the distance between vortices in the wake of the leader. In both positions, the lateral motion of the follower’s head is synchronized with the flow velocity in the leader’s wake, thus inducing minimal disturbance on the oncoming flow field. We note that a similar synchronization with the flow velocity has been observed when trout minimize muscle use by interacting with vortex columns in a cylinder’s wake (13). ${I S}_{η}$ undergoes relatively minor body deformation while maneuvering (Fig. 2F), whereas ${I S}_{d}$ executes aggressive turns involving large body curvature. Trout interacting with cylinder wakes exhibit increased body curvature (27), which is contrary to the behavior displayed by ${I S}_{η}$ . The difference may be ascribed to the widely spaced vortex columns generated by large-diameter cylinders used in the experimental study; weaving in and out of comparatively smaller vortices generated by like-sized fish encountered in a school (Fig. 2B) would entail excessive energy consumption.
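The Reynolds number and the two reward definitions quoted above can be written out directly. The sketch below is illustrative only: the function names are hypothetical, and the numerical values are nondimensional assumptions chosen so that $R e$ comes out near 5000; they are not taken from the simulations.

```python
def reynolds_number(length, tail_beat_period, kinematic_viscosity):
    """Re = L^2 / (T * nu), as defined in the text."""
    return length ** 2 / (tail_beat_period * kinematic_viscosity)

def reward_in_line(delta_y, length):
    """R_d = 1 - |dy| / L: largest when the follower sits on the leader's axis."""
    return 1.0 - abs(delta_y) / length

def reward_efficiency(efficiency):
    """R_eta = eta: the reward is the swimming efficiency itself."""
    return efficiency

# Illustrative, nondimensional values (assumptions, not simulation inputs).
L, T, nu = 1.0, 1.0, 2.0e-4
print(reynolds_number(L, T, nu))
print(reward_in_line(0.0, L), reward_in_line(0.25 * L, L))
```

Note that $R_{d}$ contains no energetic term, whereas $R_{η}$ contains no positional term, which is why the positional behavior of ${I S}_{η}$ is an outcome of learning rather than a constraint.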

We note that maintaining $Δ y = 0$ requires significant effort by ${I S}_{d}$ (SI Appendix, Fig. S2D), which is expected, as this swimmer’s reward ( $R_{d}$ ) is insensitive to energy expenditure. One of our previous studies (33) demonstrated that minimizing lateral displacement led to enhanced swimming efficiency (compared with the leader), albeit with noticeable deviation from $Δ y = 0$ . This conclusion is markedly different from our current observation and can be attributed to the use of improved learning techniques which are better able to achieve the specified goal. In the present study, recurrent neural networks augmented with “long short-term memory” cells (SI Appendix, Fig. S3) help encode time dependencies in the value function and produce far more robust smart swimmers than simpler feedforward networks (33). The performance of our deep recurrent network is compared with that of a feedforward network in SI Appendix, Fig. S4 and indicates that the deep network is better able to achieve the goal of in-line following, but at the penalty of increased energy expenditure. As a result, ${I S}_{d}$ succeeds in correcting for oscillations about $Δ y = 0$ much more effectively by undergoing severe body undulations (Fig. 2F), leading to increased costs (SI Appendix, Fig. S2). These observations confirm that following a leader indiscriminately can be disadvantageous if energetic considerations are not taken into account. Thus, it is unlikely that strict in-line swimming is used as a collective-swimming strategy in nature, and fish presumably adopt a strategy closer to that of ${I S}_{η}$ , by coordinating their motion with the wake flow. We note that patterns similar to the ones reported in this study have been observed in a recent experimental study (17). 
The behavior of swimmer ${I S}_{η}$ is also compared qualitatively to that of a real fish following a companion in Movie S6, and we observe that the motion of ${I S}_{η}$ resembles the swimming behavior of the live follower quite well.

Intercepting Vortices for Efficient Swimming

To determine the impact of wake-induced interactions on swimming performance, we compare energetics data for ${I S}_{η}$ and ${S S}_{η}$ in Fig. 3. The swimming efficiency of ${I S}_{η}$ is significantly higher than that of ${S S}_{η}$ (Fig. 3A), and the cost of transport (CoT), which represents energy spent for traversing a unit distance, is lower (Fig. 3B). Over a duration of 10 tail-beat periods (from $t = 20$ to $t = 30$ ; SI Appendix, Fig. S2) ${I S}_{η}$ experiences a $11 %$ increase in average speed compared with ${S S}_{η}$ , a $32 %$ increase in average swimming efficiency and a $36 %$ decrease in CoT. The benefit for ${I S}_{η}$ results from both a $29 %$ reduction in effort required for deforming its body against flow-induced forces ( $P_{D e f}$ ) and a $53 %$ increase in average thrust power ( $P_{T h r u s t}$ ). Performance differences between ${I S}_{η}$ and ${S S}_{η}$ exist solely due to the presence/absence of a preceding wake, since both swimmers undergo identical body undulations throughout the simulations. Comparing the swimming efficiency and power values of four distinct swimmers (SI Appendix, Fig. S2 and Table S1), we confirm that ${I S}_{η}$ and ${S S}_{η}$ are considerably more energetically efficient than either ${I S}_{d}$ or ${S S}_{d}$ .
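The comparisons above are relative changes of time-averaged quantities. As a minimal sketch, assuming uniformly sampled power and a simple rectangle-rule time integral, the CoT (energy spent per unit distance) and a percentage change with respect to a baseline could be computed as follows. The function names and sample numbers are hypothetical and are not data from the paper.

```python
def cost_of_transport(power_samples, dt, distance):
    """Energy spent per unit distance: (sum of P * dt) / distance."""
    energy = sum(p * dt for p in power_samples)
    return energy / distance

def percent_change(value, baseline):
    """Relative change of `value` with respect to `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline

# Hypothetical numbers, for illustration only.
cot_solitary = cost_of_transport([2.0, 2.0, 2.0, 2.0], dt=0.5, distance=4.0)
cot_follower = cost_of_transport([1.5, 1.5, 1.5, 1.5], dt=0.5, distance=4.0)
print(percent_change(cot_follower, cot_solitary))
```

A negative percentage change in CoT indicates that the follower spends less energy than the solitary swimmer to cover the same distance.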

The efficient swimming of ${I S}_{η}$ [e.g., point $η_{m a x} (A)$ in Fig. 3A] is attributed to the synchronized motion of its head with the lateral flow velocity generated by the wake vortices of the leader (Movie S4). This mechanism is evidenced by the correlation curve shown in Fig. 2E and by the coalignment of velocity vectors close to the head in Fig. 4 A and B. As shown in Movie S7, ${I S}_{η}$ intercepts the oncoming vortices in a slightly skewed manner, splitting each vortex into a stronger ( $W_{1 U}$ , Fig. 4A) and a weaker fragment ( $W_{1 L}$ ). The vortices interact with the swimmer’s own boundary layer to generate “lifted vortices” ( $L_{1}$ ), which in turn generate secondary vorticity ( $S_{1}$ ) close to the body. Meanwhile, the wake and lifted vortices created during the previous half-period, $W_{2 U}$ , $W_{2 L}$ , and $L_{2}$ , have traveled downstream along the body. This sequence of events alternates periodically between the upper (right lateral) and lower (left lateral) surfaces, as seen in Movie S7. Interactions of ${I S}_{η}$ with the flow field at points $η_{m i n} (D)$ and $(E)$ in Fig. 3A are analyzed separately in SI Appendix, Figs. S5 and S6.

We observe that the swimmer’s upper surface is covered in a layer of negative vorticity (and vice versa for the lower surface) (Fig. 4 A, Upper) owing to the no-slip boundary condition. The wake or the lifted vortices weaken this distribution by generating vorticity of opposite sign (e.g., secondary vorticity visible in narrow regions between the fish surface and vortices $L_{1}$ , $W_{1 L}$ , $L_{2}$ , and $L_{3}$ ) and create high-speed areas visible as bright spots in Fig. 4 A, Lower. The resulting low-pressure region exerts a suction force on the surface of the swimmer (Fig. 4 B, Upper), which assists body undulations when the force vectors coincide with the deformation velocity (Fig. 4 B, Lower) or increases the effort required when they are counteraligned. The detailed impact of these interactions is demonstrated in Fig. 4 C–F. On the lower surface, $W_{1 L}$ generates a suction force oriented in the same direction as the deformation velocity ( $0 < s < 0.2 L$ in Fig. 4B), resulting in negative $P_{D e f}$ (Fig. 4E) and favorable $P_{T h r u s t}$ (Fig. 4F). On the upper surface, the lifted vortex $L_{1}$ increases the effort required for deforming the body (positive peak in Fig. 4C at $s = 0.2 L$ ), but is beneficial in terms of producing large positive thrust power (Fig. 4D). Moreover, as $L_{1}$ progresses along the body, it results in a prominent reduction in $P_{D e f}$ over the next half-period, similar to the negative peak produced by the lifted vortex $L_{2}$ ( $s = 0.55 L$ in Fig. 4E). The average $P_{D e f}$ on both the upper and lower surfaces is predominantly negative (i.e., beneficial), in contrast to the minimum swimming efficiency instance $η_{m i n} (D)$ , where a mostly positive $P_{D e f}$ distribution signifies substantial effort required for deforming the body (SI Appendix, Fig. S5). We observe noticeable drag on the upper surface close to $s = 0$ (Fig. 4 B, Upper and Fig. 4D), attributed to the high-pressure region forming in front of the swimmer’s head. Forces induced by $W_{1 L}$ are both beneficial and detrimental in terms of generating thrust power ( $0 < s < 0.2 L$ in Fig. 4F), whereas forces induced by $L_{2}$ primarily increase drag but assist in body deformation (Fig. 4E). The tail section ( $s = 0.8 L$ to $1 L$ ) does not contribute noticeably to either thrust or deformation power at the instant of maximum swimming efficiency.
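The sign logic of $P_{D e f}$ described above can be made concrete with a small sketch. It assumes a convention in which the deformation power on one surface segment is the negative dot product of the fluid force and the local deformation velocity; this matches the statement that negative $P_{D e f}$ is beneficial, but the exact definition used in the paper is given in SI Appendix, and the function name and vectors here are hypothetical.

```python
def deformation_power(force, deformation_velocity):
    """Assumed convention: P_def = -(F . u_def) for one surface segment.

    Negative values mean the fluid force acts along the body's own
    deformation velocity (the flow assists the undulation); positive
    values mean the swimmer must work against the flow.
    """
    fx, fy = force
    ux, uy = deformation_velocity
    return -(fx * ux + fy * uy)

# Suction force aligned with the deformation velocity: beneficial.
print(deformation_power((0.0, 2.0), (0.0, 0.5)))
# Suction force opposed to the deformation velocity: extra effort.
print(deformation_power((0.0, -2.0), (0.0, 0.5)))
```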

Energy-Saving Mechanisms in Coordinated Swimming

The most discernible behavior of ${I S}_{η}$ is the synchronization of its head movement with the wake flow. However, the most prominent reduction in deformation power occurs near the midsection of the body ( $0.4 \leq s \leq 0.7$ in Fig. 4 C and E). This indicates that the technique devised by ${I S}_{η}$ is markedly different from energy-conserving mechanisms implied in theoretical (6, 34) and computational (20) work, namely, drag reduction attributed to reduced relative velocity in the flow and thrust increase owing to the “channelling effect.” In fact, the predominant energetics gain (i.e., negative $P_{D e f}$ ) occurs in areas of high relative velocity, for instance, near the high-velocity spot generated by vortex $L_{2}$ (Fig. 4). This dependence of swimming efficiency on a complex interplay between wake vortices and body deformation aligns closely with experimental findings (13, 27). We remark that the majority of the results presented here are obtained with a steadily swimming leader. However, with no additional training, ${I S}_{η}$ is able to exploit the wake of a leader executing unfamiliar maneuvers, by deliberately choosing to interact with the unsteady wake, as seen in Movies S5 and S6. The smart follower is able to respond effectively to such unfamiliar situations, since it is exposed to a variety of perturbations while taking random actions during training. This observation demonstrates the robustness of the RL algorithm to uncertainties in the environment and further establishes its suitability for use in realistic scenarios.

Having examined the behavior and physical mechanisms associated with energy savings, we now formulate and test a simple control rule that enables efficient coordinated swimming. We remark that this is a combination of RL and DNSs in a reverse-engineering context, where: (i) We use the capability of RL to discern useful patterns from a large cache of simulation data; (ii) we analyze the physical aspects of the resulting optimal strategy, to identify the behavior and mechanisms that lead to energetic benefits, and finally; (iii) we use this understanding to devise a rule-based control algorithm for sustained energy-efficient synchronized swimming, in a notably more complex 3D setting. To the best of our knowledge, there is no work available in the literature that investigates the flow physics governing interactions among multiple independent swimmers, by using high-fidelity simulations of 3D NS equations.

Given the head-synchronization tendency of the 2D smart swimmer ${I S}_{η}$ , we first identify suitable locations behind a 3D leader where the flow velocity would match a follower’s head motion (SI Appendix, Fig. S7). A feedback controller is then used to regulate the undulations of two followers to maintain these target coordinates on either branch of the diverging wake, as shown in Fig. 1B and Movie S1. We note that a fish following in-line behind the leader would not benefit in the present 3D simulations, since the region behind the leader remains quiescent owing to the diverging wake. The controlled motion yields an $11 %$ increase in average swimming efficiency for each of the followers (Fig. 5A) and a $5 %$ reduction in each of their CoT. Overall, the group experiences a $7.4 %$ increase in efficiency when compared with three isolated noninteracting swimmers. The mechanism of energy savings closely resembles that observed for the 2D swimmer; an oncoming WR (Fig. 5B) interacts with the deforming body to generate a “lifted-vortex” ring (LR; Fig. 5C). As this new ring proceeds along the length of the body, it modulates the follower’s swimming efficiency as observed in Fig. 5. Remarkably, the positioning of the lifted ring at the instants of minimum and maximum swimming efficiency resembles the corresponding positioning of lifted vortices in the 2D case; a slight dip in efficiency corresponds to lifted vortices interacting with the anterior section of the body (Fig. 5C and SI Appendix, Fig. S5), whereas an increase occurs upon their interaction with the midsection (Figs. 4 and 5D).
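The feedback control of the followers' position can be illustrated with a generic discrete-time proportional-integral (PI) loop. This is a sketch under strong assumptions: the gains, the time step, and the toy "plant" (position changes at a rate equal to the control signal) are hypothetical stand-ins and do not represent the fluid dynamics or the controller settings used in the simulations.

```python
class PIController:
    """Discrete-time PI controller: u = kp * e + ki * integral(e dt)."""

    def __init__(self, kp, ki, dt):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def update(self, target, measured):
        error = target - measured
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral

def simulate(steps, target=2.2, start=3.0, kp=2.0, ki=1.0, dt=0.05):
    """Drive a toy first-order plant (dx/dt = u) toward the target position."""
    controller = PIController(kp, ki, dt)
    x = start
    for _ in range(steps):
        u = controller.update(target, x)
        x += u * dt  # hypothetical plant, not a flow solver
    return x

print(simulate(400))
```

In the paper, the control signal adjusts the follower's undulations rather than acting on its position directly; the sketch only shows how proportional and integral terms combine to remove a persistent offset from a target coordinate.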

Fig. 5. — The 3D swimmer interacting with WRs. (A) Swimming efficiency for a 3D leader (dash-dot red line) and a follower (solid blue line) that adjusts its undulations via a proportional-integral (PI) feedback controller to maintain a specified position in the wake. After an initial transient, the patterns visible in the efficiency curves repeat periodically with $T_{p}$ . Time instances where the follower attains its minimum and maximum swimming efficiency have been marked with an inverted red triangle and an upright green triangle, respectively. The sudden jumps at $t \approx 18.3$ and $19.3$ correspond to adjustments made by the PI controller. (B) An oncoming WR is intercepted by the head of the follower and generates a new LR (C) similar to the 2D case (Fig. 4). As this ring interacts with the deforming body, it lowers the swimming efficiency initially ( $t \approx 17.8$ ; A and C), but provides a noticeable benefit further downstream ( $t \approx 18.2$ ; A and D).

These results showcase the capability of machine learning, and deep RL in particular, for discovering effective solutions to complex physical problems with inherent spatial and temporal nonlinearities, in a completely data-driven and model-free manner. Deep RL is especially useful in scenarios where decisions must be taken adaptively in response to a dynamically evolving environment, and the best control strategy may not be evident a priori due to unpredictable time delay between actions and their effect. This necessitates the use of recurrent networks capable of encoding time dependencies, which can have a demonstrable impact on the physical outcome, as shown in SI Appendix, Fig. S4. In conclusion, we demonstrate that deep RL can produce efficient navigation algorithms for use in complex flow fields, which in turn can be used to formulate control rules that are effective in decidedly more complex settings and thus have promising implications for energy savings in autonomous robotic swarms.

Methods

We perform 2D and 3D simulations of multiple self-propelled swimmers. In 2D, wavelet-adapted vortex methods are used to discretize the velocity–vorticity form of the NS equations; in 3D, the velocity–pressure form is solved with the pressure-projection method, using finite differences on a uniform computational grid. The swimmers adapt their motion using deep RL. The learning process is greatly accelerated by using recurrent neural networks with long short-term memory as a surrogate of the value function for the smart swimmer. Details regarding the simulation methods and the RL algorithm are provided in SI Appendix.
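To indicate how a long short-term memory cell carries information across time steps, the sketch below implements the standard LSTM update for a single scalar unit in plain Python. It is a textbook illustration, not the network used in this work: the weight dictionary, its values, and the input sequence are hypothetical, and the actual architecture is described in SI Appendix.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One update of a scalar LSTM cell.

    w maps each gate name to (input weight, recurrent weight, bias).
    Returns the new hidden state h and cell state c.
    """
    def pre(gate):
        wx, wh, b = w[gate]
        return wx * x + wh * h_prev + b

    i = sigmoid(pre("input"))        # how much new information to write
    f = sigmoid(pre("forget"))       # how much old memory to keep
    o = sigmoid(pre("output"))       # how much memory to expose
    g = math.tanh(pre("candidate"))  # candidate memory content
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

# Hypothetical weights and observations, for illustration only.
weights = {name: (0.5, 0.5, 0.0)
           for name in ("input", "forget", "output", "candidate")}
h, c = 0.0, 0.0
for observation in (1.0, 0.0, 0.0):
    h, c = lstm_step(observation, h, c, weights)
print(h, c)
```

Because the cell state `c` is carried from one step to the next, the output after the later zero-valued observations still depends on the first observation, which is the property that lets a recurrent value function account for delayed effects of earlier actions.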

Acknowledgments

This work was supported by European Research Council Advanced Investigator Award 341117 and Swiss National Science Foundation Sinergia Award CRSII3 147675. Computational resources were provided by Swiss National Supercomputing Centre (CSCS) Project s658.

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1800923115/-/DCSupplemental.

References

1. Schmidt J. Breeding places and migrations of the Eel. Nature. 1923;111:51–54.
2. Lang TG, Pryor K. Hydrodynamic performance of porpoises (Stenella attenuata). Science. 1966;152:531–533. doi: 10.1126/science.152.3721.531.
3. Aleyev YG. Nekton. Springer, The Netherlands; 1977.
4. Triantafyllou MS, Weymouth GD, Miao J. Biomimetic survival hydrodynamics and flow sensing. Annu Rev Fluid Mech. 2016;48:1–24.
5. Breder CM. Vortices and fish schools. Zool Sci Contrib N Y Zool Soc. 1965;50:97–114.
6. Weihs D. Hydromechanics of fish schooling. Nature. 1973;241:290–291.
7. Shaw E. Schooling fishes: The school, a truly egalitarian form of organization in which all members of the group are alike in influence, offers substantial benefits to its participants. Am Sci. 1978;66:166–175.
8. Pavlov DS, Kasumyan AO. Patterns and mechanisms of schooling behavior in fish: A review. J Ichthyol. 2000;40(Suppl 2):S163–S231.
9. Burgerhout E, et al. Schooling reduces energy consumption in swimming male European eels, Anguilla anguilla L. J Exp Mar Biol Ecol. 2013;448:66–71.
10. Whittlesey RW, Liska S, Dabiri JO. Fish schooling as a basis for vertical axis wind turbine farm design. Bioinspir Biomim. 2010;5:035005. doi: 10.1088/1748-3182/5/3/035005.
11. Chapman JW, et al. Animal orientation strategies for movement in flows. Curr Biol. 2011;21:R861–R870. doi: 10.1016/j.cub.2011.08.014.
12. Montgomery JC, Baker CF, Carton AG. The lateral line can mediate rheotaxis in fish. Nature. 1997;389:960–963.
13. Liao JC, Beal DN, Lauder GV, Triantafyllou MS. Fish exploiting vortices decrease muscle activity. Science. 2003;302:1566–1569. doi: 10.1126/science.1088295.
14. Oteiza P, Odstrcil I, Lauder G, Portugues R, Engert F. A novel mechanism for mechanosensory-based rheotaxis in larval zebrafish. Nature. 2017;547:445–448, and erratum (2017) 549:292.
15. Herskin J, Steffensen JF. Energy savings in sea bass swimming in a school: Measurements of tail beat frequency and oxygen consumption at different swimming speeds. J Fish Biol. 1998;53:366–376.
16. Killen SS, Marras S, Steffensen JF, McKenzie DJ. Aerobic capacity influences the spatial position of individuals within fish schools. Proc Biol Sci. 2012;279:357–364. doi: 10.1098/rspb.2011.1006.
17. Ashraf I, et al. Simple phalanx pattern leads to energy saving in cohesive fish schooling. Proc Natl Acad Sci USA. 2017;114:9599–9604. doi: 10.1073/pnas.1706503114.
18. Pitcher TJ. Functions of shoaling behaviour in teleosts. In: Pitcher TJ, editor. The Behaviour of Teleost Fishes. Springer; Boston: 1986. pp. 294–337.
19. Lopez U, Gautrais J, Couzin ID, Theraulaz G. From behavioural analyses to models of collective motion in fish schools. Interf Focus. 2012;2:693–707. doi: 10.1098/rsfs.2012.0033.
20. Daghooghi M, Borazjani I. The hydrodynamic advantages of synchronized swimming in a rectangular pattern. Bioinspir Biomim. 2015;10:056018. doi: 10.1088/1748-3190/10/5/056018.
21. Gazzola M, Hejazialhosseini B, Koumoutsakos P. Reinforcement learning and wavelet adapted vortex methods for simulations of self-propelled swimmers. SIAM J Sci Comput. 2014;36:B622–B639.
22. Maertens AP, Gao A, Triantafyllou MS. Optimal undulatory swimming for a single fish-like body and for a pair of interacting swimmers. J Fluid Mech. 2017;813:301–345.
23. Mnih V, et al. Human-level control through deep reinforcement learning. Nature. 2015;518:529–533. doi: 10.1038/nature14236.
24. Müller UK, Smit J, Stamhuis EJ, Videler JJ. How the body contributes to the wake in undulatory fish swimming. J Exp Biol. 2001;204:2751–2762. doi: 10.1242/jeb.204.16.2751.
25. Kern S, Koumoutsakos P. Simulations of optimized anguilliform swimming. J Exp Biol. 2006;209:4841–4857. doi: 10.1242/jeb.02526.
26. Borazjani I, Sotiropoulos F. Numerical investigation of the hydrodynamics of carangiform swimming in the transitional and inertial flow regimes. J Exp Biol. 2008;211:1541–1558. doi: 10.1242/jeb.015644.
27. Liao JC, Beal DN, Lauder GV, Triantafyllou MS. The Kármán gait: Novel body kinematics of rainbow trout swimming in a vortex street. J Exp Biol. 2003;206:1059–1073. doi: 10.1242/jeb.00209.
28. Hunt JCR, Wray AA, Moin P. Studying Turbulence Using Numerical Simulation Databases, 2. Report CTR-S88. 1988. Eddies, streams, and convergence zones in turbulent flows; pp. 193–208.
29.Sutton RS, Barto AG. Reinforcement Learning: An Introduction. MIT Press; Cambridge, MA: 1998. [Google Scholar]
30.Gazzola M, Tchieu AA, Alexeev D, de Brauer A, Koumoutsakos P. Learning to school in the presence of hydrodynamic interactions. J Fluid Mech. 2016;789:726–749. [Google Scholar]
31.Reddy G, Celani A, Sejnowski TJ, Vergassola M. Learning to soar in turbulent environments. Proc Natl Acad Sci USA. 2016;113:E4877–E4884. doi: 10.1073/pnas.1606075113. [DOI] [PMC free article] [PubMed] [Google Scholar]
32.Colabrese S, Gustavsson K, Celani A, Biferale L. Flow navigation by smart microswimmers via reinforcement learning. Phys Rev Lett. 2017;118:158004. doi: 10.1103/PhysRevLett.118.158004. [DOI] [PubMed] [Google Scholar]
33.Novati G, et al. Synchronisation through learning for two self-propelled swimmers. Bioinspir Biomim. 2017;12:036001. doi: 10.1088/1748-3190/aa6311. [DOI] [PubMed] [Google Scholar]
34.Weihs D. In: Swimming and Flying in Nature. Wu TYT, Brokaw CJ, Brennen C, editors. Vol 2. Springer; Boston: 1975. pp. 703–718. [Google Scholar]

Supplementary Materials

Supplementary File: video (21.2 MB, mp4)
Supplementary File: video (21.4 MB, mp4)
Supplementary File: video (21.3 MB, mp4)
Supplementary File: video (21.8 MB, mp4)
Supplementary File: video (7.8 MB, mp4)
Supplementary File: pnas.1800923115.sapp.pdf (7.9 MB, pdf)
Supplementary File: video (14.4 MB, mp4)
Supplementary File: video (20.8 MB, mp4)
