Machine learning the swarming phases from microscopic dynamics. (A) Raw data from one swarm expansion experiment, consisting of 1,500 space-time points (columns) in a 23-dimensional observation space (rows). Additional replicates are shown in SI Appendix, Fig. S11. Color bar indicates relative magnitudes scaled to [0,1]. Within each group of strongly correlated observables with high normalized mutual information (marked by red brackets), only one observable is included in the machine-learning analysis. (B) The values of the 14 remaining observables (rows) were binned into five categories, as indicated by the color bar, providing the input data for machine learning. (C) The 2D representation of the data in B, obtained with t-distributed stochastic neighbor embedding (t-SNE); k-means clustering robustly identifies five main dynamical phases during swarm expansion across independent experiments (n = 3; SI Appendix, Figs. S13–S16 and SI Text). Phases are labeled with different colors. The t-SNE coordinates highlighted as large circles for each phase correspond to the experimental snapshots shown in D. (D) Typical images for the phases (SI Appendix, Movie S1) identified in C: low-density single-cell phase (SC); high-density rafting phase (R) with a high percentage of comoving cells; biofilm phase (B) characterized by long, unseparated cells; and coexistence phases that contain single cells and rafts (SC + R) or rafts and biofilm precursors (R + BP). (E) For each phase, simulations were run with the cell shape, motility, and density extracted from the particular phase as input parameters (SI Appendix, Movie S2 and SI Text). (Scale bars, 10 μm.) (F) Detailed quantitative comparisons between experiments (small circles), the particular experimental states shown in D (large circles), and simulations (squares; error bars are SDs, n = 20) yield good quantitative agreement, except for the B phase, confirming that physical effects determine the four motility-based swarming phases.
(G) The emergence of the different phases in time and space during swarm expansion. Colored circles correspond to space-time coordinates of images from D.
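The preprocessing described in A and B (scaling, binning into five categories, and keeping one observable from each group with high normalized mutual information) can be illustrated with a minimal sketch. It is not the authors' implementation. The equal-width bins, the NMI normalization by the mean entropy, the cutoff `NMI_THRESHOLD`, the greedy selection order, and the synthetic data are all assumptions made for illustration. The caption specifies none of them. The t-SNE embedding and k-means clustering of C are omitted here and would typically be done with an established library.

```python
import math
import random
from collections import Counter

N_BINS = 5            # five categories, as in panel B
NMI_THRESHOLD = 0.8   # hypothetical cutoff; the caption does not state one


def bin_values(values, n_bins=N_BINS):
    """Scale values to [0, 1] and map them to equal-width bins 0..n_bins-1."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # constant observables fall into bin 0
    return [min(int((v - lo) / span * n_bins), n_bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(a, b):
    """NMI = 2 I(a; b) / (H(a) + H(b)); one common normalization (assumed)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        return 0.0
    h_ab = entropy(list(zip(a, b)))  # joint entropy
    return 2.0 * (h_a + h_b - h_ab) / (h_a + h_b)


def select_observables(binned_rows, threshold=NMI_THRESHOLD):
    """Greedily keep an observable unless it is redundant with a kept one."""
    kept = []
    for i, row in enumerate(binned_rows):
        if all(normalized_mutual_information(row, binned_rows[j]) < threshold
               for j in kept):
            kept.append(i)
    return kept


# Synthetic stand-in for the data matrix: rows are observables,
# columns are space-time points (1,500 per experiment in panel A).
rng = random.Random(0)
base = [rng.random() for _ in range(1500)]
rescaled_copy = [2.0 * x + 1.0 for x in base]       # redundant with `base`
independent = [rng.random() for _ in range(1500)]   # unrelated observable

binned = [bin_values(row) for row in (base, rescaled_copy, independent)]
kept = select_observables(binned)
print(kept)  # indices of the observables retained for clustering
```

In this toy input the rescaled copy carries the same information as the first observable and is dropped, while the independent observable is kept. The rows of `binned` selected by `kept` would then serve as the categorical input to the embedding and clustering steps.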