Nature Communications
2024 Dec 9;15:10528. doi: 10.1038/s41467-024-54849-z

Training all-mechanical neural networks for task learning through in situ backpropagation

Shuaifeng Li 1, Xiaoming Mao 1
PMCID: PMC11628607  PMID: 39653735

Abstract

Recent advances unveiled physical neural networks as promising machine learning platforms, offering faster and more energy-efficient information processing. Compared with extensively-studied optical neural networks, the development of mechanical neural networks remains nascent and faces significant challenges, including heavy computational demands and learning with approximate gradients. Here, we introduce the mechanical analogue of in situ backpropagation to enable highly efficient training of mechanical neural networks. We theoretically prove that the exact gradient can be obtained locally, enabling learning through the immediate vicinity, and we experimentally demonstrate this backpropagation to obtain the gradient with high precision. With the gradient information, we showcase the successful training of networks in simulations for behavior learning and machine learning tasks, achieving high accuracy in experiments of regression and classification. Furthermore, we present the retrainability of networks involving task-switching and damage, demonstrating their resilience. Our findings, which integrate the theory for training mechanical neural networks and experimental and numerical validations, pave the way for mechanical machine learning hardware and autonomous self-learning material systems.

Subject terms: Applied physics, Mechanical engineering


Here, authors introduce an in situ backpropagation analogue to train mechanical neural networks locally and physically, enabling efficient and exact gradient-based training. The network achieves high accuracy in behavior learning and various machine learning tasks.

Introduction

The past decades have witnessed the development of artificial intelligence at an unprecedented pace, with machine learning emerging as one of its most transformative branches. At the core of modern machine learning lies neural networks, computational models inspired by the intricate workings of the human brain1,2. Neural networks have revolutionized various fields, ranging from image recognition to natural language processing and autonomous driving3–5. Unlike traditional programming, where explicit instructions are provided to solve a problem, neural networks learn from data to make decisions6,7. This learning process involves adjusting the parameters of interconnected nodes, or neurons, within the network through backpropagation to conduct gradient descent8,9. Eventually, neural networks can uncover complex patterns and relationships in data, enabling them to generalize to unseen examples and perform tasks with remarkable accuracy.

Nevertheless, the substantial computational requirements and energy consumption associated with computer-based neural networks, especially considering the energy efficiency of conventional digital processors, present significant challenges to further development10,11. One proposed solution lies in physical machine learning hardware platforms relying on physical processes, such as optical and mechanical neural networks, which hold promise for better energy efficiency compared to their digital counterparts12–21. For example, optical neural networks, which have been extensively studied, boast an energy advantage of several orders of magnitude over electronic processors utilizing digital multipliers17. In addition, the efficient and experimentally feasible training method of in situ backpropagation for optical neural networks has advanced physical machine learning platforms and the reduction of their carbon footprint14,18,22.

Wave-matter interactions are commonly utilized in optical neural networks to implement machine learning, employing mechanisms including diffraction and the equivalence between recurrent neural networks and wave physics14,23. Similar ideas are used to establish the learning framework in mechanical neural networks (MNNs)21,24. Another proposed method uses a vibrating plate, activated by acoustic waves, to function as both input and output. Instead of training the neural network by adjusting weights directly, this approach employs electrically generated masking signals from interfering acoustic waves for the training process19. This idea has been expanded by adding multiple layers of vibrating plates, thus forming a deep physical neural network20. MNNs are expected to show advantages in challenging environments with complex electromagnetic conditions where optical counterparts may underperform.

Despite the effective wave-based implementations, wave dynamics is complex, involving multiple resonant modes in real material systems; this complicates the dynamical modeling of the physical system and may cause a large simulation-reality gap. In contrast, the physical changes induced in MNNs under static forces offer a promising solution to address these challenges, and they have been demonstrated to learn behaviors (designing desired responses of materials under load)25,26. However, approaches using optimization algorithms (e.g., genetic algorithms) operated on computers ultimately rely on conventional digital processors. Therefore, challenges remain in developing efficient training methods that rely on physical processes, together with corresponding experimental demonstrations.

In light of this, the concept of physical learning offers a promising avenue to train MNNs based only on local information of the networks, inspired by neuroscience27. This methodology is promising for experimental implementations where the learning occurs physically in the system. One method, termed coupled learning, uses the contrast of two equilibrium states (free and nudged states) to update the system and is able to solve given tasks28–31. This method has enabled MNNs to perform well in supervised machine learning tasks. Each time a training example of a force pattern is applied to the MNN, it updates itself according to the learning rule. Over time, the MNNs learn to accurately respond to unseen forces that have spatial correlation patterns resembling those in the training examples.

Another well-known method for physical learning is Equilibrium Propagation (EP), which shares a similar procedure with coupled learning and allows an arbitrary differentiable loss function32. This method has been demonstrated in various physical systems: numerically in nonlinear resistor networks33 and coupled phase oscillators34, and experimentally on Ising machines35.

So far, MNNs based on physical learning have been developed on the platforms of origami structures28,36 and disordered networks29,37 to demonstrate machine learning through simulations. Experimental proposals involve using directed springs with variable stiffness38 and manually adjusting the rest length of springs31.

Here, we present a highly efficient training protocol for MNNs through a mechanical analogue of in situ backpropagation, derived from the adjoint variable method, in which the exact gradient can theoretically be obtained from only local information. Using 3D-printed MNNs, we demonstrate the feasibility of obtaining the gradient of the loss function experimentally, solely from the bond elongations of the MNNs, in only two steps, using local rules, with high accuracy. Leveraging the obtained gradient, we showcase the successful training in simulations of a mechanical network for behaviors learning and various machine learning tasks, achieving high accuracy in both regression and Iris flower classification tasks. The trained MNNs are then validated both numerically and experimentally. In addition, we illustrate the retrainability of MNNs after task switching and damage, a feature that may inspire further inquiry into more robust and resilient designs of MNNs.

Beyond their applications as computational devices, these MNNs also offer unprecedented opportunities in materials science and mechanical engineering as sustainable and autonomous material systems, because they can be trained to learn behaviors that adapt to different environments and tasks. In engineering, few materials or machines possess the innate ability to exhibit desired behaviors without meticulous design and engineering, and such design strategies demand expert knowledge and experience. The MNNs, together with the efficient training protocol proposed here, pave the way for future intelligent material systems.

Results

In situ backpropagation in mechanical neural networks

We start by introducing the theoretical basis for conducting in situ backpropagation in MNNs, namely, obtaining the gradient of a loss function with respect to the spring constants of an MNN. Here we have $n$ nodes embedded in $d$-dimensional space, located at positions $\{\mathbf{x}_j\}$. The numbers of input and output nodes are $n_{\text{in}}$ and $n_{\text{out}}$, respectively. The nodes are connected by $m$ linear springs, each with a spring constant $k_i$. Zero modes are prohibited by properly designing the network connectivity such that the compatibility matrix $C$ (introduced later) has full rank. Compared with typical computer-based neural networks with layer-by-layer structures, MNNs are regarded as entire networks. Given a certain task, the task learning problem can be described as:

$$\underset{\mathbf{k}}{\text{minimize}}\; L[\mathbf{u}(\mathbf{k})], \quad \text{subject to}\; D\mathbf{u}=\mathbf{F}, \qquad (1)$$

where $L$ is the loss function, $\mathbf{k}\in\mathbb{R}_{\ge 0}^{m\times 1}$ is a vector containing the spring constant of each bond, which is the trainable learning degree of freedom, $D\in\mathbb{R}^{dn\times dn}$ is the stiffness matrix, which is symmetric (also known as the Hessian matrix or the dynamical matrix in different contexts), $\mathbf{u}\in\mathbb{R}^{dn\times 1}$ is the node displacement, which is the output, and $\mathbf{F}\in\mathbb{R}^{dn\times 1}$ is the external force applied on the nodes, which is the input. The governing equation of statics, $D\mathbf{u}=\mathbf{F}$, represents the forward problem, reflecting the response $\mathbf{u}$ (displacement of each node) of an MNN under input forces $\mathbf{F}$. To minimize $L[\mathbf{u}(\mathbf{k})]$ using gradient descent, $\nabla L$ is derived as below:

$$\nabla L=\frac{dL}{d\mathbf{k}}=\frac{\partial L}{\partial\mathbf{u}}\frac{d\mathbf{u}}{d\mathbf{k}}. \qquad (2)$$

Given the form of the loss $L$ as a function of $\mathbf{u}$, the Jacobian $\frac{\partial L}{\partial\mathbf{u}}$ can be conveniently calculated, whereas $\frac{d\mathbf{u}}{d\mathbf{k}}$ is usually a computationally heavy term due to interactions between the nodes. For example, the computational cost for $\frac{\partial L}{\partial\mathbf{u}}$ is typically $O(dn_{\text{out}})$, but that for $\frac{d\mathbf{u}}{d\mathbf{k}}$ is $O(dmn)$. We show below that this term can be derived by differentiating both sides of $D\mathbf{u}=\mathbf{F}$:

$$\frac{d\mathbf{u}}{d\mathbf{k}}=-D^{-1}\frac{dD}{d\mathbf{k}}\mathbf{u}, \qquad (3)$$

with $\frac{d\mathbf{F}}{d\mathbf{k}}=0$. Then, plugging Eq. (3) into Eq. (2):

$$\nabla L=-\frac{\partial L}{\partial\mathbf{u}}D^{-1}\frac{dD}{d\mathbf{k}}\mathbf{u}=\mathbf{u}_{\text{adj}}^{T}\frac{dD}{d\mathbf{k}}\mathbf{u}. \qquad (4)$$

Here we use the transpose of the adjoint displacement, $\mathbf{u}_{\text{adj}}^{T}$, to represent $-\frac{\partial L}{\partial\mathbf{u}}D^{-1}$, so that $\mathbf{u}_{\text{adj}}^{T}D=-\frac{\partial L}{\partial\mathbf{u}}$. Note that $D$ is a symmetric matrix, $D=D^{T}$. Therefore, after taking the transpose on both sides, the adjoint problem can be defined as:

$$D\mathbf{u}_{\text{adj}}=-\left(\frac{\partial L}{\partial\mathbf{u}}\right)^{T}. \qquad (5)$$

The two problems (forward and adjoint) differ only in their applied forces. Likewise, the adjoint problem can be understood as the response $\mathbf{u}_{\text{adj}}$ of an MNN under the adjoint force $-\left(\frac{\partial L}{\partial\mathbf{u}}\right)^{T}$.

Then, we utilize the compatibility matrix39,40, $C\in\mathbb{R}^{m\times dn}$, which maps the node displacement $\mathbf{u}$ to the bond elongations $\mathbf{e}$ in the linear regime, such that $e_i=\hat{\mathbf{I}}_{j_1 j_2}\cdot(\mathbf{u}_{j_1}-\mathbf{u}_{j_2})$, where $\hat{\mathbf{I}}_{j_1 j_2}$ is a unit vector pointing from node $j_1$ to node $j_2$; these unit vectors determine the entries of $C$. The stiffness matrix $D$ can be decomposed as $C^{T}KC$, where $K$ is the diagonal matrix with $\mathbf{k}$ as its diagonal entries. The gradient of $L$ in Eq. (4) can be further expressed as:
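As a concrete sketch of this decomposition, the compatibility matrix and $D=C^{T}KC$ can be assembled in a few lines of NumPy; the triangle geometry, bond list, and spring constants below are made-up toy values, not the paper's printed network:

```python
import numpy as np

def compatibility_matrix(pos, bonds):
    """Rows map node displacements (2 DOF per node, flattened) to bond
    elongations e = C u; each row holds +/- the bond's unit vector.
    Sign convention here: positive e means the bond stretches."""
    C = np.zeros((len(bonds), 2 * len(pos)))
    for i, (j1, j2) in enumerate(bonds):
        unit = (pos[j2] - pos[j1]) / np.linalg.norm(pos[j2] - pos[j1])
        C[i, 2 * j1:2 * j1 + 2] = -unit
        C[i, 2 * j2:2 * j2 + 2] = unit
    return C

pos = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.8]])   # toy triangle network
bonds = [(0, 1), (1, 2), (0, 2)]
k = np.array([1.0, 2.0, 3.0])                           # spring constants

C = compatibility_matrix(pos, bonds)
D = C.T @ np.diag(k) @ C        # stiffness matrix D = C^T K C, symmetric
```

A quick sanity check of the construction: a rigid-body translation of all nodes produces zero elongation, since $C$ only measures deformations.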

$$\nabla L=\mathbf{u}_{\text{adj}}^{T}\frac{dD}{d\mathbf{k}}\mathbf{u}=\mathbf{u}_{\text{adj}}^{T}\frac{d(C^{T}KC)}{d\mathbf{k}}\mathbf{u}=\mathbf{u}_{\text{adj}}^{T}C^{T}\frac{dK}{d\mathbf{k}}C\mathbf{u}=\mathbf{e}_{\text{adj}}\circ\mathbf{e}, \qquad (6)$$

where $\circ$ is the Hadamard product (i.e., element-wise product), and $\frac{dK}{d\mathbf{k}}$ is a tensor $\delta_{pql}\in\mathbb{R}^{m\times m\times m}$ whose entries are 1 when $p=q=l$ and 0 otherwise. Eq. (6) implies that the gradient of the loss function $L$ equals the element-wise multiplication of the bond elongations in the forward problem $D\mathbf{u}=\mathbf{F}$ and the adjoint problem $D\mathbf{u}_{\text{adj}}=-\left(\frac{\partial L}{\partial\mathbf{u}}\right)^{T}$.

Therefore, to implement in situ backpropagation in MNNs and obtain the gradient of the loss function $L$ from the local information of the MNN, there are two steps: (1) Apply the input force $\mathbf{F}$ to the MNN, and obtain the displacement of the nodes and the forward elongation of the bonds $\mathbf{e}$. (2) Calculate $\frac{\partial L}{\partial\mathbf{u}}$ given the form of the loss function $L(\mathbf{u})$, using the displacements from step (1), with computational cost $O(dn_{\text{out}})$ on a digital computer. Note that only the nonzero entries of $\frac{\partial L}{\partial\mathbf{u}}$, corresponding to the output nodes, are calculated; e.g., for $L=(u_p-u_T)^2$, which drives the displacement $u_p$ of node $p$ toward a desired displacement $u_T$, the only nonzero entry of the adjoint force is $2(u_T-u_p)$. Apply the force $-\left(\frac{\partial L}{\partial\mathbf{u}}\right)^{T}$ to the same system to obtain the adjoint elongation of the bonds $\mathbf{e}_{\text{adj}}$. The gradient is the element-wise multiplication of the forward and adjoint elongations, with computational cost $O(m)$, where $m$ is the number of bonds.
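The two-step procedure can be sketched numerically on a toy network (hypothetical geometry, spring constants, and loads; `u_star`, the target displacement, is an illustrative value). Two linear solves, one forward and one adjoint, yield every component of the gradient at once:

```python
import numpy as np

# Hypothetical toy network: two top nodes clamped, load on the bottom-right node.
pos = np.array([[0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 0.0]])
bonds = [(0, 2), (1, 3), (2, 3), (0, 3), (1, 2)]
k = np.array([1.0, 1.5, 2.0, 1.2, 0.8])        # spring constants (toy values)
free = [4, 5, 6, 7]                            # x, y DOFs of free nodes 2 and 3

C = np.zeros((len(bonds), 2 * len(pos)))
for i, (j1, j2) in enumerate(bonds):
    unit = (pos[j2] - pos[j1]) / np.linalg.norm(pos[j2] - pos[j1])
    C[i, 2 * j1:2 * j1 + 2] = -unit
    C[i, 2 * j2:2 * j2 + 2] = unit
Cf = C[:, free]                                # clamp the two top nodes
D = Cf.T @ np.diag(k) @ Cf                     # stiffness matrix D = C^T K C

# Step 1: forward problem D u = F (downward weight on node 3)
F = np.zeros(4)
F[3] = -0.098                                  # ~10 g weight, in newtons
u = np.linalg.solve(D, F)
e = Cf @ u                                     # forward bond elongations

# Step 2: adjoint problem D u_adj = -(dL/du)^T for L = (u_y2 - u_star)^2
u_star = -0.01                                 # illustrative target displacement
dLdu = np.zeros(4)
dLdu[1] = 2.0 * (u[1] - u_star)                # y DOF of node 2 is the output
u_adj = np.linalg.solve(D, -dLdu)
e_adj = Cf @ u_adj                             # adjoint bond elongations

grad = e_adj * e                               # dL/dk_i = e_adj,i * e_i, Eq. (6)
```

The test below verifies this against a central finite difference of the loss, the check the paper itself makes in Fig. 1e.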

Similar to energy-based learning methods, such as EP and coupled learning described in the Introduction, our method also introduces two equilibrium states, induced by the forward force $\mathbf{F}$ and the adjoint force $-\left(\frac{\partial L}{\partial\mathbf{u}}\right)^{T}$. However, in energy-based learning methods the nudged state is slightly displaced from the free state, controlled by the nudging strength, whereas the two equilibrium states in our method are independent: the input forces are absent in the second state. Moreover, algorithmically, energy-based learning methods approximate the gradient by the difference between the two equilibrium states, while our method yields the gradient through multiplication, performing backpropagation in the MNN. In the Supplementary Information, we show the link between EP and our method. Notably, in the linear regime, our method and EP produce the same results despite different procedures and algorithms.

When we further consider the form of the adjoint force $-\left(\frac{\partial L}{\partial\mathbf{u}}\right)^{T}$, whose nonzero entry is $2(u_T-u_p)$, it can be regarded as the error between the output and the desired output. Therefore, our method essentially provides two signal passes: a forward pass sending the input signal and a backward pass sending the error signal, reminiscent of the adjoint-method-based backpropagation in optical neural networks14,22,41. Furthermore, recent work on training chemical signaling networks shows similar procedures and conclusions, where the product of the concentration drop and the pressure drop along a bond produces the gradient, derived from the special form of the system matrix42. Likewise, in our linear MNNs, the elongation can be regarded as the displacement drop along a bond, and thus the learning rules share common characteristics.

More importantly, this method of in situ backpropagation stays consistent with the local rule required in physical learning30, since the gradient for bond $i$ can be obtained solely from the elongations of bond $i$, i.e., $\nabla L_i=e_{\text{adj},i}\,e_i$. Besides EP and coupled learning, our method thus serves as an alternative option to train MNNs locally. Our learning rules apply strictly in the linear regime, implying the use of small deformations of the MNNs in operation.

Subsequently, this gradient $\nabla L_i$, obtained locally at each bond $i$ via the two steps described above, is used to update the spring constants at learning rate $\alpha$, from $k_i$ to $k_i-\alpha\nabla L_i$:

$$k_i \leftarrow k_i-\alpha\nabla L_i=k_i-\alpha\, e_{\text{adj},i}\,e_i, \qquad (7)$$

iteratively through gradient descent, minimizing the loss function subject to the laws of physics.
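A minimal training loop following Eq. (7), again on a hypothetical toy network; the learning rate `alpha`, the target `u_star`, and the clipping floor that keeps spring constants positive are all illustrative choices, not values from the paper:

```python
import numpy as np

# Hypothetical toy network: two top nodes clamped, bottom-right node loaded.
pos = np.array([[0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 0.0]])
bonds = [(0, 2), (1, 3), (2, 3), (0, 3), (1, 2)]
free = [4, 5, 6, 7]                        # x, y DOFs of free nodes 2 and 3

C = np.zeros((len(bonds), 8))
for i, (j1, j2) in enumerate(bonds):
    unit = (pos[j2] - pos[j1]) / np.linalg.norm(pos[j2] - pos[j1])
    C[i, 2 * j1:2 * j1 + 2] = -unit
    C[i, 2 * j2:2 * j2 + 2] = unit
Cf = C[:, free]

F = np.zeros(4)
F[3] = -0.098                              # ~10 g weight on node 3, downward
u_star = -0.05                             # illustrative target for u_y of node 3
alpha = 20.0                               # illustrative learning rate
k = np.ones(len(bonds))                    # initial spring constants

losses = []
for _ in range(300):
    D = Cf.T @ np.diag(k) @ Cf
    u = np.linalg.solve(D, F)              # step 1: forward equilibrium
    e = Cf @ u
    dLdu = np.zeros(4)
    dLdu[3] = 2.0 * (u[3] - u_star)        # L = (u_y3 - u_star)^2
    e_adj = Cf @ np.linalg.solve(D, -dLdu) # step 2: adjoint equilibrium
    k = np.clip(k - alpha * e_adj * e, 0.1, None)  # Eq. (7); springs kept positive
    losses.append((u[3] - u_star) ** 2)
```

Because each update uses the exact gradient, the loss decreases monotonically toward zero for a sufficiently small learning rate, mirroring the training curves reported later in the paper.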

Here, to demonstrate in situ backpropagation in MNNs, we fabricate two-dimensional MNNs made of flexible Agilus30 using 3D printing techniques, as shown in Fig. 1a. The detailed fabrication procedures and configurations are given in Methods and the Supplementary Information. As an example, we take the loss function to be $L=(u_{Ly}+0.025\,\mathrm{m})^2$ under the downward applied force $F_{Ry}=0.01\times 9.8\,\mathrm{N}$, where $u_{Ly}$ and $F_{Ry}$ represent the vertical displacement of the bottom-left node and the applied force on the bottom-right node of the MNN, respectively. The leftmost and rightmost sides at the top of our MNN are glued to a truss as the fixed boundary condition. The force is applied by weights through gravity.

Fig. 1. Experimental demonstration of the in situ backpropagation.


a A 3D-printed mechanical neural network using the Polyjet rubber-like material Agilus30. b The experimental setups for the forward and adjoint fields, and the resulting gradient of the loss function from simulation and experiment, shown from left to right. The error bars in the third panel are calculated based on the standard deviation of three independent experiments. The forward and adjoint fields are obtained physically; the adjoint force and the element-wise multiplication are computed digitally. c The experimental elongations of the forward and adjoint fields are shown in the first and second panels, respectively. The experimental gradient is shown in the third panel. d The corresponding simulated elongations and gradient, shown from left to right. e Comparison between the finite difference method and our adjoint method. The black curve in the left panel shows the numerical error as a function of step size in the finite difference method. The shaded area represents the experimentally feasible region with large step sizes, below which the step size Δk is too small compared to manufacturing accuracy. The numerical error of the adjoint method depends on the machine precision in the linear regime. The orange curve in the right panel represents the gradient error when the deformation is not infinitesimal and the response of the MNN is nonlinear. The experimental error using the adjoint method is also shown. Source data are provided as a Source Data file.

Figure 1b shows the experimental setup used to implement the mechanical analogue of in situ backpropagation. The first panel shows the forward field, where $F_{Ry}$ is applied on the bottom-right node by a 10 g weight. The first panel of Fig. 1c exhibits the experimentally measured elongation of each bond, and the corresponding simulated elongations are shown in the first panel of Fig. 1d. The applied force results in a vertical displacement of the bottom-left node, measured to be $u_{Ly}=-0.82$ mm experimentally. Therefore, the value of the nonzero entry of the adjoint force, $-2(u_{Ly}+0.025)$, is calculated to be equivalent to a 5 g weight hung on the bottom-left node, with the experimental setup displayed in the second panel of Fig. 1b. Note that the calculated force is converted to the weights used in experiments via the gravitational acceleration $g=9.8\,\mathrm{m/s^2}$. The second panels of Fig. 1c, d show the measured and simulated adjoint elongations, respectively. Our method indicates that the gradient of the loss function is the element-wise multiplication of the forward and adjoint elongations, which is illustrated in the third panels of Fig. 1c, d, representing experimental and simulated results, respectively.

We observe excellent agreement between the measured and simulated elongations, as well as between the gradients. We then define the gradient error as $1-\mathbf{g}_t\cdot\hat{\mathbf{g}}_t$, where $\mathbf{g}_t$ and $\hat{\mathbf{g}}_t$ are the normalized measured gradient and exact gradient, respectively. Compared with the simulated gradient, which represents the exact gradient, our experimental gradient error is less than 0.1, as shown in the inset of Fig. 1e. These results are summarized and averaged over the three independent experiments described in the main text, along with three additional independent experiments for another loss function, detailed in the Supplementary Information (Supplementary Fig. 1).

Although the gradient can be obtained numerically at machine precision, the linear-regime assumption requires infinitesimal deformation, meaning that our method as used in experiments is always an approximation. To explore the validity of our method in experiments, we perform a nonlinear analysis numerically and show the gradient error as a function of the adjoint force, represented by weights, in Fig. 1e. When the weight is small, the adjoint field is in the linear regime, where $e_i=\hat{\mathbf{I}}_{j_1 j_2}\cdot(\mathbf{u}_{j_1}-\mathbf{u}_{j_2})$, leading to a small gradient error. However, when the weight becomes large, the adjoint field deviates from the linear regime, where $e_i=\lvert\mathbf{x}_{j_1}+\mathbf{u}_{j_1}-\mathbf{x}_{j_2}-\mathbf{u}_{j_2}\rvert-\lvert\mathbf{x}_{j_1}-\mathbf{x}_{j_2}\rvert$, and the gradient error grows from $10^{-4}$ to $10^{-3}$. This low gradient error even at large adjoint force implies that our method can produce the gradient with high precision and efficiency. Experimentally, we choose the weight to be small enough for linearity yet large enough to produce measurable displacements, leading to the error estimate shown in Fig. 1b, e.
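The gap between the linear and geometrically exact elongation measures can be checked directly; the bond geometry and displacement amplitudes below are arbitrary illustrative values:

```python
import numpy as np

def elong_linear(x1, x2, u1, u2):
    """Linear-regime elongation e = I_hat . (u2 - u1);
    sign convention chosen so that stretching is positive."""
    i_hat = (x2 - x1) / np.linalg.norm(x2 - x1)
    return i_hat @ (u2 - u1)

def elong_exact(x1, x2, u1, u2):
    """Geometrically exact elongation |x1 + u1 - x2 - u2| - |x1 - x2|,
    valid beyond the linear regime."""
    return np.linalg.norm((x1 + u1) - (x2 + u2)) - np.linalg.norm(x1 - x2)

x1, x2 = np.array([0.0, 0.0]), np.array([1.0, 0.0])   # unit-length bond
gap = {}
for s in (1e-4, 1e-1):                                # small vs large deformation
    u2 = s * np.array([0.3, -1.0])                    # mostly transverse motion
    gap[s] = abs(elong_linear(x1, x2, np.zeros(2), u2)
                 - elong_exact(x1, x2, np.zeros(2), u2))
```

The discrepancy grows roughly quadratically with the deformation amplitude, which is why the experiment balances a weight small enough for linearity against one large enough for measurable displacements.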

To provide a comparison with conventional numerical method, we calculate the approximate gradient using the forward difference as follows:

$$\nabla L=\frac{L[\mathbf{u}(\mathbf{k}+\delta\mathbf{k})]-L[\mathbf{u}(\mathbf{k})]}{\delta k}. \qquad (8)$$

The error depends on the step size $\delta k$: the smallest error occurs around $\delta k=10^{-6}$ N/m, as shown in the left panel of Fig. 1e. When the step size is smaller than this optimum, the fixed number of binary digits in computers leads to roundoff error, while truncation error emerges at larger step sizes. Note that roundoff error also occurs in our method, but thanks to the user-defined convergence condition, a minute adjoint force can be avoided. Besides, the finite difference method is usually implemented only numerically, since $\delta k$ must be considerably small (see the experimentally feasible step sizes within the shaded area in Fig. 1e, which carry large numerical error), whereas our method is experimentally feasible because it only requires measuring the bond elongations. Furthermore, for the finite difference method, the number of required simulations increases linearly with the number of bonds in the network, because each element $\partial L/\partial k_i$ corresponding to bond $i$ requires a separate computation, whereas our adjoint method needs only two simulations or experiments to obtain the gradient, regardless of the number of bonds. It is worth mentioning that automatic differentiation based on forward differentiation has also proven efficient in training computer-based and optical neural networks41,43.
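This comparison can be sketched on a toy network (hypothetical geometry, not the printed sample): the adjoint gradient costs two linear solves regardless of the number of bonds $m$, while the forward difference costs $m+1$ solves and its accuracy hinges on the step size:

```python
import numpy as np

pos = np.array([[0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 0.0]])
bonds = [(0, 2), (1, 3), (2, 3), (0, 3), (1, 2)]
k = np.ones(5)
free = [4, 5, 6, 7]                        # x, y DOFs of free nodes 2 and 3

C = np.zeros((5, 8))
for i, (j1, j2) in enumerate(bonds):
    unit = (pos[j2] - pos[j1]) / np.linalg.norm(pos[j2] - pos[j1])
    C[i, 2 * j1:2 * j1 + 2] = -unit
    C[i, 2 * j2:2 * j2 + 2] = unit
Cf = C[:, free]

F = np.zeros(4)
F[3] = -0.098                              # downward load on node 3
u_star = -0.01                             # illustrative target displacement

def loss_of(kv):
    uv = np.linalg.solve(Cf.T @ np.diag(kv) @ Cf, F)
    return (uv[3] - u_star) ** 2

# Adjoint gradient: exactly two linear solves, independent of m.
D = Cf.T @ np.diag(k) @ Cf
u = np.linalg.solve(D, F)
dLdu = np.zeros(4)
dLdu[3] = 2.0 * (u[3] - u_star)
grad = (Cf @ np.linalg.solve(D, -dLdu)) * (Cf @ u)

# Forward difference, Eq. (8): m + 1 solves, accuracy set by the step size.
def fd_gradient(dk):
    L0 = loss_of(k)
    return np.array([(loss_of(k + dk * np.eye(5)[l]) - L0) / dk for l in range(5)])

errors = {dk: np.linalg.norm(fd_gradient(dk) - grad) / np.linalg.norm(grad)
          for dk in (1e-2, 1e-6, 1e-12)}
```

The error dictionary reproduces the qualitative shape of the black curve in Fig. 1e: large truncation error at coarse steps, a minimum near an intermediate step size, and roundoff error creeping back in at very fine steps.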

In addition, another advantage of our MNNs is the potential use of the same node as both input and output node, because force and displacement are different physical quantities that can be defined on the same node. This characteristic enables more compact designs of MNNs. In the Supplementary Information and Supplementary Fig. 1, we demonstrate in situ backpropagation when the defined loss function and the input force are at the same node. Furthermore, to check whether the bar model used in our simulations agrees with the actual experimental samples, as part of the supplementary analysis, we use the finite element method to analyze the actual 3D experimental samples, which yields results consistent with the predominantly used bar model (see Supplementary Information and Supplementary Fig. 2 for further details). We also conduct EP to estimate the gradient of $L=(u_{Ly}+0.025)^2$ and compare the results with ours, as described in the Supplementary Information and Supplementary Fig. 3. When the nudging strength is small, in the linear regime, the results from our method and EP are identical; but as the nudging strength increases, the gradient error becomes large.

Behaviors learning

As mentioned in the Introduction, training MNNs to learn behaviors can reduce the effort of design strategies. Here, we show that, without expert knowledge, MNNs can learn desired behaviors through in situ backpropagation. For example, for an MNN with uniform bonds and no deliberate design, as shown in Fig. 2a, when the input force $F=0.005\times 9.8$ N is applied on the red node, the two cyan nodes will have the same vertical displacements $u_{Ly}=u_{Ry}$ (i.e., symmetric output) due to the symmetric configuration, where the subscripts L and R represent the node on the left and right, respectively.

Fig. 2. Behaviors learning using MNNs.


a The symmetric output under the applied force of the MNN before training. The top panel shows the configuration of mechanical networks. The bottom panel shows the simulated and experimental vertical displacements uy of two nodes. b The loss and the absolute difference of vertical displacements of two nodes ∣Δuy∣ as a function of iteration in the training process. c The asymmetric output under the applied force as a result of the training. The top panel shows the configuration of MNNs. The bottom panel shows the simulated and experimental vertical displacements uy of two nodes. The blue triangles, red dots and cyan stars in the top panels of (a, c) represent the fixed nodes, the input node and the output nodes, respectively. The error bars in the bottom panels of (a, c) are calculated based on standard deviation of three independent experiments. Source data are provided as a Source Data file.

Considering two classes represented by the two cyan nodes, we can use the cross-entropy loss with normalization44, $L=-\sum_{c=1}^{N}y_c\ln p_c$, where $N=2$ is the number of classes (L and R), $y_c$ is the binary indicator, and $p_c=\lvert u_c\rvert/\sum_{c=1}^{N}\lvert u_c\rvert$ is the predicted probability of $u_c$, to maximize the probability of the vertical displacement of one of the nodes. For example, $\{y_1, y_2\}=\{1, 0\}$ for class 1, where the left node has the greater displacement, and vice versa. The cross-entropy loss decreases as the predicted probability $p_c$ approaches the actual label, leading to the maximization of the probability and of the difference between the two absolute vertical displacements. Through in situ backpropagation, an asymmetric output can be realized, where the two nodes have different vertical displacements under the same force applied on the red node. Figure 2b shows the progressive reduction of the loss during training in simulations until convergence. Meanwhile, the difference between the absolute vertical displacements of the two nodes increases until reaching its maximum. The training process, including the loss decrease and the bond-width changes ($k_i \propto w_i$), is also shown in Supplementary Video 1. During training, to keep the bond width within a certain range, from 1.5 mm to 2.5 mm, for fabrication and the subsequent experiments, we use a projection scheme based on a modified Sigmoid function, as detailed in Methods and the Supplementary Information. This strategy also applies to the machine learning tasks in the following sections.
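The normalized cross-entropy loss and the adjoint force it generates can be sketched as follows; the absolute-value normalization mirrors the text's use of absolute vertical displacements, and the two output values are made-up numbers:

```python
import numpy as np

def cross_entropy(u_out, y):
    """L = -sum_c y_c ln p_c with p_c = |u_c| / sum_c |u_c| (one-hot y)."""
    p = np.abs(u_out) / np.sum(np.abs(u_out))
    return -np.sum(y * np.log(p))

def adjoint_force(u_out, y):
    """-dL/du: the force pattern applied to the output nodes in the
    adjoint (backward) pass, from L = -ln|u_t| + ln(sum_c |u_c|)."""
    s = np.sum(np.abs(u_out))
    t = int(np.argmax(y))                       # target class index
    dLdu = np.sign(u_out) / s                   # from the ln(sum |u_c|) term
    dLdu[t] -= np.sign(u_out[t]) / np.abs(u_out[t])  # from the -ln|u_t| term
    return -dLdu

# Two downward output displacements (made-up values, in metres); class 1
# means the left node should dominate.
u_out = np.array([-0.8e-3, -0.3e-3])
y = np.array([1.0, 0.0])
f_adj = adjoint_force(u_out, y)
```

Feeding `f_adj` back through the network as the adjoint load then yields the gradient via the same two-step procedure as before; the loss indeed drops as the target node's share of the total displacement grows.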

Figure 2c depicts the trained MNN, featuring a left node with a larger displacement in response to the input force. Conversely, in Supplementary Fig. 4, another trained MNN presents the alternate scenario, where the right node displays the greater displacement under the same input force. Notably, experimental measurements of the trained MNNs align closely with the simulated displacements across all scenarios, validating the efficacy of the training. It is noteworthy that using the cross-entropy function as the loss encourages maximization of the displacement difference between the two nodes, rather than targeting designated values as when using the mean-squared error (MSE) loss. In addition, we demonstrate the precise control of the displacements of two nodes under the applied force $F=0.005\times 9.8$ N using MSE, as detailed in the Supplementary Information and Supplementary Fig. 5. Either loss function can be leveraged to conduct behaviors learning as needed.

The example demonstrated above shows that MNNs can learn different behaviors under an applied force, with in situ backpropagation offering a straightforward methodology. This technique simplifies the design process: one only provides the designated input and desired output. It can be used to create advanced mechanical systems with desired functionalities. For example, automotive engineers could design cars with improved loading capability, while roboticists could develop robots with grasp-and-release behaviors under a control force.

Training mechanical neural networks for regression tasks

As outlined in the Introduction, akin to their computer-based counterparts, MNNs offer an avenue for implementing machine learning tasks with improved energy efficiency. In the following, we select two representative tasks typically undertaken by computer-based neural networks to showcase the versatility and efficacy of MNNs.

Regression stands as a benchmark task in the field of machine learning, with typical examples including Abalone age prediction and wine quality modeling45,46, serving as a cornerstone for evaluating model performance and predictive capability. Here, given that in situ backpropagation in MNNs is conducted within the linear regime, linear regression is chosen for our demonstration. Considering the stiffness of our experimental MNNs, we choose four synthetic target relations to exemplify the regression task as functions of the input force $F$, expressed as follows:

$$u_{Rx}=0\cdot F,\quad u_{Ry}=0.016F,\quad u_{Lx}=0.004F,\quad u_{Ly}=0.016F. \qquad (9)$$

Here, $u_{Rx}$ and $u_{Ry}$ are the horizontal and vertical displacements of the bottom-right node, respectively, and $u_{Lx}$ and $u_{Ly}$ are those of the bottom-left node. Hence, by training with such a dataset, we expect these two nodes to exhibit linear displacements under the applied force according to the prescribed slopes. This regression task can also be visualized simply as two straight lines, as shown in Fig. 3a, formulated as follows:

$$u_{Rx}=0,\quad u_{Ly}=4u_{Lx}. \qquad (10)$$

This can be interpreted as relations between the horizontal and vertical displacements of the two nodes: the bottom-right node develops no horizontal displacement, while the bottom-left node has horizontal and vertical displacements in a fixed relation under the applied force. We generate a set of 100 random data points according to Eq. (9), as depicted in the left panel of Fig. 3a. In addition, noisy data are generated from Eq. (9) by adding Gaussian noise with a standard deviation of $10^{-4}$, as illustrated in the right panel of Fig. 3a.
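Generating such a synthetic dataset is straightforward; the force range below is an assumption, while the slopes follow Eq. (9), and the $10^{-4}$ noise level and the 70/30 split follow the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 random input forces; the sampling range is an illustrative assumption.
F = rng.uniform(0.0, 0.012 * 9.8, size=100)

# Four targets per sample, in the order (u_Rx, u_Ry, u_Lx, u_Ly) of Eq. (9).
targets = np.stack([0.0 * F, 0.016 * F, 0.004 * F, 0.016 * F], axis=1)

# Noisy variant: Gaussian noise with standard deviation 1e-4, as in the text.
noisy = targets + rng.normal(0.0, 1e-4, size=targets.shape)

# Random 70/30 train/test split.
idx = rng.permutation(100)
train_idx, test_idx = idx[:70], idx[70:]
```

Note that the targets automatically satisfy Eq. (10): the first column is identically zero and the fourth column is four times the third.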

Fig. 3. Linear regression using MNNs.


a The synthetic noise-free and noisy datasets (circles), exhibited from left to right. The regression results are shown as solid lines. b The loss for the noise-free dataset (purple) and the noisy dataset (green) as a function of epoch in the training process. The error bars are calculated based on the standard deviation of ten training processes under randomly partitioned datasets. c The simulated regression results (solid lines) at epochs 1, 10, and 5000, shown from left to right. The experimental regression results, conducted with the MNN at epoch 5000, are exhibited as circles. Note that the results of three independent experiments almost overlap. d The trained configuration of the MNN for the regression tasks. The blue triangles, red dot and stars represent the fixed nodes, the input node and the output nodes, respectively. e The experimental setup for the regression task when the input force is equivalent to 6 g. Source data are provided as a Source Data file.

We randomly split the synthetic dataset into a training set (70%) and a testing set (30%). As displayed in Fig. 3b, the MSE losses, $L=\frac{1}{N}\sum_{j=1}^{N}(u_j-\hat{u}_j)^2$, where $N=100$, and $u_j$ and $\hat{u}_j$ are the predicted values from the current MNN and the target values in the synthetic dataset (composed of horizontal and vertical displacements), respectively, decrease consistently in simulations over the epochs until convergence for both the noise-free and noisy datasets. This decline indicates an improved fit between the regression results and the datasets. Figure 3c illustrates, from left to right, the regression results at different epochs. At the beginning of training (epoch = 1), a cross is shown, with large discrepancy from Fig. 3a. As the epoch increases to 10, the orange line gradually becomes vertical while the blue line remains tilted. Upon convergence of the loss, the regression results at epoch = 5000 show excellent agreement with the regression targets. The training process, including the loss decrease, the bond changes and the corresponding regression results, is shown in Supplementary Video 2.

Figure 3d exhibits the trained MNN, where the widths of the bonds differ. Therefore, upon applying different forces to the red node, the nodes marked by the orange and blue stars are expected to develop displacements in accordance with the solid lines depicted in Fig. 3a. Because the applied force comes from the gravity of weights in our experimental setup, we test the regression results under downward forces. Figure 3e shows the experimental setup under the applied force $F=0.006\times 9.8$ N. The experimentally measured displacements $u_x$ and $u_y$ of the two nodes L and R, marked by stars, under applied forces ranging from $F=0$ to $F=0.012\times 9.8$ N in increments of $\Delta F=0.002\times 9.8$ N, are presented in the third panel of Fig. 3c, in good agreement with the simulations. Moreover, the experimental loss is $1.99\times 10^{-7}$, and the accuracy, defined via the $l_2$-norm error as $1-\lVert\mathbf{u}-\hat{\mathbf{u}}\rVert/\lVert\hat{\mathbf{u}}\rVert$, is 85.27% on average.

This regression task can also be interpreted as a behavior learning task, in which the trajectory of a node under an applied force is precisely engineered using in situ backpropagation. Behavior learning through regression poses a greater challenge than traditional behavior learning, as the desired outputs are not specific values but functions of the input force. In our specific case, while the bottom right node remains horizontally stationary under a downward force, the bottom left node develops horizontal and vertical displacements in proportion. Leveraging regression to implement behavior learning opens up avenues for designing material functionalities.

Training mechanical neural networks for classification tasks

Another benchmark task in machine learning is classification. In our study, we use the well-known Iris flower dataset, a real-world dataset, to exemplify the classification process47. The task is to classify three types of Iris flower – namely, Iris setosa, Iris versicolor and Iris virginica – using four distinct features: sepal length, sepal width, petal length, and petal width. Figure 4a visualizes the relation between sepal length and petal length among the three species. The species exhibit clear boundaries in this feature space, with Iris virginica typically characterized by larger sepal and petal lengths than the other two species.

Fig. 4. Classification using MNNs.

Fig. 4

a The Iris flower classification dataset. The relation between sepal length and petal length is visualized. b The loss (purple) and classification accuracy (orange for the training set and blue for the testing set) as a function of epoch during training. The inset shows the trained configuration of the MNN. The blue triangles and red dots represent the fixed nodes and the input nodes, respectively. The symbols used in (a) are shown in the inset of (b) to represent the output nodes for the corresponding type of Iris flower. The error bars are calculated from the standard deviation of ten training processes under randomly partitioned datasets. c The classification results when the epoch is 10, 20 and 100, shown from left to right. The lime dots in the third panel represent incorrectly classified data. d Comparison of classification results between simulation and experiment, conducted in the MNN at epoch 100. The insets display the experimental configuration under the input force. The symbols on top of the largest displacement correspond to the symbols in (a), representing the corresponding flower used as the input. The error bars are calculated from the standard deviation of three independent experiments. Source data are provided as a Source Data file.

In our classification task, we consider four features, each corresponding to a downward input force applied to a node marked by a red dot in the inset of Fig. 4b. Note that these four values are appropriately scaled based on the stiffness of the experimental MNNs; details can be found in the Supplementary Information. The indicator of a specific species is the node with the largest horizontal displacement among the three output nodes marked by the corresponding symbols in the inset.
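A minimal sketch of this encoding and readout follows; the force scale and output displacements are placeholder assumptions (the actual scaling to the experimental MNN stiffness is given in the Supplementary Information):

```python
import numpy as np

# One Iris sample: sepal length, sepal width, petal length, petal width (cm).
features = np.array([5.1, 3.5, 1.4, 0.2])

# Map each feature to a downward force on its input node. The factor below is
# a placeholder; the paper scales by the stiffness of the experimental MNN.
scale = 0.002 * 9.8                      # assumed N per cm
input_forces = scale * features

# Readout: the species indicator is the output node with the largest
# horizontal displacement (hypothetical values shown).
u_x = np.array([0.8, 0.3, 0.1])          # setosa, versicolor, virginica nodes
predicted = int(np.argmax(u_x))          # index 0 corresponds to Iris setosa
```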

In our classification process, we employ the cross-entropy loss function and randomly partition the dataset into a training set (70%) and a testing set (30%). As the loss steadily decreases over epochs in simulations, the classification accuracy, defined as the ratio of correct classifications for the training dataset, approaches 95% on average. Meanwhile, the accuracy for the testing dataset, which is unseen during training, also converges to 96% on average, suggesting that our MNNs have effectively learned the complex patterns and relations in this dataset. The trained MNN is shown in the inset, with differing bond widths. Figure 4c illustrates the classification results at different epochs. From left to right, with increasing epochs, Iris setosa is classified first, as it is distinctly separated from the other species in the feature space. Gradually, Iris virginica emerges, eventually sharing a clear boundary with Iris versicolor in the feature space. Incorrect classifications are marked by lime dots in the third panel of Fig. 4c. Supplementary Video 3 shows the evolution of the classification results as training proceeds. Note that the third panel of Fig. 4c exhibits excellent similarity to Fig. 4a, the ground truth, affirming the efficacy of our classification model. The confusion matrix, shown in Supplementary Fig. 6, indicates that the classification results from the MNNs approach the ground truth over epochs.
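The cross-entropy loss on the output displacements can be sketched as below; the paper specifies the cross-entropy loss, while the softmax normalization of the horizontal displacements and the numerical values are our assumptions:

```python
import numpy as np

def cross_entropy(u_x, label):
    """Cross-entropy of softmax-normalized output displacements (assumed form)."""
    p = np.exp(u_x - np.max(u_x))   # shift for numerical stability
    p = p / p.sum()                 # softmax probabilities
    return -np.log(p[label])

u_x = np.array([0.1, 0.9, 0.2])      # hypothetical horizontal displacements
loss = cross_entropy(u_x, label=1)   # true class: Iris versicolor
```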

Furthermore, we perform classification experiments to validate the model stored in the trained MNNs. The input forces corresponding to each Iris flower are rounded and converted to integer weights, which are then applied to the corresponding nodes, as shown in the insets of Fig. 4d. Details can be found in the Supplementary Information. Figure 4d shows, from top to bottom, the normalized measured horizontal displacements using the features of Iris virginica, Iris versicolor and Iris setosa, respectively. In both simulation and experiment, for all three tested cases, the largest horizontal displacement occurs at the corresponding node, indicating correct classifications, with an experimental loss of 0.5955. This consistency between simulation and experiment reinforces the reliability of our classification model.

Note that the choice of input and output nodes is important for the successful training of MNNs under our experimental setup. The configuration shown in the inset of Fig. 4b is ideal for experimental implementation and measurement. In the Supplementary Information and Supplementary Fig. 7, we show three further choices that may not be ideal for experiments under our setup but still reach high accuracy after training.

In addition, we demonstrate another classification task on our MNNs, penguin classification48, to further demonstrate their capability, as detailed in the Supplementary Information and Supplementary Fig. 8. As the loss decreases, the accuracies for the training and testing sets increase to 90% and 92%, respectively.

Retrainability

In the preceding sections, we have showcased the capability of our MNNs to implement behavior learning and various machine learning tasks through in situ backpropagation. Distinct from computer-based neural networks, which exist solely in the digital realm, MNNs are physically manufactured, embedding the machine learning model within real materials. Hence, the retrainability of MNNs emerges as a pivotal attribute49. Here, we highlight this retrainability in two key scenarios through simulations: first, the ability to seamlessly transition from one task to another on demand, and second, the capacity to recover the machine learning model after sustaining damage.

In the task-switching scenario, we start from the Iris flower classification task. After training our MNN until convergence, as shown in Fig. 5a, the resulting trained MNN for the classification task serves as the initial configuration for the regression task. In Fig. 5b, the mean-squared error decreases and the accuracies for both the training and testing noise-free datasets approach nearly 100%, with accuracy defined via the l2-norm error. Figure 5c shows the decreasing cross-entropy loss and increasing accuracy as the task switches back to classification. Upon convergence, the trained MNN differs from that depicted in Fig. 5a, suggesting different local minima when starting from distinct initial configurations. Although the accuracies in both Fig. 5a, c eventually increase to nearly 100%, the converged loss in Fig. 5a is smaller than that in Fig. 5c, indicating a more pronounced classification signature (i.e., a larger horizontal displacement of the corresponding node compared with that of the other two nodes in our case). Supplementary Video 4 depicts the transition from classification to regression and back to classification.

Fig. 5. Retrainable MNNs.

Fig. 5

a The loss (purple) and classification accuracy (orange for training set and blue for testing set) as a function of epoch in the training process of the Iris flower classification task. The inset shows the trained MNN. This MNN is subsequently taken as the initial system for new task training (top) and retraining after damage (bottom), respectively. b The loss (purple) and regression accuracy (orange) as a function of epoch in the training process when using the noise-free dataset and the trained MNN of the classification task as an initial MNN. The inset shows the trained MNN. c The loss (purple) and classification accuracy (orange and blue) as a function of epoch in the training process when using the trained MNN of the regression task as the initial MNN. The inset shows the trained MNN. d The schematic shows that a bond of the MNN for classification tasks is pruned. e The loss (purple) and classification accuracy (orange for training set and blue for testing set) as a function of epoch in the training process when using the pruned MNN as an initial MNN. The inset shows the trained MNN. The blue triangles, red dots and cyan stars in MNNs represent the fixed nodes, the input nodes and the output nodes, respectively. The error bars in (a–c, e) are calculated based on the standard deviation of ten training processes under randomly partitioned datasets. Source data are provided as a Source Data file.

Another scenario involves the retrainability of the MNNs after damage occurs. We again begin with the Iris flower classification task shown in Fig. 5a. We then prune one of the bonds in the trained MNN, as displayed in Fig. 5d, effectively breaking the Iris flower classification model stored in our MNNs. As shown in Fig. 5e, the classification accuracy diminishes to approximately 50%, accompanied by an increase in the cross-entropy loss, signifying the degradation of the classification model. However, after retraining the damaged MNNs, the classification accuracy rebounds to around 80%, indicating a substantial recovery of the classification model stored in the MNNs. It is worth noting that the decrease in loss exhibits relatively large variation, suggesting that in this configuration the training process depends on the partitioning of the training and testing datasets. Supplementary Video 5 illustrates the retrainability of the MNNs after damage. Both retraining capabilities rely on tunable spring constants in the MNNs, whose experimental implementation is discussed in the Discussion.

While the pruning of a single bond is demonstrated above and in previous works49, the Supplementary Information and Supplementary Fig. 9 delve into the effects of pruning different bonds on classification accuracy, showing that distinct bonds differ in their importance for classification. Pruning “redundant” bonds sustains high accuracy; pruning “critical” bonds, however, causes a significant decline in accuracy. Moreover, pruning certain bonds can render the MNNs mechanically unstable, leading to the emergence of mechanical zero modes and thus functional failure of the MNNs. The identification of “critical” and “redundant” bonds in the classification task, as demonstrated in the Supplementary Information, underscores both the vulnerability and the robustness of our MNNs. It stimulates the development of more resilient MNN designs, encompassing network topology and connectivity considerations50,51. Furthermore, the examination of damaged MNNs may prompt reflection on their potential parallels with damaged biological neural networks in the brain, inspiring further exploration of shared characteristics52,53.
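The appearance of zero modes under pruning can be illustrated on a toy network; the two-bond geometry below is an assumption for illustration, and the stiffness matrix K = Cᵀ diag(k) C follows the standard linear spring-network form:

```python
import numpy as np

# Toy illustration of how pruning a bond can create a mechanical zero mode.
# Hypothetical two-bond network with one free node in 2D; each row of the
# compatibility matrix C holds a bond's direction cosines, so e = C @ u.
def num_zero_modes(C, k, tol=1e-10):
    K = C.T @ (k[:, None] * C)              # stiffness matrix K = C^T diag(k) C
    return int(np.sum(np.linalg.eigvalsh(K) < tol))

C = np.array([[1.0, 0.0],    # horizontal bond
              [0.0, 1.0]])   # vertical bond
k_intact = np.array([1.0, 1.0])
k_pruned = np.array([1.0, 0.0])             # prune the vertical bond

modes_intact = num_zero_modes(C, k_intact)  # 0: the node is fully constrained
modes_pruned = num_zero_modes(C, k_pruned)  # 1: free vertical motion appears
```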

Discussion

In conclusion, our study presents a foundational method for training MNNs through in situ backpropagation, derived from the adjoint variable method. This approach enables the computation of the gradients of the loss function from local information within the MNNs in only two steps, demonstrating excellent efficiency. Leveraging in situ backpropagation, we have investigated the capabilities of MNNs in behavior learning and various machine learning tasks, achieving high accuracy in both regression and Iris flower classification. These physically manufactured MNNs store machine learning models in real materials, distinguishing them from computer-based neural networks. In addition, unlike fluid networks29, chemical signaling networks42 and electrical networks49, where both input and output are scalar quantities, our mechanical networks feature vector inputs and outputs. This characteristic allows for more opportunities in higher dimensions.

Moreover, our work highlights the retrainability of MNNs, a crucial attribute with profound implications for real-world applications. We have demonstrated that these networks can seamlessly switch between tasks and recover from damage, showcasing their robustness and resilience. The use of static forces to implement task learning addresses some challenges faced by physical neural networks based on wave dynamics, such as energy dissipation and loss. MNNs also provide fairly fast information processing, at the speed of sound in the material, during both in situ backpropagation at the training stage and decision-making at the prediction stage.

It is important to note that the experimental feasibility of in situ backpropagation has been demonstrated, as shown in Fig. 1, where the error signal is backpropagated to each bond to obtain the gradient. At present, the learning step (i.e., updating the spring constants according to the obtained gradients) does not involve the real MNNs, because the MNNs have not been physically realized to update themselves. Instead, the update of the spring constants is conducted numerically with the bar model (Eq. (7)), while the capabilities of the trained MNNs are validated through experiments. Numerous experimental avenues exist for implementing the spring-constant update based on the in situ backpropagation demonstrated here, so that the entire learning process can be realized experimentally. For example, platforms such as tunable bars25, and principles such as magnetoactivity54, phase changing55, and phototunability56, where material properties can be programmed in situ by external fields, hold promise for facilitating further experimentation with in situ backpropagation.

Another important issue affecting the physical implementation of MNNs is the gap between the simulated model and the real material system, including manufacturing tolerances and the bias in parameter updates during training caused by discrepancies in the gradient20. One proposed scheme uses local pruning rules that allow the manipulation of material responses by pruning bonds of disordered networks in situ, and it applies to networks more complex than linear spring networks57. This approach opens up ways to build MNNs with desired behaviors and machine learning functions without relying on simulation models on computers, overcoming the simulation-reality gap.

Our method opens the possibility of building a purely experimental setup of trainable MNNs without a central computer. This can be done using a general programmable device that tunes the spring constants with a microcontroller, enabling experimental realizations without computer simulations. According to our learning rule, an input force is applied to the device, and the node displacements and bond elongations e are read by sensors. The microcontroller calculates the Jacobian of the loss from the output node displacements and applies the equivalent adjoint force to the device, with the adjoint elongations e_adj read by sensors. The microcontroller then performs an element-wise multiplication of e and e_adj and updates the spring constants based on the calculated gradient. As discussed above, these calculations have very low time complexity and are feasible for microcontrollers without a central computer. This purely experimental setup can also avoid the simulation-reality gap and realize retrainability.
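As a numerical sketch of this learning loop, the physical force application and sensor readouts can be replaced by linear solves of a toy spring network. The compatibility matrix, target displacement, learning rate, and the sign convention of the element-wise gradient rule are illustrative assumptions, not the paper's experimental parameters:

```python
import numpy as np

# Toy three-bond network: rows of C are bond direction cosines, so e = C @ u.
C = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.6, 0.8]])
k = np.array([1.0, 1.0, 1.0])      # trainable spring constants
f = np.array([0.0, -1.0])          # applied input force
u_target = np.array([0.1, -0.6])   # desired output displacement (assumed)
alpha = 0.5                        # learning rate (assumed)

for _ in range(2000):
    K = C.T @ (k[:, None] * C)         # stiffness matrix K = C^T diag(k) C
    u = np.linalg.solve(K, f)          # forward pass: "measure" displacements
    e = C @ u                          # bond elongations (sensor readout)
    f_adj = u - u_target               # adjoint force from the loss Jacobian
    u_adj = np.linalg.solve(K, f_adj)  # adjoint pass
    e_adj = C @ u_adj                  # adjoint elongations (sensor readout)
    grad = -e * e_adj                  # element-wise product gives dL/dk
    k = np.clip(k - alpha * grad, 0.1, 10.0)   # gradient-descent update

loss = 0.5 * np.sum((u - u_target) ** 2)
```

Here the loss is L = ½‖u − u*‖², so its Jacobian u − u* plays the role of the adjoint force; in the physical setup, the two linear solves would be replaced by actually loading the device and reading the sensors.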

Besides, our current in situ backpropagation method is constrained to the linear regime of MNNs, so its applications focus primarily on linear regression and linear classifiers. Because our learning rules are derived from matrix decomposition and linear operations, they do not generalize naturally to the nonlinear case. Exploring the nonlinear regime of MNNs using nonlinear materials and geometric nonlinearity therefore presents an opportunity that is not only of theoretical interest but also enables tackling nonlinear datasets and tasks, and is worth exploring in future studies.

So far, backpropagation has been the most efficient and widely-used neural network training algorithm for machine learning across digital and optical processors1,9,20,41,58. Our demonstration of this ubiquitous technique in mechanical systems as a physical implementation unveils the promising capabilities of MNNs to reduce the cost of machine learning. The successful implementation of various tasks using MNNs has wide-ranging implications, bridging mechanics and machine learning, and paving the way for designing autonomous robots and smart materials with self-learning capabilities, which can not only respond to external stimuli but also possess the ability to learn and adapt to environments.

Methods

Sample fabrication

The trained MNN is fabricated using a Polyjet 3D printer (J850™ Digital Anatomy™) with Agilus30 (black and flexible). To align with the bar model and prevent bond-bending forces, the joints of the bonds in the MNNs are manufactured thinner (half of the bond width). This design ensures that the bonds deform preferentially near the nodes under loading, mitigating the risk of buckling in their middle sections.

Experimental setup and measurements

We use an assembled structure to suspend the MNN by gluing the two nodes (marked by blue triangles in figures) onto the truss. Thin strings are delicately wound around the joints of the input nodes, serving as hooks to hang weights. After careful and slow application of weights onto the strings, the experimental procedure involves taking images with a DSLR camera at f/5.0 and ISO 800. Positioned on a tripod, the camera is remotely controlled to minimize interference, ensuring precise measurements. The lens is aligned perpendicular to, and at the same height as, the sample to maintain accuracy. In addition, camera calibration is performed using a standard checkerboard pattern, and images are corrected using the camera matrix. To obtain the elongations of the bonds in the MNNs, we employ correlation-based algorithms59–61 to track the centers of the joints, with a computational cost of O((M − P + 1)(N − Q + 1)PQ), where the image subset has size P × Q and the search area has size M × N. Bond elongation is then calculated as the difference between the bond length under the applied force and the original length. More details can be found in the Supplementary Information.
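A brute-force version of such correlation-based tracking, whose nested loops reproduce the quoted cost, can be sketched as follows; the synthetic image and subset location are assumptions for illustration:

```python
import numpy as np

def track_subset(search, template):
    """Return (row, col) of the best normalized cross-correlation match."""
    M, N = search.shape
    P, Q = template.shape
    t = template - template.mean()
    best_score, best_pos = -np.inf, (0, 0)
    # (M - P + 1)(N - Q + 1) candidate positions, each costing O(PQ)
    for i in range(M - P + 1):
        for j in range(N - Q + 1):
            w = search[i:i + P, j:j + Q] - search[i:i + P, j:j + Q].mean()
            denom = np.sqrt((w ** 2).sum() * (t ** 2).sum())
            score = (w * t).sum() / denom if denom > 0 else -1.0
            if score > best_score:
                best_score, best_pos = score, (i, j)
    return best_pos

rng = np.random.default_rng(1)
frame = rng.random((30, 30))            # synthetic search area (assumption)
template = frame[12:18, 7:13].copy()    # 6 x 6 subset around a tracked joint
position = track_subset(frame, template)
```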

Numerical simulations

The responses of the MNNs to applied forces are simulated using bar elements. To verify the feasibility of these simulations in experiments, finite element analysis of the actual 3D model is also performed in Comsol Multiphysics (see Supplementary Information). Training of the MNNs is conducted via the in situ backpropagation derived from the adjoint method, as detailed in the main text. The gradient information obtained from this process is used to update the spring constants, represented by the widths of the bonds, using the Adam optimization algorithm. This process iterates until convergence, yielding the final structure. The initial configurations for behavior learning, regression and classification are MNNs with every bond width equal to 2 mm. For experimental purposes, the width of each bond is restricted to the range 1.5 mm to 2.5 mm through a projection scheme enabled by a modified Sigmoid function in the simulation. The learning rates α for the behavior learning, regression and classification tasks demonstrated in the main text are 0.005, 0.1, and 0.006, respectively. The decay rate for the momentum, β1, and the decay rate for the squared gradients, β2, are kept at 0.9 and 0.999, respectively. More details can be found in the Supplementary Information.
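The update step can be sketched as a standard Adam iteration on a latent design variable, followed by a sigmoid projection into the 1.5 mm to 2.5 mm width range. The latent variable rho, its gradient, and the exact form of the modified sigmoid are our assumptions; β1 = 0.9 and β2 = 0.999 follow the text:

```python
import numpy as np

def project_width(rho, w_min=1.5, w_max=2.5):
    """Sigmoid projection of a latent variable onto the allowed width range."""
    return w_min + (w_max - w_min) / (1.0 + np.exp(-rho))

def adam_step(rho, grad, m, v, t, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on the latent design variable rho."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias corrections
    v_hat = v / (1 - beta2 ** t)
    return rho - alpha * m_hat / (np.sqrt(v_hat) + eps), m, v

rho = np.zeros(3)                   # latent variables: widths start at 2.0 mm
m = np.zeros(3)
v = np.zeros(3)
grad = np.array([1.0, -1.0, 0.5])   # hypothetical dL/d(rho)
rho, m, v = adam_step(rho, grad, m, v, t=1)
widths = project_width(rho)         # bond widths stay within [1.5, 2.5] mm
```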

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Supplementary information

41467_2024_54849_MOESM2_ESM.pdf (47.5KB, pdf)

Description of Additional Supplementary Files

Supplementary Video 1 (2.5MB, mov)
Supplementary Video 2 (1.2MB, mov)
Supplementary Video 4 (2.7MB, mov)
Supplementary Video 5 (1.8MB, mov)
Reporting Summary (223.4KB, pdf)

Source data

Source Data (594.7KB, xlsx)

Acknowledgements

The authors acknowledge support from the Office of Naval Research (MURI N00014-20-1-2479) and the National Science Foundation Center for Complex Particle Systems (Award #2243104). We are grateful for fruitful discussions with Xiongye Xiao and Prof. Paul Bogdan at the University of Southern California. We also thank Andy Poli in the Department of Mechanical Engineering at the University of Michigan for advice on the fabrication of 3D-printed MNNs.

Author contributions

S.L. and X.M. designed the project. S.L. conducted theoretical derivation, numerical simulations and experiments. S.L. and X.M. wrote and improved the manuscript.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Data availability

Source data are provided with this paper, and the data generated in this study have been deposited in the GitHub repository: https://github.com/mao-research-group/Mechanical-neural-networks.

Code availability

The codes for demonstration can be found at the following link: https://github.com/mao-research-group/Mechanical-neural-networks, with DOI62.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

The online version contains supplementary material available at 10.1038/s41467-024-54849-z.

References

1. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
2. Jordan, M. I. & Mitchell, T. M. Machine learning: Trends, perspectives, and prospects. Science 349, 255–260 (2015).
3. Egmont-Petersen, M., Ridder, D. & Handels, H. Image processing with neural networks—a review. Pattern Recognit. 35, 2279–2301 (2002).
4. Adamopoulou, E. & Moussiades, L. Chatbots: History, technology, and applications. Mach. Learn. Appl. 2, 100006 (2020).
5. Turay, T. & Vladimirova, T. Toward performing image classification and object detection with convolutional neural networks in autonomous driving systems: A survey. IEEE Access 10, 14076–14119 (2022).
6. Abiodun, O. I. et al. Comprehensive review of artificial neural network applications to pattern recognition. IEEE Access 7, 158820–158846 (2019).
7. Carleo, G. et al. Machine learning and the physical sciences. Rev. Mod. Phys. 91, 045002 (2019).
8. Amari, S.-i. Backpropagation and stochastic gradient descent method. Neurocomputing 5, 185–196 (1993).
9. Lillicrap, T. P., Santoro, A., Marris, L., Akerman, C. J. & Hinton, G. Backpropagation and the brain. Nat. Rev. Neurosci. 21, 335–346 (2020).
10. Thompson, N. C., Greenewald, K., Lee, K. & Manso, G. F. The computational limits of deep learning. Preprint at https://arxiv.org/abs/2007.05558 (2022).
11. Sze, V., Chen, Y.-H., Yang, T.-J. & Emer, J. S. Efficient processing of deep neural networks: A tutorial and survey. Proc. IEEE 105, 2295–2329 (2017).
12. Hamerly, R., Bernstein, L., Sludds, A., Soljačić, M. & Englund, D. Large-scale optical neural networks based on photoelectric multiplication. Phys. Rev. X 9, 021032 (2019).
13. Caulfield, H. J. & Dolev, S. Why future supercomputing requires optics. Nat. Photonics 4, 261–263 (2010).
14. Hughes, T. W., Williamson, I. A., Minkov, M. & Fan, S. Wave physics as an analog recurrent neural network. Sci. Adv. 5, 6946 (2019).
15. Wetzstein, G. et al. Inference in artificial intelligence with deep optics and photonics. Nature 588, 39–47 (2020).
16. Shastri, B. J. et al. Photonics for artificial intelligence and neuromorphic computing. Nat. Photonics 15, 102–114 (2021).
17. Wang, T. et al. An optical neural network using less than 1 photon per multiplication. Nat. Commun. 13, 123 (2022).
18. Pai, S. et al. Experimentally realized in situ backpropagation for deep learning in photonic neural networks. Science 380, 398–404 (2023).
19. Hermans, M., Burm, M., Van Vaerenbergh, T., Dambre, J. & Bienstman, P. Trainable hardware for dynamical computing using error backpropagation through physical media. Nat. Commun. 6, 6729 (2015).
20. Wright, L. G. et al. Deep physical neural networks trained with backpropagation. Nature 601, 549–555 (2022).
21. Jiang, T., Li, T., Huang, H., Peng, Z.-K. & He, Q. Metamaterial-based analog recurrent neural network toward machine intelligence. Phys. Rev. Appl. 19, 064065 (2023).
22. Hughes, T. W., Williamson, I. A., Minkov, M. & Fan, S. Forward-mode differentiation of Maxwell’s equations. ACS Photonics 6, 3010–3016 (2019).
23. Lin, X. et al. All-optical machine learning using diffractive deep neural networks. Science 361, 1004–1008 (2018).
24. Weng, J. et al. Meta-neural-network for real-time and passive deep-learning-based object recognition. Nat. Commun. 11, 6309 (2020).
25. Lee, R. H., Mulder, E. A. & Hopkins, J. B. Mechanical neural networks: Architected materials that learn behaviors. Sci. Robot. 7, 7278 (2022).
26. Hopkins, J. B., Lee, R. H. & Sainaghi, P. Using binary-stiffness beams within mechanical neural-network metamaterials to learn. Smart Mater. Struct. 32, 035015 (2023).
27. Hopfield, J. J. Neurons with graded response have collective computational properties like those of two-state neurons. Proc. Natl Acad. Sci. 81, 3088–3092 (1984).
28. Stern, M., Arinze, C., Perez, L., Palmer, S. E. & Murugan, A. Supervised learning through physical changes in a mechanical system. Proc. Natl Acad. Sci. 117, 14843–14850 (2020).
29. Stern, M., Hexner, D., Rocks, J. W. & Liu, A. J. Supervised learning in physical networks: From machine learning to learning machines. Phys. Rev. X 11, 021045 (2021).
30. Stern, M. & Murugan, A. Learning without neurons in physical systems. Annu. Rev. Condens. Matter Phys. 14, 417–441 (2023).
31. Altman, L. E., Stern, M., Liu, A. J. & Durian, D. J. Experimental demonstration of coupled learning in elastic networks. Phys. Rev. Appl. 22, 024053 (2024).
32. Scellier, B. & Bengio, Y. Equilibrium propagation: Bridging the gap between energy-based models and backpropagation. Front. Comput. Neurosci. 11, 24 (2017).
33. Kendall, J., Pantone, R., Manickavasagam, K., Bengio, Y. & Scellier, B. Training end-to-end analog neural networks with equilibrium propagation. Preprint at https://arxiv.org/abs/2006.01981 (2020).
34. Wang, Q., Wanjura, C. C. & Marquardt, F. Training coupled phase oscillators as a neuromorphic platform using equilibrium propagation. Preprint at https://arxiv.org/abs/2402.08579 (2024).
35. Laydevant, J., Marković, D. & Grollier, J. Training an Ising machine with equilibrium propagation. Nat. Commun. 15, 3671 (2024).
36. Arinze, C., Stern, M., Nagel, S. R. & Murugan, A. Learning to self-fold at a bifurcation. Phys. Rev. E 107, 025001 (2023).
37. Stern, M., Liu, A. J. & Balasubramanian, V. Physical effects of learning. Phys. Rev. E 109, 024311 (2024).
38. Patil, V. P., Ho, I. & Prakash, M. Self-learning mechanical circuits. Preprint at https://arxiv.org/abs/2304.08711 (2023).
39. Pellegrino, S. & Calladine, C. R. Matrix analysis of statically and kinematically indeterminate frameworks. Int. J. Solids Struct. 22, 409–428 (1986).
40. Sun, K., Souslov, A., Mao, X. & Lubensky, T. C. Surface phonons, elastic response, and conformal invariance in twisted kagome lattices. Proc. Natl Acad. Sci. 109, 12369–12374 (2012).
41. Hughes, T. W., Minkov, M., Shi, Y. & Fan, S. Training of photonic neural networks through in situ backpropagation and gradient measurement. Optica 5, 864–871 (2018).
42. Anisetti, V. R., Scellier, B. & Schwarz, J. M. Learning by non-interfering feedback chemical signaling in physical networks. Phys. Rev. Res. 5, 023024 (2023).
43. Baydin, A. G., Pearlmutter, B. A., Radul, A. A. & Siskind, J. M. Automatic differentiation in machine learning: a survey. J. Mach. Learn. Res. 18, 1–43 (2018).
44. Good, I. Some terminology and notation in information theory. Proc. IEE Part C: Monogr. 103, 200–204 (1956).
45. Nash, W., Sellers, T., Talbot, S., Cawthorn, A. & Ford, W. Abalone. UCI Machine Learning Repository. 10.24432/C55C7W (1995).
46. Cortez, P., Cerdeira, A., Almeida, F., Matos, T. & Reis, J. Modeling wine preferences by data mining from physicochemical properties. Decis. Support Syst. 47, 547–553 (2009).
47. Fisher, R. A. Iris. UCI Machine Learning Repository. 10.24432/C56C76 (1988).
48. Gorman, K. B., Williams, T. D. & Fraser, W. R. Ecological sexual dimorphism and environmental variability within a community of Antarctic penguins (genus Pygoscelis). PLoS ONE 9, 90081 (2014).
49. Dillavou, S., Stern, M., Liu, A. J. & Durian, D. J. Demonstration of decentralized physics-driven learning. Phys. Rev. Appl. 18, 014040 (2022).
50. Beygelzimer, A., Grinstein, G., Linsker, R. & Rish, I. Improving network robustness by edge modification. Phys. A: Stat. Mech. Appl. 357, 593–612 (2005).
51. Dekker, A. H. & Colbert, B. D. Network robustness and graph topology. In Proceedings of the 27th Australasian Conference on Computer Science, Vol. 26, 359–368 (2004).
52. Kalampokis, A., Kotsavasiloglou, C., Argyrakis, P. & Baloyannis, S. Robustness in biological neural networks. Phys. A: Stat. Mech. Appl. 317, 581–590 (2003).
53. Eluyode, O. & Akomolafe, D. T. Comparative study of biological and artificial neural networks. Eur. J. Appl. Eng. Sci. Res. 2, 36–46 (2013).
54. Zhang, W. et al. Magnetoactive microlattice metamaterials with highly tunable stiffness and fast response rate. NPG Asia Mater. 15, 45 (2023).
55. Poon, R. & Hopkins, J. B. Phase-changing metamaterial capable of variable stiffness and shape morphing. Adv. Eng. Mater. 21, 1900802 (2019).
56. Stowers, R. S., Allen, S. C. & Suggs, L. J. Dynamic phototuning of 3D hydrogel stiffness. Proc. Natl Acad. Sci. 112, 1953–1958 (2015).
57. Pashine, N. Local rules for fabricating allosteric networks. Phys. Rev. Mater. 5, 065607 (2021).
58. Poggio, T., Banburski, A. & Liao, Q. Theoretical issues in deep networks. Proc. Natl Acad. Sci. 117, 30039–30045 (2020).
59. Hedrick, T. L. Software techniques for two- and three-dimensional kinematic measurements of biological and biomimetic systems. Bioinspiration Biomim. 3, 034001 (2008).
60. Li, S., Roger, L. M., Klein-Seetharaman, J., Lewinski, N. A. & Yang, J. Spatiotemporal dynamics of coral polyps on a fluidic platform. Phys. Rev. Appl. 18, 024078 (2022).
61. Li, S. et al. Data-driven discovery of spatiotemporal coherent patterns in pulsating soft coral tentacle motion with dynamic mode decomposition. Phys. Rev. Res. 5, 013175 (2023).
62. Li, S. & Mao, X. Training all-mechanical neural networks for task learning through in situ backpropagation. GitHub. 10.5281/zenodo.14019746 (2024).

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

41467_2024_54849_MOESM2_ESM.pdf (47.5KB, pdf)

Description of Additional Supplementary Files

Supplementary Video 1 (2.5MB, mov)
Supplementary Video 2 (1.2MB, mov)
Supplementary Video 4 (2.7MB, mov)
Supplementary Video 5 (1.8MB, mov)
Reporting Summary (223.4KB, pdf)
Source Data (594.7KB, xlsx)

Data Availability Statement

Source data are provided with this paper, and the data generated in this study have been deposited in the GitHub repository: https://github.com/mao-research-group/Mechanical-neural-networks.

The codes for demonstration can be found in the following link: https://github.com/mao-research-group/Mechanical-neural-networks, with the DOI given in ref. 62.
