Proceedings of the National Academy of Sciences of the United States of America. 2024 Jul 15;121(30):e2411913121. doi: 10.1073/pnas.2411913121

Training neural networks using physical equations of motion

T Patrick Xiao a,1
PMCID: PMC11287243  PMID: 39008681

Minimization principles are at the core of physics. The best known is the principle of least action, which can be used to derive Newton’s laws of motion and Maxwell’s equations for electricity and magnetism. Another is Onsager’s principle of least power dissipation, which describes how physical systems evolve toward steady-state configurations that minimize the rate of power dissipation (1). These principles act in parallel on all parts of a physical system, leading to rapid optimization of macroscopic (system-level) quantities that often depend in extremely complicated ways on the system’s microscopic (component-level) properties. Minimization, or optimization, is also at the core of computing and is used every day to solve an endless variety of real-world problems. With the slowing performance gains from conventional digital processing in recent years, it is an attractive prospect to map these optimization problems directly onto physical systems and then allow these systems to relax via their physical equations of motion to a state that represents the optimal answer—see Fig. 1A. Many systems have been built or proposed that use this strategy to efficiently find answers to combinatorial optimization problems, which are difficult to solve yet universal in their applications (2). Machine learning, which has also become ubiquitous over the last decade, relies on optimization as well. In this issue of PNAS, Dillavou et al. (3) demonstrate an analog electronic system that relies exclusively on physical dynamics to optimally fit a nonlinear curve to a set of data points, a miniaturized version of the much more complex data patterns that a large-scale artificial intelligence system can learn.

Fig. 1.

(A) Minimization principles in physics can be leveraged for machine learning. Trainable parameters in a neural network are represented by dynamically evolving parameters of the physical system. (B) The twin network of nonlinear resistors in ref. 3 for supervised learning. The resistors are implemented by the twin edge circuit in the Inset, which allows dynamic self-adjustment of the conductance. Panel (B) is reproduced from ref. 3.

Neural networks can learn arbitrarily complex input–output relationships, when provided with many trainable parameters and a source of nonlinearity. In supervised learning, a set of input/output data points is provided as the ground truth, and training is an optimization problem: Parameters are computed that minimize some error metric between the predicted outputs and the ground truth. For deep neural networks, the most widely used method to optimize the parameters is the backpropagation algorithm (4), and virtually all training is done today on digital hardware based on the conventional von Neumann architecture, such as Graphics Processing Units. The rapid growth in size of neural networks in recent years—faster than the rate of improvement of digital processors—has spurred many research groups to build mixed-signal architectures that use physical computation to make parts of backpropagation and other training algorithms more efficient. For example, systems that train weights stored in resistive memory crossbars (5–7) have been used to parallelize and reduce the power consumption of mathematical primitives such as matrix–vector multiplication and vector outer products. Nonetheless, these mixed-signal systems rely on digital processors, memory, and clocking to orchestrate and perform critical computations in the training process, which ultimately limits their energy efficiency.
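The framing of training as error minimization can be made concrete with a toy example. The sketch below is an illustrative stand-in, not the circuit or algorithm of ref. 3: it fits a two-parameter linear model by gradient descent on a mean-squared error, the same class of objective that backpropagation optimizes at much larger scale.

```python
import numpy as np

# Ground-truth input/output data points (the "supervision")
inputs = np.array([0.0, 1.0, 2.0, 3.0])
targets = 2.0 * inputs + 1.0  # generated by y = 2x + 1

# Trainable parameters of a toy model y_hat = a*x + b
params = np.zeros(2)
lr = 0.05  # learning rate

for _ in range(2000):
    a, b = params
    preds = a * inputs + b
    # Gradient of the mean-squared error with respect to (a, b)
    grad_a = 2.0 * np.mean((preds - targets) * inputs)
    grad_b = 2.0 * np.mean(preds - targets)
    params -= lr * np.array([grad_a, grad_b])

# params converges toward (2.0, 1.0)
```

In the analog system discussed here, the analogous minimization is carried out not by computed gradients but by the physical relaxation of circuit variables.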

By contrast, the electronic system developed by Dillavou et al. exclusively makes use of physical dynamics to perform the optimization needed for supervised learning, without any reliance on digital instructions. It is therefore a hardware prototype of a physical neural network (8). The neural network is a fully recurrent, single-layer model with input, output, and hidden nodes, and the physical system is a twin network (a free network paired with a clamped network) of variable nonlinear resistors implemented using transistors in a standard Complementary Metal Oxide Semiconductor process, shown in Fig. 1B. The pair of free and clamped transistors at each node is bridged at their gate terminals by a capacitor, and the charge on this capacitor modulates the nonlinear conductance of the transistors. This quantity of charge evolves during training with the help of local analog feedback circuits that dynamically charge or discharge each capacitor. A digital controller was used only to sequentially present the inputs and the ground truth outputs to the system as external stimuli. These stimuli act as boundary conditions that set in motion the learning that occurs autonomously in the self-adjusting physical system. All of the capacitors in the network evolve in parallel and eventually settle to values that represent the learned parameters that optimally fit to the presented data.


An analogy can be made between this system and recent efforts to develop Ising machines using networks of resistively coupled nonlinear oscillators (9–11). In those systems, the network of oscillators collectively minimizes the objective function (or energy) of the Ising problem, even while each oscillator evolves according to its local dynamical equations. Therefore, optimization can be considered an emergent property of the distributed and parallel local dynamics (12), which is fundamentally driven by Onsager’s principle, which acts within each oscillator to minimize the power dissipation in its local resistive connections (2). Likewise, in the twin network of nonlinear resistors in the present work, supervised learning is an emergent property of the physical dynamics local to each transistor pair. These local dynamics do not implement backpropagation, in which training signals propagate to all the parameters from a single source, but rather a more decentralized form of learning based on local learning rules. The underlying physical mechanism responsible for realizing the local learning rule for each parameter is the principle of least power dissipation manifested in the electrical circuits. The authors show that the shared gate voltage in each transistor pair evolves to minimize the difference in the power dissipation between the two transistors and that this physical evolution is mathematically equivalent to the desired local learning rule.
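The idea that a global optimum can emerge from purely local, gradient-like dynamics can be illustrated with a minimal soft-spin Ising model. The sketch below is a caricature of the oscillator networks in refs. 9–11, not their actual equations of motion: each continuous "spin" descends only its own local potential, yet the pair collectively settles into an Ising ground state.

```python
import numpy as np

# Two ferromagnetically coupled spins: E(s) = -J * s1 * s2 with J > 0,
# so the Ising ground states are the aligned configurations (+1,+1) and (-1,-1).
J = 1.0
x = np.array([0.5, -0.3])  # continuous "soft spin" relaxation of s in {-1, +1}

# Each spin follows its own local gradient flow on the potential
#   V(x) = -J*x1*x2 + sum_i (x_i^2 - 1)^2 / 4
# The quartic term pins each spin near +/-1; the coupling term aligns them.
dt = 0.05
for _ in range(2000):
    grad = np.array([
        -J * x[1] + x[0] * (x[0] ** 2 - 1.0),
        -J * x[0] + x[1] * (x[1] ** 2 - 1.0),
    ])
    x -= dt * grad  # local relaxation; no global controller

spins = np.sign(x)  # read out the Ising state: here the aligned pair (+1, +1)
```

Each update uses only quantities available at that spin and its direct neighbor, mirroring how each oscillator (or transistor pair) in the hardware acts on strictly local information.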

Physics-inspired local learning rules are a concept that lies at the intersection of machine learning, neuroscience, and physics. Local learning rules update parameter values using only information that is available in the physical vicinity of the parameter, which aligns well (in terms of data movement overhead and error tolerance) with training accelerators that do not follow the conventional von Neumann architecture. The electronic network by Dillavou et al. implements a contrastive local learning rule called Coupled Learning (13), which is closely related to Equilibrium Propagation training (14); both are designed to train energy-based neural networks and take inspiration from physical dynamics. Algorithms that are trained by physics-inspired local learning rules have been studied through their “emulation” on digital processors. The present work takes the next step and forms the connection back to physics, by demonstrating supervised learning using the actual physical dynamics of nonlinear elements that interact in continuous time. The authors experimentally show that their small-scale system can accurately learn the XOR gate—the most elementary nonlinear function—and one-dimensional nonlinear regression tasks.
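Coupled Learning itself is straightforward to emulate numerically on a toy network. The sketch below is a digital emulation of the contrastive rule from ref. 13 applied to a two-conductance voltage divider, not the transistor circuit of ref. 3: the output of a "clamped" state is nudged a small fraction toward the target, and each conductance is updated using only the voltage drops it sees locally.

```python
# Coupled-learning emulation on a two-conductance voltage divider:
# v_in --[g1]-- v_out --[g2]-- ground, so v_out = v_in * g1 / (g1 + g2).
v_in, target = 1.0, 0.3
g1, g2 = 1.0, 1.0   # trainable conductances (the "learned parameters")
eta, lr = 0.1, 0.1  # clamping nudge amplitude and learning rate

for _ in range(1000):
    # Free state: physics alone determines the output
    v_free = v_in * g1 / (g1 + g2)
    # Clamped state: output nudged a fraction eta toward the target
    v_clamp = v_free + eta * (target - v_free)
    # Local contrastive rule: each conductance compares the squared voltage
    # drops (i.e., its power dissipation per unit conductance) in the two states
    g1 += (lr / eta) * ((v_in - v_free) ** 2 - (v_in - v_clamp) ** 2)
    g2 += (lr / eta) * (v_free ** 2 - v_clamp ** 2)

# v_free converges toward the target output of 0.3
```

The same contrastive structure, a free state, a weakly clamped state, and an update drawn from the difference in local power dissipation, is what the analog circuit of Dillavou et al. realizes in continuous time without any digital emulation loop.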

The electronic construction of Dillavou et al. is an important proof of concept that physical systems can be designed whose local dynamics implement the optimization needed for machine learning. Although the hardware prototype does not yet deliver on the potentially massive performance and efficiency benefits of this style of computing on problems of practical size, these gaps may be addressed in future work as different implementations of physical neural networks are explored. The algorithmic scalability of known local learning rules has been studied, but the scalability of physical systems that implement larger, more complex neural network architectures remains an open question. Also important is how scaling affects the underlying algorithm’s robustness to analog errors induced by noise, process variations, and various parasitic effects (15). To eventually demonstrate a clear energy efficiency advantage over conventional hardware, future work will also need to reduce the overhead of supporting feedback circuitry and use dynamical components that operate at the minimum absolute levels of power dissipation. Solving these research challenges may allow future machine learning algorithms to be trained in a way that harnesses the speed and energy efficiency of the fundamental optimization processes in physics.

Acknowledgments

T.P.X. receives support from the Laboratory Directed Research and Development program at Sandia National Laboratories. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the US Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the US Department of Energy or the United States Government.

Author contributions

T.P.X. wrote the paper.

Competing interests

The author declares no competing interest.

Footnotes

See companion article, “Machine learning without a processor: Emergent learning in a nonlinear analog network,” 10.1073/pnas.2319718121.

References

  1. Onsager L., Reciprocal relations in irreversible processes II. Phys. Rev. 38, 2265–2279 (1931).
  2. Vadlamani S. K., Xiao T. P., Yablonovitch E., Physics successfully implements Lagrange multiplier optimization. Proc. Natl. Acad. Sci. U.S.A. 117, 26639–26650 (2020).
  3. Dillavou S., et al., Machine learning without a processor: Emergent learning in a nonlinear analog network. Proc. Natl. Acad. Sci. U.S.A. 121, e2319718121 (2024).
  4. Rumelhart D. E., Hinton G. E., Williams R. J., Learning representations by back-propagating errors. Nature 323, 533–536 (1986).
  5. Xiao T. P., Bennett C. H., Feinberg B., Agarwal S., Marinella M. J., Analog architectures for neural network acceleration based on non-volatile memory. Appl. Phys. Rev. 7, 031301 (2020), 10.1063/1.5143815.
  6. Yi S.-I., Kendall J. D., Williams R. S., Kumar S., Activity-difference training of deep neural networks using memristor crossbars. Nat. Electron. 6, 45–51 (2023).
  7. Nandakumar S. R., et al., Mixed-precision deep learning based on computational memory. Front. Neurosci. 14, 406 (2020).
  8. Marković D., Mizrahi A., Querlioz D., Grollier J., Physics for neuromorphic computing. Nat. Rev. Phys. 2, 499–510 (2020).
  9. Wang T., Roychowdhury J., “OIM: Oscillator-based Ising machines for solving combinatorial optimisation problems” in Unconventional Computation and Natural Computation, McQuillan I., Seki S., Eds. (UNC Tokyo, Japan, 2019), pp. 232–256.
  10. Dutta S., et al., An Ising Hamiltonian solver based on coupled stochastic phase-transition nano-oscillators. Nat. Electron. 4, 502–512 (2021).
  11. Lo H., Moy W., Yu H., Sapatnekar S., Kim C. H., An Ising solver chip based on coupled ring oscillators with a 48-node all-to-all connected array architecture. Nat. Electron. 6, 771–778 (2023).
  12. Vadlamani S. K., Xiao T. P., Yablonovitch E., Combinatorial optimization using the Lagrange primal-dual dynamics of parametric oscillator networks. Phys. Rev. Appl. 21, 044042 (2024).
  13. Stern M., Hexner D., Rocks J. W., Liu A. J., Supervised learning in physical networks: From machine learning to learning machines. Phys. Rev. X 11, 021045 (2021).
  14. Scellier B., Bengio Y., Equilibrium propagation: Bridging the gap between energy-based models and backpropagation. Front. Comput. Neurosci. 11, 24 (2017).
  15. Wright L. G., et al., Deep physical neural networks trained with backpropagation. Nature 601, 549–555 (2022).
