Entropy. 2025 Aug 19;27(8):878. doi: 10.3390/e27080878

Geometric Neural Ordinary Differential Equations: From Manifolds to Lie Groups

Yannik P Wotte 1,*, Federico Califano 1, Stefano Stramigioli 1
Editors: Fanzhang Li1, Li Liu1
PMCID: PMC12385718  PMID: 40870350

Abstract

Neural ordinary differential equations (neural ODEs) are a well-established tool for optimizing the parameters of dynamical systems, with applications in image classification, optimal control, and physics learning. Although dynamical systems of interest often evolve on Lie groups and more general differentiable manifolds, theoretical results for neural ODEs are frequently phrased on $\mathbb{R}^n$. We collect recent results for neural ODEs on manifolds and present a unifying derivation of various results that serves as a tutorial to extend existing methods to differentiable manifolds. We also extend the results to the recent class of neural ODEs on Lie groups, highlighting a non-trivial extension of manifold neural ODEs that exploits the Lie group structure.

Keywords: neural ordinary differential equations, differential geometry, Lie groups, machine learning, optimal control

1. Introduction

Ordinary differential equations (ODEs) are ubiquitous in the engineering sciences, from modeling and control of simple physical systems like pendulums and mass–spring–dampers, or more complicated robotic arms and drones, to the description of high-dimensional spatial discretizations of distributed systems, such as fluid flows, chemical reactions, or quantum oscillators. Neural ordinary differential equations (neural ODEs) [1,2] are ODEs parameterized by neural networks. Given a state x, and parameters θ representing the weights and biases of a neural network, a neural ODE reads as follows:

$$\dot{x} = f_\theta(x,t), \qquad x(0) = x_0. \tag{1}$$

Since neural ODEs were first introduced by [1] as the continuum limit of recurrent neural networks, their applications have quickly expanded beyond simple classification tasks: learning highly nonlinear dynamics of multi-physical systems from sparse data [3,4,5], optimal control of nonlinear systems [6], medical imaging [7], and real-time handling of irregular time series [8], to name but a few. Discontinuous state transitions and dynamics [9,10], time-dependent parameters [11], augmented neural ODEs [12], and physics-preserving formulations [13,14] present further extensions that increase the expressivity of neural ODEs.

However, these methods are typically phrased for states $x \in \mathbb{R}^n$. For many physical systems of interest, such as robot arms, humanoid robots, and drones, the state lives on differentiable manifolds and Lie groups [15,16]. More generally, the manifold hypothesis in machine learning raises the expectation that many high-dimensional data-sets evolve on intrinsically lower-dimensional, albeit more complicated, manifolds [17]. Neural ODEs on manifolds [18,19] presented significant steps to address this gap, with the first optimization methods for neural ODEs on manifolds. Yet, the general tools and approaches available on $\mathbb{R}^n$, such as running costs, augmented states, time-dependent parameters, control inputs, or discontinuous state transitions, are rarely addressed in a manifold context. Similar issues persist in a Lie group context, where neural ODEs on Lie groups [20,21] have been formalized.

Our goal is to extend further architectures and costs for neural ODEs from $\mathbb{R}^n$ to arbitrary manifolds (cf. Table 1), and in particular Lie groups, and to equip the reader with the technical background for their own extensions. Here the main conceptual challenge lies in phrasing chart-independent optimization methods [18,19] in a manner that easily adapts to a variety of neural ODE architectures and cost functions [1,3,12]. To this end we present a systematic approach for deriving geometric versions of the adjoint sensitivity method [1,2], which is a memory-efficient and scalable tool for the optimization of neural ODEs (cf. Section 1.1). Such benefits extend to manifolds and Lie groups [18,19,20,21]. A second challenge, both conceptual and practical, lies in expressing various manifolds in terms of local charts and in expressing neural-net-parameterized functions, dynamics, and tensor fields in local charts. To this end we classify existing methods into extrinsic [19,20] and intrinsic [18,21,22] approaches, a distinction inspired by well-known differential geometric concepts. In our context the distinction suggests different parameterizations, affects numerical integration techniques, and affects scaling to high-dimensional dynamics. Specifically, our contributions are as follows:

  • Systematic derivation of adjoint methods for neural ODEs on manifolds and Lie groups, highlighting the differences and equivalence of various approaches—for an overview, see also Table 1;

  • Summarizing the state of the art of manifold and Lie group neural ODEs by formalizing the notion of extrinsic and intrinsic neural ODEs;

  • A tutorial on neural ODEs on manifolds and Lie groups, with a focus on the derivation of coordinate-agnostic adjoint methods for optimization of various neural ODE architectures. Readers will gain a conceptual understanding of the geometric nature of the underlying variables, a coordinate-free derivation of adjoint methods and learn to incorporate additional geometric and physical structures. On the practical side, this will aid in the derivation and implementation of adjoint methods with non-trivial terms for various architectures, also with regard to coordinate expressions and chart transformations.

Table 1.

Summary of neural ODEs on manifolds and Lie groups presented in this article.

| Name of Neural ODE | Subtype | Trajectory Cost | Subsection | Originally Introduced in |
|---|---|---|---|---|
| Neural ODEs on manifolds (Section 3) | Extrinsic | Running and final cost | Section 3.1.1 | Final cost [19], running cost [21] |
| Neural ODEs on manifolds (Section 3) | Intrinsic | Running and final cost, intermittent cost | Section 3.1.2 and Section 3.2.1 | Final cost [18], running cost [21], intermittent cost (this work) |
| Neural ODEs on manifolds (Section 3) | Augmented, time-dependent parameters | Final cost | Section 3.2.2 | Augmenting $\mathcal{M}$ to $T\mathcal{M}$ [23], augmenting $\mathcal{M}$ to $\mathcal{M}\times\mathcal{N}$ (this work) |
| Neural ODEs on Lie groups (Section 4) | Extrinsic | Final cost and intermittent cost | Section 4.1 | In [20] |
| Neural ODEs on Lie groups (Section 4) | Intrinsic, dynamics in local charts | Running and final cost | Section 4.2 | In [21,24] |
| Neural ODEs on Lie groups (Section 4) | Intrinsic, dynamics on Lie algebra | Running and final cost | Section 4.2 | In [21] |

The remainder of this article is organized as follows. A brief state of the art on neural ODEs concludes this introduction. Section 2 provides a background on differentiable manifolds, Lie groups, and the coordinate-free adjoint method. Section 3 describes neural ODEs on manifolds and derives parameter updates via the adjoint method for various common architectures and cost functions, including time-dependent parameters, augmented neural ODEs, running costs, and intermediate cost terms. Section 4 describes neural ODEs on matrix Lie groups, explaining the merits of treating Lie groups separately from general differentiable manifolds. Both Section 3 and Section 4 also classify methods into extrinsic and intrinsic approaches. We conclude with a discussion in Section 5, highlighting advantages, disadvantages, challenges, and promise of the presented material. Appendix A includes a background on Hamiltonian systems, which appear when transforming the adjoint method into a form that is unique to Lie groups.

1.1. Literature Review

For a general introduction to neural ODEs, see [25]. Neural ODEs on $\mathbb{R}^n$ with fixed parameters were first introduced by [1], and parameter optimization via the adjoint method allowed for intermittent and final cost terms on each trajectory. The generalized adjoint method [2] also allows for running cost terms. Memory-efficient checkpointing is introduced in [26] to address stability issues of adjoint methods. Augmented neural ODEs [12] introduced augmented state spaces to allow neural ODEs to express arbitrary diffeomorphisms. Time-varying parameters were introduced by [11], with similar benefits to augmented neural ODEs. Neural ODEs with discrete transitions were formulated in [9,10], with [9] also learning event-triggered transitions common in engineering applications. Neural controlled differential equations (CDEs) were introduced in [27] for handling irregular time series, and parameter updates reapply the adjoint method [1]. Neural stochastic differential equations (SDEs) were introduced in [28], relying on a stochastic variant of the adjoint method for the parameter update. The previously mentioned literature phrases dynamics of neural ODEs on $\mathbb{R}^n$.

Recent trends in research on neural ODEs focus on structure preservation to improve performance and reduce training time by appropriately restricting the class of parameterized vector fields. This includes symmetry preservation by equivariant [23] and approximately equivariant neural ODEs [29], which tackle symmetric and approximately symmetric time series and dynamics, e.g., in N-body dynamics and molecular dynamics. It also includes physics preservation in a physics learning context, where Hamiltonian neural networks [30,31] and (generalized) Lagrangian neural networks [32,33,34] improve performance by guaranteeing energy conservation. In control and model order reduction, port-Hamiltonian neural ODEs [3,35] further allow for learning models that interact with external ports in a power-preserving manner. These methods also phrase dynamics on $\mathbb{R}^n$ and frequently apply the adjoint method for parameter updates.

Neural ODEs on manifolds were first introduced by [19], including an adjoint method on manifolds for final cost terms and application to continuous normalizing flows on Riemannian manifolds, but embedding manifolds into $\mathbb{R}^n$. Neural ODEs on Riemannian manifolds are expressed in local exponential charts in [18], avoiding embedding into $\mathbb{R}^n$ and considering final cost terms in the optimization. Charts for unknown, non-trivial latent manifolds together with dynamics in local charts are learned from high-dimensional data in [22], also including discretized solutions to partial differential equations. Parameterized equivariant neural ODEs on manifolds are constructed in [23], also commenting on state augmentation to express arbitrary (equivariant) flows on manifolds.

Neural ODEs on Lie groups were first introduced in [36] on the Lie group $SE(3)$ to learn the port-Hamiltonian dynamics of a drone from an experiment, expressing group elements on an embedding $\mathbb{R}^{12}$, and the approach was formalized to port-Hamiltonian systems on arbitrary matrix Lie groups in [20], embedding $m\times m$ matrices in $\mathbb{R}^{m^2}$.

Neural ODEs on SE(3) were phrased in local exponential charts in [24] to optimize a controller for a rigid body using a chart-based adjoint method in local exponential charts. As an alternative, a Lie algebra-based adjoint method on general Lie groups was introduced in [21], foregoing Lie group-specific numerical issues of applying the adjoint method in local charts.

The choice of numerical solver in integrating neural ODEs and adjoint sensitivity equations is a nuanced area with much active research, especially for highly stiff [37], highly nonlinear [38,39], and structure-preserving neural ODEs [40]. We point towards the aforementioned sources for the interested reader. Results are expected to carry over into a manifold and Lie group context, where they hold in local charts. Also Lie group integrators [41,42] may be of interest for geometrically exact integration but are not well-investigated in a neural ODE context [20,21].

The optimization of neural ODEs via adjoint sensitivity methods is also referred to as “optimize-then-discretize” [25,43], since the formulation of the continuous adjoint system (called “optimize”) precedes their numerical solution (called “discretize”). This is opposed to “discretize-then-optimize” approaches, in which the neural ODE is first solved numerically (discretize) and gradients are then backpropagated through the numerical solver (optimize) [25,37,43]. Comparing the two, the constant memory efficiency of “optimize-then-discretize” approaches allows them to scale better to high-dimensional systems, giving them an edge for cases with more than 100 parameters and states [43]. Instead, “discretize-then-optimize” boasts higher accuracy and speed for low-dimensional systems, as well as highly stiff systems in which adjoint methods struggle with stability [37]. A popular discrete alternative to neural ODEs for physics-informed dynamics learning is given by variational integrator networks (VINs) [44,45], phrasing Lagrangian and Hamiltonian dynamics as discrete systems that conserve energy and the symplectic structure of the continuum dynamics [46,47]. Recent work [48] on Lie group forced VINs (LieFVINs) also allows inputs to the Lagrangian and Hamiltonian dynamics to be included in the variational formulation, allowing discrete optimal control. Both VINs and LieFVINs are applicable in a Lie group context, where they conserve geometry, symplecticity, and energy. The approach does not use adjoint methods for optimization and outperforms neural ODEs in the investigated conservative, low-dimensional dynamical systems [44,45,48]. Compared to continuous neural ODEs, both VINs and LieFVINs are discrete, which removes overhead from ODE solvers for lightweight applications, but their necessarily energy-based formulation presently restricts their use cases to conservative physical systems. 
We mention this promising area for completeness but narrow our attention to a geometric “optimize-then-discretize” approach via adjoint methods in the remainder of this article.

1.2. Notation

For a complete introduction to differential geometry see, e.g., [49], and for Lie group theory see [50].

Calligraphic letters $\mathcal{M}, \mathcal{N}$ denote smooth manifolds. For conceptual clarity, the reader may think of these manifolds as embedded in a high-dimensional $\mathbb{R}^N$, e.g., $\mathcal{M}\subset\mathbb{R}^N$. The set $C^\infty(\mathcal{M},\mathcal{N})$ contains smooth functions between $\mathcal{M}$ and $\mathcal{N}$, and we define $C^\infty(\mathcal{M}) := C^\infty(\mathcal{M},\mathbb{R})$.

The tangent space at $x\in\mathcal{M}$ is $T_x\mathcal{M}$ and the cotangent space is $T_x^*\mathcal{M}$. The tangent bundle of $\mathcal{M}$ is $T\mathcal{M}$, and the cotangent bundle of $\mathcal{M}$ is $T^*\mathcal{M}$. Then $\mathfrak{X}(\mathcal{M})$ denotes the set of vector fields over $\mathcal{M}$, and $\Omega^k(\mathcal{M})$ denotes the set of $k$-forms, where $\Omega^1(\mathcal{M})$ are co-vector fields and $\Omega^0(\mathcal{M}) = C^\infty(\mathcal{M})$ are smooth functions $V:\mathcal{M}\to\mathbb{R}$. The exterior derivative is denoted as $d:\Omega^k(\mathcal{M})\to\Omega^{k+1}(\mathcal{M})$. For functions $V\in C^\infty(\mathcal{M}\times\mathcal{N},\mathbb{R})$, with $x\in\mathcal{M}$, $y\in\mathcal{N}$, we denote by $d_xV(y)\in T_x^*\mathcal{M}$ the partial differential at $x\in\mathcal{M}$. Curves $x:\mathbb{R}\to\mathcal{M}$ are denoted as $x(t)$, and their tangent vectors are denoted as $\dot{x}\in T_{x(t)}\mathcal{M}$.

A Lie group is denoted by $G$ and its elements by $g,h$. The group identity is $e\in G$, and $I$ denotes the identity matrix. The Lie algebra of $G$ is $\mathfrak{g}$, and its dual is $\mathfrak{g}^*$. Letters $\tilde{A},\tilde{B}$ denote vectors in the Lie algebra, while letters $A,B$ denote vectors in $\mathbb{R}^n$.

In coordinate expressions, lower indices are covariant and upper indices are contravariant components of tensors. For example, for a $(0,2)$-tensor $M$ the components $M_{ij}$ are covariant, and for non-degenerate $M$ the components of its inverse $M^{-1}$ are $M^{ij}$, which are contravariant. We use the Einstein summation convention $a_ib^i := \sum_i a_ib^i$; i.e., the product of variables with repeated lower and upper indices implies a sum.

Denoting $W$ as a topological space, $\mathcal{D}$ the Borel $\sigma$-algebra, and $\mathbb{P}:\mathcal{D}\to[0,1]$ a probability measure, the tuple $(W,\mathcal{D},\mathbb{P})$ denotes a probability space. Given a vector space $L$ and a random variable $C:W\to L$, the expectation of $C$ with respect to $\mathbb{P}$ is $\mathbb{E}_{w\sim\mathbb{P}}(C) := \int_W C(w)\,d\mathbb{P}(w)$.

2. Background

2.1. Smooth Manifolds

Given an $n$-dimensional manifold $\mathcal{M}$, with $U\subset\mathcal{M}$ being an open set and $Q:U\to\mathbb{R}^n$ a homeomorphism, we call $(U,Q)$ a chart and we denote the coordinates of $x\in U$ as

$$(q^1,\ldots,q^n) := Q(x), \qquad x\in U\subset\mathcal{M}. \tag{2}$$

Smooth manifolds admit charts $(U_1,Q_1)$ and $(U_2,Q_2)$ with smooth transition maps $Q_{21} = Q_2\circ Q_1^{-1}$ defined on the intersection $U_1\cap U_2$, and a collection $\mathcal{A}$ of charts $(U,Q)$ with smooth transition maps is called a smooth atlas. For examples of local charts for particular manifolds, see [49], Example 1.4, Example 1.5. A vector field $f\in\mathfrak{X}(\mathcal{M})$ assigns a vector $f(x)\in T_x\mathcal{M}$ at any point $x\in\mathcal{M}$. This defines a dynamic system, also shown in a local chart $(U,Q)$ with components $f^i(q),\dot{q}^i\in\mathbb{R}$:

$$\dot{x} = f(x) = f^i(q)\,\frac{\partial}{\partial Q^i}; \qquad x(0) = x_0, \tag{3}$$
$$\dot{q}^i = f^i(q); \qquad q(0) = Q(x_0). \tag{4}$$

Solutions of (3) are then found by numerical integration of (4), applying chart transitions (e.g., $q_2(t) = Q_{21}(q_1(t))$ from $q_1(t) = Q_1(x(t))$ to $q_2(t) = Q_2(x(t))$) during integration to avoid coordinate singularities (cf. Section 3.1.2). Denote the solution of (3) by the flow operator

$$\Psi_f^t:\mathcal{M}\to\mathcal{M}; \qquad \Psi_f^t(x_0) := x(t). \tag{5}$$
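The chart-switching integration of (4) can be sketched on a small example. The following is a minimal illustration, not taken from the article: it assumes the unit circle with two stereographic charts (chart 1 omits the north pole, chart 2 the south pole, with transition map $q_2 = 1/q_1$), a unit-speed rotation as the vector field, and a hand-written RK4 step. The names `chart_to_point`, `chart_dynamics`, `flow`, and the switching threshold `switch_at` are hypothetical.

```python
import math

def chart_to_point(q, chart):
    """Inverse stereographic projection onto the unit circle (assumed charts)."""
    x = 2.0 * q / (1.0 + q * q)
    y = (q * q - 1.0) / (q * q + 1.0)
    return (x, y) if chart == 1 else (x, -y)

def chart_dynamics(q, chart):
    """Unit-speed counterclockwise rotation written in chart coordinates."""
    sign = 1.0 if chart == 1 else -1.0
    return sign * 0.5 * (1.0 + q * q)

def rk4_step(q, chart, dt):
    """One classical Runge-Kutta step of the chart ODE (4)."""
    k1 = chart_dynamics(q, chart)
    k2 = chart_dynamics(q + 0.5 * dt * k1, chart)
    k3 = chart_dynamics(q + 0.5 * dt * k2, chart)
    k4 = chart_dynamics(q + dt * k3, chart)
    return q + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def flow(q0, chart0, T, n_steps=5000, switch_at=1.5):
    """Integrate in the current chart; switch charts before coordinates blow up."""
    q, chart, dt = q0, chart0, T / n_steps
    for _ in range(n_steps):
        q = rk4_step(q, chart, dt)
        if abs(q) > switch_at:
            # chart transition q -> 1/q, then continue in the other chart
            q, chart = 1.0 / q, 3 - chart
    return q, chart
```

Starting from `q0 = 1.0` in chart 1 (the point $(1,0)$), the coordinate never exceeds the threshold for long, even though the trajectory passes through both poles, each of which is a singularity of one of the two charts.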

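For a real-valued function $V\in C^\infty(\mathcal{M})$, its differential is the covector field

$$dV\in\Omega^1(\mathcal{M}); \qquad dV = \frac{\partial V}{\partial q^i}\,dQ^i. \tag{6}$$

Additionally, given a smooth manifold $\mathcal{N}$ and a smooth map $\varphi:\mathcal{N}\to\mathcal{M}$, with $(U,Q)$ and $(\bar{U},\bar{Q})$ appropriate charts of $\mathcal{M}$ and $\mathcal{N}$, respectively, the pullback of $dV$ via $\varphi$ is

$$\varphi^*dV\in\Omega^1(\mathcal{N}); \qquad \varphi^*dV := d(V\circ\varphi) = \frac{\partial\varphi^j}{\partial\bar{q}^i}\frac{\partial V}{\partial q^j}\,d\bar{Q}^i. \tag{7}$$

With a Riemannian metric $M$ (i.e., a symmetric, non-degenerate $(0,2)$ tensor field) on $\mathcal{M}$, the gradient of $V$ is a uniquely defined vector field $\nabla V\in\mathfrak{X}(\mathcal{M})$ given by

$$\nabla V := M^{-1}dV = M^{ij}\frac{\partial V}{\partial q^j}\frac{\partial}{\partial q^i}. \tag{8}$$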
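The index raising in (8) can be illustrated numerically. The sketch below is an assumption-laden example, not from the article: it uses the unit sphere in spherical coordinates $(\theta,\phi)$ with the round metric $\mathrm{diag}(1,\sin^2\theta)$ and the test function $V = \sin\theta\cos\phi$. The helper names are hypothetical.

```python
import numpy as np

def riemannian_gradient(metric, dV):
    """Raise the index of dV by solving M_ij (grad V)^j = dV_i."""
    return np.linalg.solve(metric, dV)

def sphere_metric(theta):
    """Round metric of the unit sphere in (theta, phi) coordinates."""
    return np.diag([1.0, np.sin(theta) ** 2])

theta, phi = 0.7, 0.3
# components of dV for V = sin(theta) cos(phi)
dV = np.array([np.cos(theta) * np.cos(phi), -np.sin(theta) * np.sin(phi)])
grad_V = riemannian_gradient(sphere_metric(theta), dV)
# the same contraction M^{ij} dV_j written with the Einstein convention
grad_V_einsum = np.einsum("ij,j->i", np.linalg.inv(sphere_metric(theta)), dV)
```

The components of the differential and of the gradient differ here in the $\phi$ slot by the factor $1/\sin^2\theta$; they coincide only for the Euclidean metric in Cartesian coordinates.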

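When $\mathcal{M} = \mathbb{R}^n$, we assume that $M$ is the Euclidean metric and pick coordinates such that the components of the gradient and differential are the same. Finally, we define the Lie derivative of 1-forms, which differentiates $\omega\in\Omega^1(\mathcal{M})$ along a vector field $f\in\mathfrak{X}(\mathcal{M})$ and returns $\mathcal{L}_f\omega\in\Omega^1(\mathcal{M})$:

$$\mathcal{L}_f\omega := \frac{d}{dt}(\Psi_f^t)^*\omega\Big|_{t=0} = \omega_j\frac{\partial f^j}{\partial q^i}\,dQ^i + \frac{\partial\omega_i}{\partial q^j}f^j\,dQ^i. \tag{9}$$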

2.2. Lie Groups

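Lie groups are smooth manifolds with a compatible group structure. We consider real matrix Lie groups $G\subseteq GL(m,\mathbb{R})$, i.e., subgroups of the general linear group

$$GL(m,\mathbb{R}) := \{g\in\mathbb{R}^{m\times m} \mid \det(g)\neq 0\}. \tag{10}$$

For $g,h\in G$ the left and right translations by $h$ are, respectively, the matrix multiplications

$$L_h(g) := hg, \tag{11}$$
$$R_h(g) := gh. \tag{12}$$

The Lie algebra of $G$ is the vector space $\mathfrak{g}\subseteq\mathfrak{gl}(m,\mathbb{R})$, with $\mathfrak{gl}(m,\mathbb{R}) = \mathbb{R}^{m\times m}$ being the Lie algebra of $GL(m,\mathbb{R})$.

Define a basis $E := \{\tilde{E}_1,\ldots,\tilde{E}_n\}$ with $\tilde{E}_i\in\mathfrak{g}\subseteq\mathbb{R}^{m\times m}$, and define the (invertible linear) map $\Lambda:\mathbb{R}^n\to\mathfrak{g}$ as

$$\Lambda:\mathbb{R}^n\to\mathfrak{g}; \qquad (A^1,\ldots,A^n)\mapsto\sum_i A^i\tilde{E}_i. \tag{13}$$

Equivalently (e.g., [51]), $\Lambda$ and $\Lambda^{-1}$ are often denoted as the operators “hat” $\wedge:\mathbb{R}^n\to\mathbb{R}^{m\times m}$ and “vee” $\vee:\mathbb{R}^{m\times m}\to\mathbb{R}^n$, respectively.

The dual of $\mathfrak{g}$ is denoted $\mathfrak{g}^*$, and given the map $\Lambda$ we call $\Lambda^*:\mathfrak{g}^*\to\mathbb{R}^n$ its dual. For $\tilde{A},\tilde{B}\in\mathfrak{g}$ the small adjoint $\mathrm{ad}_{\tilde{A}}(\tilde{B})$ is a bilinear map, and the large adjoint $\mathrm{Ad}_g(\tilde{A})$ is a linear map

$$\mathrm{ad}:\mathfrak{g}\times\mathfrak{g}\to\mathfrak{g}; \qquad \mathrm{ad}_{\tilde{A}}(\tilde{B}) = \tilde{A}\tilde{B} - \tilde{B}\tilde{A}, \tag{14}$$
$$\mathrm{Ad}:G\times\mathfrak{g}\to\mathfrak{g}; \qquad \mathrm{Ad}_g(\tilde{A}) = g\tilde{A}g^{-1}. \tag{15}$$

In the remainder of this article, we exclusively use the adjoint representation $\mathrm{ad}_A:\mathbb{R}^n\to\mathbb{R}^n$, written without a tilde in the subscript $A$, and adjoint representation $\mathrm{Ad}_g:\mathbb{R}^n\to\mathbb{R}^n$, which are obtained as

$$\mathrm{ad}_A := \Lambda^{-1}\,\mathrm{ad}_{\Lambda(A)}\,\Lambda(\cdot), \tag{16}$$
$$\mathrm{Ad}_g := \Lambda^{-1}\,\mathrm{Ad}_g\,\Lambda(\cdot). \tag{17}$$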
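As a concrete instance of (13), (16), and (17), the sketch below builds the matrices of $\mathrm{ad}_A$ and $\mathrm{Ad}_g$ column by column. It assumes the group $SO(3)$ with the standard basis of skew-symmetric matrices; this choice and the function names `hat`, `vee`, `ad_matrix`, `Ad_matrix` are illustrative and not prescribed by the text.

```python
import numpy as np

# assumed basis E_1, E_2, E_3 of so(3)
E = [np.array([[0, 0, 0], [0, 0, -1], [0, 1, 0]], float),
     np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]], float),
     np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]], float)]

def hat(A):
    """The map Lambda of (13): coefficients -> Lie algebra matrix."""
    return sum(a * Ei for a, Ei in zip(A, E))

def vee(A_tilde):
    """Inverse of Lambda, valid for skew-symmetric 3x3 matrices."""
    return np.array([A_tilde[2, 1], A_tilde[0, 2], A_tilde[1, 0]])

def ad_matrix(A):
    """Matrix of ad_A in (16): column i is vee([hat(A), E_i])."""
    A_tilde = hat(A)
    return np.stack([vee(A_tilde @ Ei - Ei @ A_tilde) for Ei in E], axis=1)

def Ad_matrix(g):
    """Matrix of Ad_g in (17): column i is vee(g E_i g^{-1})."""
    g_inv = np.linalg.inv(g)
    return np.stack([vee(g @ Ei @ g_inv) for Ei in E], axis=1)
```

For $SO(3)$ with this basis, the construction reproduces the well-known identities $\mathrm{ad}_A = \Lambda(A)$ and $\mathrm{Ad}_g = g$, which gives a quick consistency check of an implementation.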

The exponential map $\exp:\mathfrak{g}\to G$ is a local diffeomorphism given by the matrix exponential ([50], Chapter 3.7)

$$\exp(\tilde{A}) := \sum_{n=0}^{\infty}\frac{1}{n!}\tilde{A}^n. \tag{18}$$

Its inverse $\log:U_{\log}\to\mathfrak{g}$ is given by the matrix logarithm, and it is well-defined on a subset $U_{\log}\subset G$ ([50], Chapter 2.3):

$$\log(g) = \sum_{n=1}^{\infty}(-1)^{n+1}\frac{(g-I)^n}{n}. \tag{19}$$

Often, the infinite sums in (18) and (19) can be further reduced to finite sums in $m$ terms by use of the Cayley–Hamilton theorem [52]. A chart $(U_h,Q_h)$ on $G$ that assigns zero coordinates to $h\in G$ can be defined using (19) and (13):

$$U_h = \{hg \mid g\in U_{\log}\}, \tag{20}$$
$$Q_h:U_h\to\mathbb{R}^n; \qquad g\mapsto\Lambda^{-1}\log(h^{-1}g), \tag{21}$$
$$Q_h^{-1}:\mathbb{R}^n\to G; \qquad q\mapsto h\exp(\Lambda(q)). \tag{22}$$

The chart $(U_h,Q_h)$ is called an exponential chart, and a collection $\mathcal{A}$ of exponential charts $(U_h,Q_h)$ that cover the manifold is called an exponential atlas.
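The exponential chart (20)–(22) can be sketched for $SO(3)$, where the series (18) and (19) reduce to closed-form (Rodrigues-type) expressions. The group choice, the small-angle cutoffs, and the names `exp_so3`, `log_so3`, `chart`, `chart_inv` are assumptions made for this illustration.

```python
import numpy as np

def hat(w):
    """Lambda for so(3): vector -> skew-symmetric matrix."""
    return np.array([[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]])

def vee(W):
    """Inverse of hat for skew-symmetric matrices."""
    return np.array([W[2, 1], W[0, 2], W[1, 0]])

def exp_so3(w):
    """Closed form of the matrix exponential (18) on so(3)."""
    th = np.linalg.norm(w)
    K = hat(w)
    if th < 1e-8:  # second-order Taylor expansion near the identity
        return np.eye(3) + K + 0.5 * K @ K
    return np.eye(3) + np.sin(th) / th * K + (1.0 - np.cos(th)) / th**2 * K @ K

def log_so3(R):
    """Closed form of the matrix logarithm (19), for rotation angles below pi."""
    th = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if th < 1e-8:
        return vee(0.5 * (R - R.T))
    return th / (2.0 * np.sin(th)) * vee(R - R.T)

def chart(h, g):
    """Q_h(g) of (21); h^{-1} = h^T for rotation matrices."""
    return log_so3(h.T @ g)

def chart_inv(h, q):
    """Q_h^{-1}(q) of (22)."""
    return h @ exp_so3(q)
```

By construction `chart(h, h)` returns zero coordinates, and `chart(h, chart_inv(h, q))` returns `q` as long as the norm of `q` stays below $\pi$, which is the region where the logarithm is well-defined in this example.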

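The differential of a function $V\in C^\infty(G,\mathbb{R})$ is the co-vector field $dV\in\Omega^1(G)$ (see also Equation (6)). For any given $g\in G$ we further transform the co-vector $dV(g)\in T_g^*G$ to a left-trivialized differential, which collects the components of the gradient expressed in $\mathfrak{g}^*$:

$$d_g^LV := \Lambda^*L_g^*\,dV(g) = \frac{\partial}{\partial q}V\big(g(I+\Lambda(q))\big)\Big|_{q=0}\in\mathbb{R}^n. \tag{23}$$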

For a derivation of this coordinate expression, see ([21], Section 3).
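The coordinate expression (23) lends itself to a finite-difference check. The sketch below assumes $G = SO(3)$ and the linear test function $V(g) = \mathrm{tr}(B^\top g)$, for which the left-trivialized differential has the closed form $\mathrm{tr}(B^\top g\tilde{E}_i)$. Both the test function and the helper names are hypothetical.

```python
import numpy as np

def hat(q):
    """Lambda for so(3): vector -> skew-symmetric matrix."""
    return np.array([[0.0, -q[2], q[1]], [q[2], 0.0, -q[0]], [-q[1], q[0], 0.0]])

def left_trivialized_differential(V, g, eps=1e-6):
    """Central differences of q -> V(g (I + Lambda(q))) at q = 0, as in (23)."""
    out = np.zeros(3)
    for i in range(3):
        dq = np.zeros(3)
        dq[i] = eps
        plus = V(g @ (np.eye(3) + hat(dq)))
        minus = V(g @ (np.eye(3) - hat(dq)))
        out[i] = (plus - minus) / (2.0 * eps)
    return out

B = np.arange(9.0).reshape(3, 3)      # arbitrary weight matrix for the test function
V = lambda g: np.trace(B.T @ g)       # V is evaluated on the embedding space R^{3x3}
c, s = np.cos(0.4), np.sin(0.4)
g = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
dLV = left_trivialized_differential(V, g)
```

Note that $g(I+\Lambda(q))$ leaves the group for $q\neq 0$, so this recipe needs $V$ to be defined on a neighborhood of $G$ in $\mathbb{R}^{m\times m}$, as is the case for the test function used here.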

2.3. Gradient over a Flow

We are interested in computing the gradient of functions with respect to the initial state of a flow. The adjoint sensitivity equations are a set of differential equations that achieve this. In the following, we show a derivation of the adjoint sensitivity on manifolds ([21], App. A2). Given a function $C:\mathcal{M}\to\mathbb{R}$, a vector field $f\in\mathfrak{X}(\mathcal{M})$, the associated flow $\Psi_f^t:\mathcal{M}\to\mathcal{M}$, and a final time $T\in\mathbb{R}$, the goal of the adjoint sensitivity method on manifolds is to compute the gradient

$$d(C\circ\Psi_f^T)(x_0).$$

In the adjoint method we define a co-state $\lambda(t) = d(C\circ\Psi_f^{T-t})(x(t))\in T_{x(t)}^*\mathcal{M}$, which represents the differential of $C(x(T))$ with respect to $x(t)$. The adjoint sensitivity method describes its dynamics, which are integrated backwards in time from the known final condition $\lambda(T) = dC(x(T))$, see also Figure 1. The adjoint sensitivity method is stated in Theorem 1.

Figure 1. (a) The problem of computing the gradient over a flow, highlighting the cotangent spaces $dC(x(T))\in T_{x(T)}^*\mathcal{M}$ and $d(C\circ\Psi_f^T)(x_0) = (\Psi_f^T)^*dC(x(T))\in T_{x_0}^*\mathcal{M}$. (b) In the adjoint method we set $\lambda(t) = d(C\circ\Psi_f^{T-t})(x(t))$, whose dynamics are uniquely determined by the property $\mathcal{L}_f\lambda = 0$, allowing us to find $\lambda(0) = d(C\circ\Psi_f^T)(x_0)$ by integrating $\dot{\lambda}$ backwards from $\lambda(T) = dC(x(T))$.

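Theorem 1

(Adjoint sensitivity on manifolds). The gradient of a function $C\circ\Psi_f^T$ is

$$d(C\circ\Psi_f^T)(x_0) = \lambda(0), \tag{24}$$

where $\lambda(t)\in T_{x(t)}^*\mathcal{M}$ is the co-state. In a local chart $(U,Q)$ of $\mathcal{M}$ with coordinates $q(t) = Q(x(t))$, $\lambda(t) = \lambda_i(t)\,dQ^i$, the state and co-state satisfy the dynamics

$$\dot{q}^j = f^j(q), \qquad q(0) = Q(x_0), \tag{25}$$
$$\dot{\lambda}_i = -\lambda_j\frac{\partial f^j}{\partial q^i}(q), \qquad \lambda_i(T) = \frac{\partial C}{\partial q^i}(x(T)). \tag{26}$$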

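Proof.

Define the co-state $\lambda(t)\in T_{x(t)}^*\mathcal{M}$ as

$$\lambda(t) := (\Psi_f^{T-t})^*dC(x(T)). \tag{27}$$

Then Equation (24) is recovered by application of Equation (7):

$$\lambda(0) = (\Psi_f^T)^*dC(x(T)) = d(C\circ\Psi_f^T)(x_0). \tag{28}$$

A derivation of the dynamics governing $\lambda(t)$ constitutes the remainder of this proof. By definition of $\lambda(t)$ and the Lie derivative (9), we have that $\mathcal{L}_f\lambda(t) = 0$:

$$\mathcal{L}_f\lambda(t) = \frac{d}{ds}(\Psi_f^s)^*\lambda(t+s)\Big|_{s=0} = \frac{d}{ds}\lambda(t) = 0. \tag{29}$$

If we further treat $\lambda$ as a 1-form $\lambda\in\Omega^1(\mathcal{M})$ (denoted as $\lambda$ by an abuse of notation), we obtain

$$\mathcal{L}_f\lambda = \lambda_j\frac{\partial f^j}{\partial q^i}\,dQ^i + \frac{\partial\lambda_i}{\partial q^j}f^j\,dQ^i = 0.$$

The components satisfy the partial differential equation

$$\lambda_j\frac{\partial f^j}{\partial q^i} + f^j\frac{\partial\lambda_i}{\partial q^j} = 0. \tag{30}$$

Impose that $\lambda(t) = \lambda(\Psi_f^t(x_0))$ (this defines the 1-form $\lambda$ along $x(t)$); then

$$\dot{\lambda}_i = \frac{\partial\lambda_i}{\partial q^j}\dot{q}^j = \frac{\partial\lambda_i}{\partial q^j}f^j. \tag{31}$$

Combining Equations (30) and (31) leads to Equation (26):

$$\dot{\lambda}_i = -\lambda_j\frac{\partial f^j}{\partial q^i}. \tag{32}$$

Expanding the final condition $\lambda(T) = dC(x(T))$ in local coordinates (see Equation (6)) gives

$$\lambda(T) = \frac{\partial C}{\partial q^i}(x(T))\,dQ^i = \lambda_i(T)\,dQ^i \;\Rightarrow\; \lambda_i(T) = \frac{\partial C}{\partial q^i}(x(T)). \tag{33}$$

Given a chart transition from a chart $(U_1,Q_1)$ to a chart $(U_2,Q_2)$, e.g., during numerical integration of (26), the respective co-state components $\lambda_{1,i}$ and $\lambda_{2,i}$ are related by a transformation $A_i^{\,j} = \partial_i(Q_1^j\circ Q_2^{-1})$ as follows:

$$\lambda_{2,i} = A_i^{\,j}\lambda_{1,j}. \tag{34}$$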
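Theorem 1 can be checked numerically in a single chart. The sketch below is an illustration under stated assumptions: the state space is $\mathbb{R}^2$ with a pendulum vector field, the cost is $C(q) = \tfrac{1}{2}\lVert q\rVert^2$, and both the forward flow and the backward co-state integration use a hand-written RK4 scheme. None of these choices come from the article.

```python
import numpy as np

def f(q):
    """Assumed test dynamics: undamped pendulum, q = (angle, velocity)."""
    return np.array([q[1], -np.sin(q[0])])

def df_dq(q):
    """Jacobian J[j, i] = d f^j / d q^i."""
    return np.array([[0.0, 1.0], [-np.cos(q[0]), 0.0]])

def cost(q):
    return 0.5 * q @ q

def dcost(q):
    return q

def rk4(rhs, z, dt, n_steps):
    """Classical RK4; a negative dt integrates backwards in time."""
    for _ in range(n_steps):
        k1 = rhs(z)
        k2 = rhs(z + 0.5 * dt * k1)
        k3 = rhs(z + 0.5 * dt * k2)
        k4 = rhs(z + dt * k3)
        z = z + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return z

def flow(q0, T, n_steps=2000):
    return rk4(f, q0, T / n_steps, n_steps)

def adjoint_gradient(q0, T, n_steps=2000):
    """Returns lambda(0) by integrating (25)-(26) backwards from t = T."""
    qT = flow(q0, T, n_steps)
    def rhs(z):
        q, lam = z[:2], z[2:]
        return np.concatenate([f(q), -df_dq(q).T @ lam])
    zT = np.concatenate([qT, dcost(qT)])   # final condition of (26)
    z0 = rk4(rhs, zT, -T / n_steps, n_steps)
    return z0[2:]
```

The state is re-integrated backwards together with the co-state, so no forward trajectory has to be stored; this is the source of the constant memory footprint, at the price of possible instability for stiff dynamics.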

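A fact that will become useful in Section 4 is that Equations (25) and (26) have a Hamiltonian form. Define the control Hamiltonian $H_c:T^*\mathcal{M}\to\mathbb{R}$ as

$$H_c(x,\lambda) = \lambda\big(f(x,t)\big) = \lambda_if^i(q,t). \tag{35}$$

Then Equation (25) and Equation (26), respectively, of Theorem 1 follow as the Hamiltonian equations on $T^*\mathcal{M}$:

$$\dot{q}^j = \frac{\partial H_c}{\partial\lambda_j} = f^j(q,t), \tag{36}$$
$$\dot{\lambda}_i = -\frac{\partial H_c}{\partial q^i} = -\lambda_j\frac{\partial f^j}{\partial q^i}(q,t). \tag{37}$$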

For a background on Hamilton’s equations, see also Appendix A.
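As a sanity check of (35)–(37), consider the scalar linear system $f(q) = aq$ on $\mathcal{M} = \mathbb{R}$ with constant $a$. This example is an illustrative assumption and not part of the original derivation.

```latex
H_c(q,\lambda) = \lambda a q, \qquad
\dot{q} = \frac{\partial H_c}{\partial \lambda} = a q, \qquad
\dot{\lambda} = -\frac{\partial H_c}{\partial q} = -a \lambda,
\\
q(T) = e^{aT} q_0, \qquad
\lambda(t) = e^{a(T-t)} \lambda(T)
\;\Rightarrow\;
\lambda(0) = e^{aT}\, C'\big(q(T)\big)
= \frac{d}{dq_0}\, C\big(e^{aT} q_0\big).
```

The last equality is the chain rule applied to $C\circ\Psi_f^T$, so the backward solution of the co-state equation reproduces (24) in this case.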

3. Neural ODEs on Manifolds

A neural ODE on a manifold is an NN-parameterized vector field in $\mathfrak{X}(\mathcal{M})$ or, including time dependence, an NN-parameterized vector field in $\mathfrak{X}(\mathcal{M}\times\mathbb{R})$, with $t$ in the $\mathbb{R}$ slot and $\dot{t} = 1$. Given parameters $\theta\in\mathbb{R}^{n_\theta}$, we denote this parameterized vector field as $f_\theta(x,t) := f(x,t,\theta)$. This results in the dynamic system

$$\dot{x} = f_\theta(x,t), \qquad x(0) = x_0. \tag{38}$$

The key idea of neural ODEs is to tackle various flow approximation tasks by optimizing the parameters with respect to a to-be-specified optimization problem. Denote a finite time horizon $T$ and intermittent times $T_1,T_2,\ldots<T$. Denote a general trajectory cost by

$$C_{f_\theta}^T(x_0,\theta) = F\big(\theta,\Psi_{f_\theta}^{T_0}(x_0),\Psi_{f_\theta}^{T_1}(x_0),\ldots,\Psi_{f_\theta}^{T}(x_0)\big) + \int_0^T r\big(\Psi_{f_\theta}^s(x_0),s\big)\,ds, \tag{39}$$

with an intermittent and final cost term $F$ and running cost $r$. Indicating a probability space $(\mathcal{M},\mathcal{D},\mathbb{P})$, we define the total cost as

$$J(\theta) := \mathbb{E}_{x_0\sim\mathbb{P}}\,C_{f_\theta}^T(x_0,\theta). \tag{40}$$

The minimization problem takes the form

$$\min_\theta J(\theta). \tag{41}$$

Note that (41) is not subject to any dynamic constraint; the flow already appears explicitly in the cost $C_{f_\theta}^T$.

Normally, the optimization problem is solved by means of a stochastic gradient descent optimization algorithm [53]. In this, a batch of $N$ initial conditions $x_i$ is sampled from the probability distribution corresponding to the probability measure $\mathbb{P}$. Writing $C_i = C_{f_\theta}^T(x_i,\theta)$, the parameter gradient $\frac{\partial}{\partial\theta}J(\theta)$ is approximated as

$$\frac{\partial}{\partial\theta}J(\theta) = \mathbb{E}_{x_0\sim\mathbb{P}}\,\frac{\partial}{\partial\theta}C_{f_\theta}^T(x_0) \approx \frac{1}{N}\sum_{i=1}^{N}\frac{\partial}{\partial\theta}C_i. \tag{42}$$
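The batch approximation (42) can be sketched on a toy problem where the per-sample gradient is known in closed form. The example assumes the scalar dynamics $\dot{x} = \theta x$, the final cost $\tfrac{1}{2}x(T)^2$, and a standard normal distribution of initial conditions; in practice the per-sample gradient would come from the adjoint method rather than from a formula. All names are hypothetical.

```python
import numpy as np

def per_sample_gradient(x0, theta, T):
    """dC/dtheta for C = 0.5 * x(T)^2 with x(T) = exp(theta * T) * x0."""
    return T * np.exp(2.0 * theta * T) * x0**2

def batch_gradient(theta, T, batch_size, rng):
    """Monte Carlo estimate of dJ/dtheta as in (42)."""
    x0 = rng.standard_normal(batch_size)   # batch of initial conditions drawn from P
    return np.mean(per_sample_gradient(x0, theta, T))

rng = np.random.default_rng(0)
estimate = batch_gradient(theta=0.3, T=1.0, batch_size=200_000, rng=rng)
exact = 1.0 * np.exp(2.0 * 0.3 * 1.0)     # T * exp(2 theta T) * E[x0^2], with E[x0^2] = 1
```

The estimate fluctuates around the exact value with a standard error that shrinks like $1/\sqrt{N}$, which is why small batches give noisy but unbiased descent directions.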

In this section, we show how to optimize the parameters $\theta$ for various choices of neural ODEs and cost functions, with (39) being the most general case of a cost, and highlight similarities in the various derivations. In the following, the gradient $\frac{\partial}{\partial\theta}C_i$ is computed via the adjoint method on manifolds for various scenarios. The advantage of the adjoint method over, e.g., automatic differentiation of $C_i$ by backpropagation through an ODE solver is that its memory cost is constant with respect to the network depth $T$.

3.1. Constant Parameters and Running and Final Cost

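Here we consider neural ODEs of the form (38), with constant parameters $\theta$ and cost functions of the form

$$C_{f_\theta}^T(x_0,\theta) = F\big(\Psi_{f_\theta}^T(x_0),\theta\big) + \int_0^T r\big(\Psi_{f_\theta}^s(x_0),\theta,s\big)\,ds, \tag{43}$$

with a final cost term $F$ and a running cost term $r$. This generalizes [2] to manifolds. Compared to existing manifold methods for neural ODEs [18,54], the running cost is new.

The parameter gradient’s components $\frac{\partial}{\partial\theta}C_{f_\theta}^T\big((x_0,t_0),\theta\big)\in\mathbb{R}^{n_\theta}$ are then computed by Theorem 2 (see also [21]):

Theorem 2

(Generalized Adjoint Method on Manifolds). Given the dynamics (38) and the cost (43), the parameter gradient’s components $\frac{\partial}{\partial\theta}C_{f_\theta}^T\big((x_0,t_0),\theta\big)\in\mathbb{R}^{n_\theta}$ are computed by

$$\frac{\partial}{\partial\theta}C_{f_\theta}^T\big((x_0,t_0),\theta\big) = \frac{\partial F}{\partial\theta}(x(T),\theta) + \int_0^T\frac{\partial}{\partial\theta}\Big(\lambda_jf_\theta^j(q(s),s) + r(q(s),\theta,s)\Big)\,ds, \tag{44}$$

where the state $x(s)\in\mathcal{M}$ and co-state $\lambda(s)\in T_{x(s)}^*\mathcal{M}$ satisfy, in a local chart $(U,Q)$ with $q(t) = Q(x(t))$, $\lambda(t) = \lambda_i(t)\,dQ^i$,

$$\dot{q}^j = f_\theta^j(q,t), \qquad q(0) = Q(x_0), \quad t(0) = t_0, \tag{45}$$
$$\dot{\lambda}_i = -\lambda_j\frac{\partial f_\theta^j}{\partial q^i}(q,t) - \frac{\partial r}{\partial q^i}, \qquad \lambda_i(T) = \frac{\partial F}{\partial q^i}(x(T),\theta). \tag{46}$$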

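Proof.

Define the augmented state space as $\mathcal{M}_{\mathrm{aug}} = \mathcal{M}\times\mathbb{R}^{n_\theta}\times\mathbb{R}\times\mathbb{R}$ to include the original state $x\in\mathcal{M}$, parameters $\theta\in\mathbb{R}^{n_\theta}$, accumulated running cost $L\in\mathbb{R}$, and time $t\in\mathbb{R}$ in the augmented state $x_{\mathrm{aug}} := (x,\theta,L,t)\in\mathcal{M}_{\mathrm{aug}}$. In addition, define the augmented dynamics $f_{\mathrm{aug}}\in\mathfrak{X}(\mathcal{M}_{\mathrm{aug}})$ as

$$\dot{x}_{\mathrm{aug}} = f_{\mathrm{aug}}(x_{\mathrm{aug}}) = \begin{pmatrix} f_\theta(x,t) \\ 0 \\ r(x,\theta,t) \\ 1 \end{pmatrix}, \qquad x_{\mathrm{aug}}(0) = x_{\mathrm{aug},0} := \begin{pmatrix} x_0 \\ \theta \\ 0 \\ t_0 \end{pmatrix}. \tag{47}$$

This is an autonomous system with final state $x_{\mathrm{aug}}(T) = \big(x(T),\theta,\int_0^Tr(x,\theta,s)\,ds,T\big)$. Next, define the cost $C_{\mathrm{aug}}:\mathcal{M}_{\mathrm{aug}}\to\mathbb{R}$ on the augmented space:

$$C_{\mathrm{aug}}(x_{\mathrm{aug}}) = F(x,\theta) + L. \tag{48}$$

Then Equation (43) can be rewritten as the evaluation of a terminal cost $C_{\mathrm{aug}}(x_{\mathrm{aug}}(T))$:

$$C_{f_\theta}^T(x_0) = (C_{\mathrm{aug}}\circ\Psi_{f_{\mathrm{aug}}}^T)(x_{\mathrm{aug},0}). \tag{49}$$

By Theorem 1, the gradient $d(C_{\mathrm{aug}}\circ\Psi_{f_{\mathrm{aug}}}^T)$ is given by

$$d(C_{\mathrm{aug}}\circ\Psi_{f_{\mathrm{aug}}}^T)(x_{\mathrm{aug},0}) = \lambda(0), \tag{50}$$

and by Equation (26), the components of $\lambda(s)$ satisfy

$$\dot{\lambda}_i = -\lambda_j\frac{\partial f_{\mathrm{aug}}^j}{\partial q^i}, \qquad \lambda_i(T) = \frac{\partial C_{\mathrm{aug}}}{\partial q^i}(x_{\mathrm{aug}}(T)). \tag{51}$$

Split the co-state into $\lambda_q,\lambda_\theta,\lambda_L,\lambda_t$; then their components’ dynamics are as follows:

$$\dot{\lambda}_{q,i} = -\frac{\partial}{\partial q^i}\Big(\lambda_{q,j}f_\theta^j(q,t) + \lambda_Lr(q,\theta,t)\Big), \qquad \lambda_q(T) = \frac{\partial F}{\partial q}(x(T),\theta), \tag{52}$$
$$\dot{\lambda}_{\theta,i} = -\frac{\partial}{\partial\theta^i}\Big(\lambda_{q,j}f_\theta^j(q,t) + \lambda_Lr(q,\theta,t)\Big), \qquad \lambda_\theta(T) = \frac{\partial F}{\partial\theta}(x(T),\theta), \tag{53}$$
$$\dot{\lambda}_L = 0, \qquad \lambda_L(T) = \frac{\partial C_{\mathrm{aug}}}{\partial L}(x(T),\theta) = 1, \tag{54}$$
$$\dot{\lambda}_t = -\frac{\partial}{\partial t}\Big(\lambda_{q,j}f_\theta^j(q,t) + \lambda_Lr(q,\theta,t)\Big), \qquad \lambda_t(T) = \frac{\partial C_{\mathrm{aug}}}{\partial t}(x(T),\theta) = 0. \tag{55}$$

The component $\lambda_L = 1$ is constant, so Equation (52) coincides with (46). Integrating (53) from $s = 0$ to $s = T$ recovers Equation (44). $\lambda_t$ does not appear in any of the other equations, so Equation (55) may be ignored. □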
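Theorem 2 can be sketched in a single chart and compared against finite differences. The example below is an illustration under stated assumptions: a damped pendulum with two parameters stands in for the network $f_\theta$, the running cost is $r = \tfrac{1}{2}(q^1)^2 + 0.05\,\theta\cdot\theta$, the final cost is $F = \tfrac{1}{2}\lVert q\rVert^2$, and integration uses a hand-written RK4. All function names are hypothetical.

```python
import numpy as np

def f(q, th):      # assumed stand-in for f_theta: damped pendulum
    return np.array([q[1], -th[0] * np.sin(q[0]) - th[1] * q[1]])
def df_dq(q, th):  # J[j, i] = d f^j / d q^i
    return np.array([[0.0, 1.0], [-th[0] * np.cos(q[0]), -th[1]]])
def df_dth(q, th): # P[j, k] = d f^j / d theta^k
    return np.array([[0.0, 0.0], [-np.sin(q[0]), -q[1]]])
def r(q, th):      # running cost
    return 0.5 * q[0]**2 + 0.05 * th @ th
def dr_dq(q, th):
    return np.array([q[0], 0.0])
def dr_dth(q, th):
    return 0.1 * th
def F(q, th):      # final cost (independent of theta here)
    return 0.5 * q @ q

def rk4(rhs, z, dt, n):
    for _ in range(n):
        k1 = rhs(z); k2 = rhs(z + 0.5 * dt * k1)
        k3 = rhs(z + 0.5 * dt * k2); k4 = rhs(z + dt * k3)
        z = z + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return z

def total_cost(q0, th, T, n):
    """Cost (43): the state is augmented by the accumulated running cost L."""
    rhs = lambda z: np.append(f(z[:2], th), r(z[:2], th))
    z = rk4(rhs, np.append(q0, 0.0), T / n, n)
    return F(z[:2], th) + z[2], z[:2]

def adjoint_param_grad(q0, th, T, n):
    """Backward integration of (45), (46), and (53); returns the gradient (44)."""
    _, qT = total_cost(q0, th, T, n)
    def rhs(z):
        q, lam = z[:2], z[2:4]
        dlam = -df_dq(q, th).T @ lam - dr_dq(q, th)
        dmu = -(df_dth(q, th).T @ lam + dr_dth(q, th))
        return np.concatenate([f(q, th), dlam, dmu])
    zT = np.concatenate([qT, qT, np.zeros(2)])  # lambda(T) = dF/dq = q(T), dF/dtheta = 0
    return rk4(rhs, zT, -T / n, n)[4:]
```

The accumulator `mu` plays the role of $\lambda_\theta$ in (53), so its value at $t = 0$ equals the integral in (44) plus the final-cost contribution, which vanishes for this choice of $F$.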

In summary, the above proof depends on identifying a suitable augmented manifold $\mathcal{M}_{\mathrm{aug}}$, with the goal that augmented dynamics $f_{\mathrm{aug}}\in\mathfrak{X}(\mathcal{M}_{\mathrm{aug}})$ are autonomous such that the cost function $C_{\mathrm{aug}}:\mathcal{M}_{\mathrm{aug}}\to\mathbb{R}$ on the augmented manifold rephrases the cost (43) as a final cost $C_{\mathrm{aug}}(x_{\mathrm{aug}}(T))$. This allows Theorem 1 to be applied to derive the corresponding adjoint method. In later sections (Section 3.2), this process will be the main technical tool for generalizations of Theorem 2. The next sections describe common special cases of (38) and Theorem 2.

3.1.1. Vanilla Neural ODEs and Extrinsic Neural ODEs on Manifolds

The case of neural ODEs on $\mathbb{R}^n$ (e.g., [2]) is obtained by setting $\mathcal{M} = \mathbb{R}^n$. Scalar functions, vector fields, and tensor fields are readily expressed, see Table 2.

Table 2.

Parameterization of functions in extrinsic neural ODEs.

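| Function | Vanilla Neural ODE | Extrinsic Neural ODE |
|---|---|---|
| Scalar fields $V_\theta(x)\in\mathbb{R}$ | $V_\theta:\mathbb{R}^n\to\mathbb{R}$ | $V_\theta:\mathbb{R}^N\to\mathbb{R}$ |
| Vector fields $f_\theta(x,t)\in T_x\mathcal{M}$ | $f_\theta:\mathbb{R}^n\times\mathbb{R}\to\mathbb{R}^n$ | $f_\theta:\mathbb{R}^N\times\mathbb{R}\to\mathbb{R}^N$ with tangency constraint [19], optional stabilization [55] |
| Components of $(p,q)$-tensor fields $M^{i_1,\ldots,i_p}_{j_1,\ldots,j_q}(x)\in\mathbb{R}$ | $M^{i_1,\ldots,i_p}_{j_1,\ldots,j_q}:\mathbb{R}^n\to\mathbb{R}$ | $M^{i_1,\ldots,i_p}_{j_1,\ldots,j_q}:\mathbb{R}^N\to\mathbb{R}$, see footnote 1 |

1 A tangency constraint on contravariant components of $(p,q)$-tensors is not necessarily required for the vector field $f_\theta$ to remain tangent to $\iota(\mathcal{M})$ and depends on the vector field under investigation.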

There is an overlap with extrinsic neural ODEs on manifolds (described, for instance, in [19]), which optimize the neural ODE on an embedding space $\mathbb{R}^N$, see also Figure 2.

Figure 2. In the extrinsic formulation of neural ODEs on manifolds, the manifold $\mathcal{M}$ is embedded in $\mathbb{R}^N$ as $\iota(\mathcal{M})\subset\mathbb{R}^N$, and a neural ODE $f_\theta\in\mathfrak{X}(\mathbb{R}^N)$ is optimized.

We denote the embedding as $\iota:\mathcal{M}\to\mathbb{R}^N$, where $x\in\mathcal{M}$ and $y\in\mathbb{R}^N$. Optimizing the neural ODE on $\mathbb{R}^N$ requires extending the dynamics $f_\theta(\cdot,t)\in\mathfrak{X}(\mathcal{M})$ to a vector field $\bar{f}_\theta(\cdot,t)\in\mathfrak{X}(\mathbb{R}^N)$ such that

$$\iota_*f_\theta(x,t) = \bar{f}_\theta(\iota(x),t). \tag{56}$$

The dynamics $\bar{f}_\theta(y,t)$ are then used in Theorem 2, and also the co-state lives in $T^*\mathbb{R}^N$.

As shown in [19], the resulting parameter gradients are equivalent to those resulting from an application in local charts, as long as it can be guaranteed that the integral curves of $\bar{f}_\theta(y,t)$ remain within $\iota(\mathcal{M})\subset\mathbb{R}^N$, i.e., are geometrically exact. Geometrically exact integration has to be guaranteed separately, either by integration in local charts [18] or stabilization techniques [55].

A strong upside of an extrinsic formulation is that existing neural ODE packages (e.g., [56]) can be applied directly. A downside to extrinsic neural ODEs is that finding $\bar{f}_\theta(y,t)$ may not be immediate, since tangency to $\iota(\mathcal{M})$ is required, see also Table 2. Finally, the extrinsic dimension $N$ can be much larger than the intrinsic dimension $n = \dim\mathcal{M}$, leading to computational overhead that does not fully exploit the manifold hypothesis. Extrinsic methods for neural ODEs are the preferred choice when the intrinsic dimension $n$ is small and there is a known embedding $\iota(\mathcal{M})\subset\mathbb{R}^N$ with low extrinsic dimension $N$. Then the computational overhead due to $N>n$ is negligible, and stabilization techniques [55] can be applied to guarantee geometrically exact integration.
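A tangency constraint of this kind can be sketched for the unit sphere $S^2\subset\mathbb{R}^3$. The example is a simplified illustration: a single `tanh` layer stands in for the network, tangency is enforced by orthogonal projection, and renormalization after each explicit Euler step serves as a crude stand-in for the stabilization techniques cited above, not as an implementation of them. All names are hypothetical.

```python
import numpy as np

def raw_field(y, W):
    """Unconstrained vector field on R^3 (assumed stand-in for a network)."""
    return np.tanh(W @ y)

def tangent_field(y, W):
    """Project the raw field onto the tangent plane of the sphere through y."""
    v = raw_field(y, W)
    return v - (y @ v) / (y @ y) * y

def integrate(y0, W, T, n_steps):
    """Explicit Euler steps, each followed by projection back onto the sphere."""
    y = y0 / np.linalg.norm(y0)
    dt = T / n_steps
    for _ in range(n_steps):
        y = y + dt * tangent_field(y, W)
        y = y / np.linalg.norm(y)   # keeps the iterate on the embedded manifold
    return y

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 3))
y_final = integrate(np.array([0.0, 0.0, 1.0]), W, T=2.0, n_steps=400)
```

Without the renormalization, the Euler iterates drift off the sphere at second order in the step size even though the field is exactly tangent, which illustrates why geometric exactness has to be guaranteed separately.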

3.1.2. Intrinsic Neural ODEs on Manifolds

The intrinsic case of neural ODEs on manifolds [18] is described by integrating the dynamics in local charts, see also Figure 3.

Figure 3. In the intrinsic formulation of neural ODEs on manifolds, the neural ODE $f_\theta\in\mathfrak{X}(\mathcal{M})$ is optimized in local charts, here $(U_1,Q_1)$ and $(U_2,Q_2)$, and the state and co-state undergo chart transitions.

The advantage of intrinsic over extrinsic neural ODEs on manifolds is that the dimension of the resulting equations is as low as possible in the intrinsic case for a given manifold. The flexibility of chart representations gives intrinsic neural ODEs on manifolds the power to represent high-dimensional data distributions at their latent dimension, see especially [22] for learning charts from data and [18,21] for chart-switching methods during numerical integration. Numerical integration in local charts is also geometrically exact by default. However, the parameterization of scalar functions, vector fields, and tensor fields with neural networks in local charts, as well as their differentiation with respect to parameters, presents a source of complexity. There are three common methods to parameterize scalar-valued functions $V\in C^\infty(\mathcal{M})$ in local charts (vector fields and tensor-valued functions directly follow by parameterizing their scalar component functions in an analogous way):

  • A partition of unity $\sigma_i:\mathbb{R}^n\to\mathbb{R}$ with respect to a collection of charts $(U_i,Q_i)$ can be used to sum over chart components $V_i:\mathbb{R}^n\to\mathbb{R}$ as $V(x) := \sum_i\sigma_i(Q_i(x))V_i(Q_i(x))$, see examples in [49], Chapter 2, and [24].

  • The function $V$ can be directly defined by chart representatives $V_i := V\circ Q_i^{-1}$, enforcing compatibility between overlapping charts $(U_i,Q_i),(U_j,Q_j)$ by soft constraints, which are implemented as additional cost terms $\lVert V_i(q_i) - V_j\circ Q_j\circ Q_i^{-1}(q_i)\rVert$ that are minimized on chart overlaps $U_i\cap U_j$, see [18,22].

  • Given an embedding $\iota:\mathcal{M}\to\mathbb{R}^N$ and $\bar{V}\in C^\infty(\mathbb{R}^N)$, an extrinsic representation $V_i := \bar{V}\circ\iota\circ Q_i^{-1}$ is possible, see [19,21].
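The first of the three options above can be sketched on the unit circle with two stereographic charts. The construction below is an illustrative assumption: the bump function, the cutoff height 0.8, and the chart components `V1`, `V2` (simple closed-form stand-ins for networks) are not taken from the cited works.

```python
import math

def bump(t):
    """Smooth function that vanishes identically for t <= 0."""
    return math.exp(-1.0 / t) if t > 0 else 0.0

def height(q, chart):
    """y-coordinate of the circle point with stereographic coordinate q."""
    y = (q * q - 1.0) / (q * q + 1.0)
    return y if chart == 1 else -y

def sigma(q, chart):
    """Partition of unity in chart coordinates; sigma_1 vanishes near the north pole."""
    y = height(q, chart)
    b1, b2 = bump(0.8 - y), bump(0.8 + y)
    return (b1 if chart == 1 else b2) / (b1 + b2)

def V1(q):   # chart component on chart 1 (assumed stand-in for a network)
    return math.tanh(0.5 * q)

def V2(q):   # chart component on chart 2 (assumed stand-in for a network)
    return math.cos(q)

def V(x, y):
    """V(x) = sum_i sigma_i(Q_i(x)) V_i(Q_i(x)), skipping charts that omit the point."""
    total = 0.0
    if y < 1.0:                      # point lies in U_1 (north pole removed)
        q1 = x / (1.0 - y)
        total += sigma(q1, 1) * V1(q1)
    if y > -1.0:                     # point lies in U_2 (south pole removed)
        q2 = x / (1.0 + y)
        total += sigma(q2, 2) * V2(q2)
    return total
```

Each weight vanishes on a neighborhood of the pole excluded by its chart, so the sum is smooth on the whole circle even though each chart component is only defined on its own chart domain.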

Advantages and disadvantages are summarized in Table 3.

Table 3.

Parameterization of scalar functions and tensor components in intrinsic neural ODEs.

| Partition of Unity [24,49] | Soft Constraint [18,22] | Pullback [19,21] |
|---|---|---|
| Components from all local charts are summed, weighted by a partition of unity. | Function is directly represented in local charts. | Function is pulled back to local chart. |
| Allows representation of arbitrary smooth functions. | Functions are smooth where charts do not overlap, but are not well-defined at chart transitions. | Allows representation of arbitrary smooth functions. |
| Differentiating functions generally requires differentiating through chart transition maps, creating computational overload [24]. | Chart transition maps do not have to be differentiated. | Chart representations of the embedding $\iota(\mathcal{M})$ are differentiated, possibly creating computational overload. |

In available state-of-the-art packages for neural ODEs, the chart dynamics are phrased as discontinuous dynamics with state transitions $Q_1 \circ Q_2^{-1}: \mathbb{R}^n \to \mathbb{R}^n$, but implementation is not yet streamlined for local charts, chart transitions of the co-state (cf. Section 2.3), and custom adjoint sensitivity equations.

3.1.3. Structure Preservation

Structure-preserving architectures narrow down the class of learnable neural ODEs from arbitrary vector fields $\mathfrak{X}(M)$ to subsets of $\mathfrak{X}(M)$ with particular properties, improving training speed and performance (cf. Section 1.1). Examples of such subsets are (symmetry-preserving) equivariant dynamical systems [23] and (physics-preserving) Hamiltonian, Lagrangian, and port-Hamiltonian dynamical systems [3]. Provided that a structure-preserving parameterization of the neural ODE is known in closed form, these are readily implemented in the above formalism.

For example, reusing results from Table 2 and Table 3, Hamiltonian and Lagrangian neural ODEs [30,32] are fully determined by scalar functions $H_\theta \in C^\infty(T^*Q)$ and $L_\theta \in C^\infty(TQ)$, respectively, and their gradients. Hamiltonian neural ODEs are advantageous for joint learning of the dynamics and energy of conservative physical systems, where the learned Hamiltonian vector fields $X_{H_\theta} \in \mathfrak{X}(T^*Q)$ are guaranteed to conserve the Hamiltonian $H_\theta$ representing the total energy. Lagrangian neural ODEs likewise enable learning the dynamics of conservative physical systems and enable incorporation of dissipative terms [34] but do not directly represent the total energy.

Port-Hamiltonian neural ODEs [3,35] offer further expressiveness: besides a scalar Hamiltonian $H_\theta \in C^\infty(M)$, they offer degrees of freedom in a skew-symmetric (2,0)-tensor $J_\theta$ (called a Poisson tensor), a positive-definite (2,0)-tensor $R_\theta$ (called a dissipation tensor), and a linear input map $B_\theta(x): \mathbb{R}^k \to T_xM$. This allows learning the dynamics of non-conservative dynamical systems that can be coupled with known physical systems and control inputs through the input map, while jointly learning the total energy $H_\theta$, the rate of energy dissipation $R_\theta(dH_\theta, dH_\theta)$, and the externally supplied power (see [57] for an introductory overview). Most physical systems can be represented in a port-Hamiltonian form [58], giving this parameterization a high degree of expressiveness that has been used in dynamics learning [3], control [20,21], and model order reduction [35]. Although not yet investigated in practice, this expressiveness may also be a disadvantage compared to Lagrangian or Hamiltonian neural networks, resulting in overfitting when, e.g., small dissipation terms are learned where there is no dissipation. Generally speaking, choosing the most specific structure-preserving neural network is advised.
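The energy bookkeeping described above can be checked numerically in a minimal sketch on $\mathbb{R}^n$. The matrices below are random stand-ins for learned quantities and are assumptions for illustration: skew-symmetry of `J` and positive definiteness of `R` are enforced by construction, the Hamiltonian is a quadratic toy function, and `J`, `R`, `B` are taken constant although the text allows state dependence.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 4, 2                                   # state and input dimensions
A, Lr, C = (rng.normal(size=(n, n)) for _ in range(3))
B = rng.normal(size=(n, k))                   # input map, taken constant here

J = A - A.T                                   # skew-symmetric by construction
R = Lr @ Lr.T + 1e-3 * np.eye(n)              # symmetric positive definite
Qh = C @ C.T + np.eye(n)                      # Hessian of the toy Hamiltonian

def H(x):
    """Toy Hamiltonian (total energy), H(x) = 0.5 x^T Qh x."""
    return 0.5 * x @ Qh @ x

def grad_H(x):
    return Qh @ x

def f(x, u):
    """Port-Hamiltonian vector field: dx/dt = (J - R) grad H(x) + B u."""
    return (J - R) @ grad_H(x) + B @ u

x = rng.normal(size=n)
u = rng.normal(size=k)

dH_dt = grad_H(x) @ f(x, u)                   # rate of change of stored energy
dissipated = grad_H(x) @ R @ grad_H(x)        # R(dH, dH), non-negative
supplied = (B.T @ grad_H(x)) @ u              # power supplied through the port
print(dH_dt, -dissipated + supplied)          # the two numbers coincide
```

The skew-symmetric part drops out of the balance, so the learned energy can only change through dissipation and the port.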

3.2. Extensions

The proof of Theorem 2 depended on identifying a suitable augmented manifold $M_{\text{aug}}$, autonomous augmented dynamics $f_{\text{aug}} \in \mathfrak{X}(M_{\text{aug}})$, and an augmented cost function $C_{\text{aug}}: M_{\text{aug}} \to \mathbb{R}$ that rephrases the cost (43) as a final cost $C_{\text{aug}}(x_{\text{aug}}(T))$ to apply Theorem 1. This approach generalizes to various other scenarios, including different cost terms, augmented neural ODEs on manifolds, and time-dependent parameters, presented in the following.

3.2.1. Nonlinear and Intermittent Cost Terms

We consider here the case of neural ODEs on manifolds of the form (38) with cost (39). This is a generalization of [1], in which intermittent cost terms appear for neural ODEs on $\mathbb{R}^n$. For the final and intermittent cost term $F_\theta: M \times M \times \dots \times M \to \mathbb{R}$, we denote by $d_k F_\theta \in T_x^*M$ the differential with respect to the $k$-th slot and denote $\theta$ as a subscript to avoid confusion. The components of $d_k F$ will be denoted $\frac{\partial F}{\partial_k q^i}$. In this case, the parameter gradient is determined by repeated application of Theorem 2:

Theorem 3

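(Generalized Adjoint Method on Manifolds). Given the dynamics (38) and the cost (39), the parameter gradient’s components $\frac{\partial}{\partial\theta} C^T_{f_\theta}\big((x_0,t_0),\theta\big) \in \mathbb{R}^{n_\theta}$ are computed by

$$\frac{\partial}{\partial\theta} C^T_{f_\theta}\big((x_0,t_0),\theta\big) = \frac{\partial F_\theta}{\partial\theta}\big(x(T_1),x(T_2),\dots,x(T)\big) + \int_0^T \frac{\partial}{\partial\theta}\Big(\lambda_j f^j_\theta(q(s)) + r(q(s),\theta,s)\Big)\,ds, \tag{57}$$

where the state $x(s) \in M$ satisfies (45) and the co-state $\lambda(s) \in T^*_{x(s)}M$ satisfies dynamics with discrete updates at times $T_1,\dots,T_{N-1}$ given by

$$\dot\lambda_{q,i} = -\frac{\partial}{\partial q^i}\Big(\lambda_{q,j} f^j_\theta(q,t) + r(q,\theta,t)\Big); \qquad \lambda_{q,i}(T) = \frac{\partial F_\theta}{\partial_N q^i}\big(x(T_1),\dots,x(T)\big), \tag{58}$$
$$\lambda_i(T_{k,-}) = \lambda_i(T_{k,+}) + \frac{\partial F_\theta}{\partial_k q^i}\big(x(T_1),\dots,x(T)\big), \tag{59}$$

with $T_{k,-}$ being the instant after a discrete update at time $T_k$ (recall that co-state dynamics are integrated backwards, so $T_{k,-} < T_k < T_{k,+}$) and $T_{k,+}$ the instant before.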

Proof. 

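We introduce an augmented manifold $M_{\text{aug}} = M \times \dots \times M \times \mathbb{R}^{n_\theta} \times \mathbb{R} \times \mathbb{R}$ to include $N$ copies of the original state $x \in M$, parameters $\theta \in \mathbb{R}^{n_\theta}$, accumulated running cost $L \in \mathbb{R}$, and time $t \in \mathbb{R}$ in the augmented state $x_{\text{aug}} := (x_1,\dots,x_N,\theta,L,t) \in M_{\text{aug}}$. Let

$$\varrho_{T_i}(t) = \begin{cases} 1 & t \le T_i \\ 0 & t > T_i \end{cases}, \tag{60}$$

and define the augmented dynamics $f_{\text{aug}} \in \mathfrak{X}(M_{\text{aug}})$ as

$$\dot x_{\text{aug}} = f_{\text{aug}}(x_{\text{aug}}) = \begin{pmatrix} \varrho_{T_1}(t)\, f_\theta(x_1,t) \\ \vdots \\ \varrho_{T_{N-1}}(t)\, f_\theta(x_{N-1},t) \\ f_\theta(x_N,t) \\ 0 \\ r(x_N,\theta,t) \\ 1 \end{pmatrix}, \qquad x_{\text{aug}}(0) = x_{\text{aug},0} := \begin{pmatrix} x_0 \\ \vdots \\ x_0 \\ x_0 \\ \theta \\ 0 \\ t_0 \end{pmatrix}. \tag{61}$$

This is an autonomous system with final state

$$x_{\text{aug}}(T) = \Big(x(T_1),\dots,x(T_{N-1}),x(T),\theta,\int_0^T r(x,\theta,s)\,ds,\,T\Big). \tag{62}$$

Next, define the cost $C_{\text{aug}}: M_{\text{aug}} \to \mathbb{R}$ on the augmented space:

$$C_{\text{aug}}(x_{\text{aug}}) = F_\theta(x_1,\dots,x_N) + L. \tag{63}$$

Then Equation (39) can be rewritten as the evaluation of a terminal cost $C_{\text{aug}}(x_{\text{aug}}(T))$:

$$C^T_{f_\theta}(x_0) = \big(C_{\text{aug}} \circ \Psi^T_{f_{\text{aug}}}\big)(x_{\text{aug},0}). \tag{64}$$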

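Apply Equation (26), and split the co-state into $\lambda_{q_1},\dots,\lambda_{q_N},\lambda_\theta,\lambda_L,\lambda_t$; then their components’ dynamics are as follows:

$$\dot\lambda_{q_1,i} = -\frac{\partial}{\partial q_1^i}\Big(\lambda_{q_1,j}\,\varrho_{T_1}(t)\, f^j_\theta(q_1,t)\Big), \qquad \lambda_{q_1}(T) = \frac{\partial F_\theta}{\partial_1 q}\big(x(T_1),\dots,x(T)\big), \tag{65}$$
$$\vdots$$
$$\dot\lambda_{q_N,i} = -\frac{\partial}{\partial q_N^i}\Big(\lambda_{q_N,j}\, f^j_\theta(q_N,t) + \lambda_L\, r(q_N,\theta,t)\Big), \qquad \lambda_{q_N}(T) = \frac{\partial F_\theta}{\partial_N q}\big(x(T_1),\dots,x(T)\big), \tag{66}$$
$$\dot\lambda_{\theta,i} = -\frac{\partial}{\partial\theta^i}\Big(\lambda_{q_1,j}\,\varrho_{T_1}(t)\, f^j_\theta(q_1,t) + \dots + \lambda_{q_N,j}\, f^j_\theta(q_N,t) + \lambda_L\, r(q_N,\theta,t)\Big), \qquad \lambda_\theta(T) = \frac{\partial F_\theta}{\partial\theta}\big(x(T_1),\dots,x(T)\big). \tag{67}$$

We excluded the dynamics of $\lambda_t$, which does not appear in any of the other equations, and the constant $\lambda_L = 1$. Finally, define the cumulative co-state

$$\lambda_q = \varrho_{T_1}(t)\,\lambda_{q_1} + \dots + \lambda_{q_N}. \tag{68}$$

Its dynamics at $t \in [0,T] \setminus \{T_1,\dots,T_{N-1}\}$ are given by the sum of (65) to (66), letting $q = q_N$:

$$\dot\lambda_{q,i} = \dot\lambda_{q_1,i} + \dots + \dot\lambda_{q_N,i} \tag{69}$$
$$= -\frac{\partial}{\partial q^i}\Big(\lambda_{q,j}\, f^j_\theta(q,t) + r(q,\theta,t)\Big), \tag{70}$$
$$\lambda_q(T) = \frac{\partial F_\theta}{\partial_N q}\big(x(T_1),\dots,x(T)\big), \tag{71}$$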

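with discrete jumps (59) accounting for the final conditions of $\lambda_{q_1},\dots,\lambda_{q_N}$, and the dynamics (67) can be rewritten as

$$\dot\lambda_{\theta,i} = -\frac{\partial}{\partial\theta^i}\Big(\lambda_{q,j}\, f^j_\theta(q,t) + r(q,\theta,t)\Big); \qquad \lambda_\theta(T) = \frac{\partial F_\theta}{\partial\theta}\big(x(T_1),\dots,x(T)\big). \tag{72}$$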

Integrating this from s=0 to s=T recovers Equation (57). □

Cost terms of this form are interesting for optimization of, e.g., periodic orbits [59] or trajectories on manifolds, where conditions at multiple checkpoints $\Psi^{T_i}_{f_\theta}(x_0)$ may appear in the cost.
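As a sanity check of the jump condition in Theorem 3, the following minimal sketch treats the simplest possible case, with $M = \mathbb{R}$ regarded as a one-chart manifold. The scalar dynamics $\dot x = \theta x$, the cost $\frac{1}{2}x(T_1)^2 + \frac{1}{2}x(T)^2$, the zero running cost, and the RK4 helper are all assumptions chosen so that the gradient is known in closed form; they are not taken from the cited works.

```python
import numpy as np

def rk4(rhs, z, t0, t1, n):
    """Classical RK4 from t0 to t1 in n steps; t1 < t0 integrates backwards."""
    h = (t1 - t0) / n
    t = t0
    for _ in range(n):
        k1 = rhs(t, z)
        k2 = rhs(t + h / 2, z + h / 2 * k1)
        k3 = rhs(t + h / 2, z + h / 2 * k2)
        k4 = rhs(t + h, z + h * k3)
        z = z + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return z

def adjoint_gradient(theta, x0, T1, T, n=500):
    """dC/dtheta for dx/dt = theta*x and C = 0.5*x(T1)^2 + 0.5*x(T)^2."""
    x = lambda t: x0 * np.exp(theta * t)      # closed-form state trajectory

    def rhs(t, z):
        lam, mu = z                           # mu(t) = int_t^T lam * df/dtheta ds
        return np.array([-theta * lam,        # co-state dynamics, cf. (58)
                         -lam * x(t)])        # df/dtheta = x

    z = np.array([x(T), 0.0])                 # lam(T) = dF/dx_N, mu(T) = 0
    z = rk4(rhs, z, T, T1, n)                 # integrate backwards to T1
    z = z + np.array([x(T1), 0.0])            # discrete update, cf. (59)
    z = rk4(rhs, z, T1, 0.0, n)               # integrate backwards to 0
    return z[1]                               # mu(0), the integral in (57)

def exact_gradient(theta, x0, T1, T):
    """Closed form, from x(t) = x0 * exp(theta * t)."""
    x = lambda t: x0 * np.exp(theta * t)
    return T1 * x(T1) ** 2 + T * x(T) ** 2

print(adjoint_gradient(0.3, 1.5, 0.4, 1.0), exact_gradient(0.3, 1.5, 0.4, 1.0))
```

Omitting the update at $T_1$ would drop the contribution of the intermediate checkpoint entirely, which is why the jump has to be applied before continuing the backward pass.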

3.2.2. Augmented Neural ODEs on Manifolds and Time-Dependent Parameters

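With state $x \in M$, augmented state $\alpha \in N$ (not to be confused with $x_{\text{aug}} \in M_{\text{aug}}$), and parameterized $\varphi_\theta: M \to N$, augmented neural ODEs on manifolds are neural ODEs on the manifold $M \times N$ of the form

$$\begin{pmatrix} \dot x \\ \dot\alpha \end{pmatrix} = \begin{pmatrix} f_\theta(x,\alpha) \\ g_\theta(x,\alpha) \end{pmatrix}; \qquad \begin{pmatrix} x(0) \\ \alpha(0) \end{pmatrix} = \begin{pmatrix} x_0 \\ \varphi_\theta(x_0) \end{pmatrix}. \tag{73}$$

Time $t$ is not included explicitly in these dynamics, since it can be included in $\alpha$. This case also includes the scenario of time-dependent parameters $\bar\theta(t)$ as part of $\alpha$. As the trajectory cost, we take a final cost

$$C^T_{f_\theta,g_\theta}(x_0,\theta) = F\Big(\Psi^T_{f_\theta,g_\theta}\big(x_0,\varphi_\theta(x_0)\big),\theta\Big). \tag{74}$$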

This is a generalization of [11,12].

Theorem 4

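(Adjoint Method for Augmented Neural ODEs on Manifolds). Given the dynamics (73) and the cost (74), the parameter gradient’s components $\frac{\partial}{\partial\theta} C^T_{f_\theta,g_\theta}\big((x_0,\varphi_\theta(x_0)),\theta\big) \in \mathbb{R}^{n_\theta}$ are computed by

$$\frac{\partial}{\partial\theta} C^T_{f_\theta,g_\theta}\big((x_0,\varphi_\theta(x_0)),\theta\big) = \frac{\partial F}{\partial\theta}\big(x(T),\alpha(T),\theta\big) + \frac{\partial\varphi^j}{\partial\theta}\lambda_{\alpha,j}(0) + \int_0^T \frac{\partial}{\partial\theta}\Big(\lambda_{x,j}\, f^j_\theta(q(s)) + \lambda_{\alpha,j}\, g^j_\theta(q(s))\Big)\,ds, \tag{75}$$

where the states $x(s) \in M$, $\alpha(s) \in N$ satisfy (73) and the co-states $\lambda_x(s) \in T^*_{x(s)}M$, $\lambda_\alpha(s) \in T^*_{\alpha(s)}N$ satisfy, in a local chart $(U,Q)$ on $M$ and $(\bar U,\bar Q)$ on $N$,

$$\dot\lambda_{x,i} = -\frac{\partial}{\partial q^i}\Big(\lambda_{x,j}\, f^j_\theta(q,\bar q,t) + \lambda_{\alpha,j}\, g^j_\theta(q,\bar q,t)\Big), \qquad \lambda_{x,i}(T) = \frac{\partial F}{\partial q^i}\big(x(T),\alpha(T),\theta\big), \tag{76}$$
$$\dot\lambda_{\alpha,i} = -\frac{\partial}{\partial \bar q^i}\Big(\lambda_{x,j}\, f^j_\theta(q,\bar q,t) + \lambda_{\alpha,j}\, g^j_\theta(q,\bar q,t)\Big), \qquad \lambda_{\alpha,i}(T) = \frac{\partial F}{\partial \bar q^i}\big(x(T),\alpha(T),\theta\big). \tag{77}$$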
Proof. 

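Define the augmented state space as $M_{\text{aug}} = M \times N \times \mathbb{R}^{n_\theta}$ to include the states $x \in M$, $\alpha \in N$ and parameters $\theta \in \mathbb{R}^{n_\theta}$ in the augmented state $x_{\text{aug}} := (x,\alpha,\theta) \in M_{\text{aug}}$. In addition, define the augmented dynamics $f_{\text{aug}} \in \mathfrak{X}(M_{\text{aug}})$ as

$$\dot x_{\text{aug}} = f_{\text{aug}}(x_{\text{aug}}) = \begin{pmatrix} f_\theta(x,\alpha) \\ g_\theta(x,\alpha) \\ 0 \end{pmatrix}, \qquad x_{\text{aug}}(0) = x_{\text{aug},0} := \begin{pmatrix} x_0 \\ \varphi_\theta(x_0) \\ \theta \end{pmatrix}. \tag{78}$$

This is an autonomous system with final state $x_{\text{aug}}(T) = (x(T),\alpha(T),\theta)$. Next, define the cost $C_{\text{aug}}: M_{\text{aug}} \to \mathbb{R}$ on the augmented space:

$$C_{\text{aug}}(x_{\text{aug}}) = F(x,\alpha,\theta). \tag{79}$$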

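Then Equation (74) can be rewritten as the evaluation of a terminal cost $C_{\text{aug}}(x_{\text{aug}}(T))$. The gradient $d\big(C_{\text{aug}} \circ \Psi^T_{f_{\text{aug}}}\big)$ is given by an application of Equation (26). Split the co-state into $\lambda_x,\lambda_\alpha,\lambda_\theta$; then their components’ dynamics are as follows:

$$\dot\lambda_{x,i} = -\frac{\partial}{\partial q^i}\Big(\lambda_{x,j}\, f^j_\theta(q,\bar q,t) + \lambda_{\alpha,j}\, g^j_\theta(q,\bar q,t)\Big), \qquad \lambda_x(T) = \frac{\partial F}{\partial q}\big(x(T),\alpha(T),\theta\big), \tag{80}$$
$$\dot\lambda_{\alpha,i} = -\frac{\partial}{\partial \bar q^i}\Big(\lambda_{x,j}\, f^j_\theta(q,\bar q,t) + \lambda_{\alpha,j}\, g^j_\theta(q,\bar q,t)\Big), \qquad \lambda_{\alpha,i}(T) = \frac{\partial F}{\partial \bar q^i}\big(x(T),\alpha(T),\theta\big), \tag{81}$$
$$\dot\lambda_{\theta,i} = -\frac{\partial}{\partial\theta^i}\Big(\lambda_{x,j}\, f^j_\theta(q,\bar q,t) + \lambda_{\alpha,j}\, g^j_\theta(q,\bar q,t)\Big), \qquad \lambda_\theta(T) = \frac{\partial F}{\partial\theta}\big(x(T),\alpha(T),\theta\big). \tag{82}$$

Since $\alpha(0) = \varphi_\theta(x_0)$ also depends on $\theta$, the total gradient of the cost with respect to $\theta$ is given by

$$\frac{\partial}{\partial\theta^i} C^T_{f_\theta}\big((x_0,\varphi_\theta(x_0)),\theta\big) = \lambda_{\theta,i}(0) + \frac{\partial\varphi^j}{\partial\theta^i}\lambda_{\alpha,j}(0). \tag{83}$$

Integrate (82) backwards from $t = T$ to $t = 0$ to find $\lambda_{\theta,i}(0)$; then Equation (75) is recovered. □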

Augmented neural ODEs are universal function approximators ([25], Chapter 2). Potential applications of augmented neural ODEs on manifolds include, e.g., the optimization of guiding vector fields for path-following of closed or self-intersecting paths [60], where state augmentation sits at the core of formulating singularity-free guiding vector fields for self-intersecting paths. In the same context, discontinuous initializations gα(x0) allow globally stabilizing controllers to be represented for topologically non-trivial manifolds (e.g., the sphere S2), where smooth controllers are necessarily not globally stable. A further degenerate application of Theorem 4 is obtained by removing x, i.e., fixing x=0 and fθ(x,α)=0 in Equation (73). Then both the dynamics gθ(α) and initial condition α(0)=φθ(0) are parameterized by θ, allowing joint optimization of the parameters and initial condition. This is interesting for joint optimization and numerical continuation, e.g., [59].
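The degenerate case described above (no state $x$, only $\alpha$) can be sketched in a few lines. The concrete choices are assumptions for illustration: $N = \mathbb{R}$, dynamics $\dot\alpha = \theta_1 \alpha$, initial condition $\varphi_\theta(0) = \theta_2$, and final cost $\frac{1}{2}(\alpha(T) - a^\ast)^2$ with a hypothetical target $a^\ast$. The gradient then has one entry from the integral term and one from the initial-condition term in Equation (83).

```python
import numpy as np

def rk4(rhs, z, t0, t1, n):
    """Classical RK4 from t0 to t1 in n steps; t1 < t0 integrates backwards."""
    h = (t1 - t0) / n
    t = t0
    for _ in range(n):
        k1 = rhs(t, z)
        k2 = rhs(t + h / 2, z + h / 2 * k1)
        k3 = rhs(t + h / 2, z + h / 2 * k2)
        k4 = rhs(t + h, z + h * k3)
        z = z + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return z

def adjoint_gradient(theta, T, target, n=500):
    """Gradient of 0.5*(alpha(T) - target)^2 w.r.t. theta = (theta_1, theta_2)."""
    th1, th2 = theta
    # Forward pass: d(alpha)/dt = th1 * alpha, alpha(0) = phi_theta(0) = th2.
    alpha_T = rk4(lambda t, a: th1 * a, np.array([th2]), 0.0, T, n)[0]

    def rhs(t, z):
        a, lam, mu = z                        # mu(t) = int_t^T lam * dg/dth1 ds
        return np.array([th1 * a, -th1 * lam, -lam * a])

    z_T = np.array([alpha_T, alpha_T - target, 0.0])   # lam(T) = dF/dalpha
    _, lam_0, mu_0 = rk4(rhs, z_T, T, 0.0, n)          # backward pass
    lam_theta_0 = np.array([mu_0, 0.0])       # dynamics depend on th1 only
    dphi_dtheta = np.array([0.0, 1.0])        # initial condition depends on th2 only
    return lam_theta_0 + dphi_dtheta * lam_0  # total gradient, cf. (83)

def exact_gradient(theta, T, target):
    th1, th2 = theta
    a_T = th2 * np.exp(th1 * T)
    return (a_T - target) * np.array([T * a_T, np.exp(th1 * T)])

theta = np.array([0.4, 1.2])
print(adjoint_gradient(theta, 2.0, 1.0), exact_gradient(theta, 2.0, 1.0))
```

A single backward pass thus yields the sensitivity with respect to both the vector field parameters and the initial condition.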

4. Neural ODEs on Lie Groups

Just as a neural ODE on a manifold is an NN-parameterized vector field in $\mathfrak{X}(M)$ (or, including time, $\mathfrak{X}(M \times \mathbb{R})$), a neural ODE on a Lie group can be seen as a parameterized vector field in $\mathfrak{X}(G)$ (or $\mathfrak{X}(G \times \mathbb{R})$). Similarly to Equation (38), this results in a dynamic system

$$\dot g = f_\theta(g,t), \qquad g(0) = g_0. \tag{84}$$

Yet, Lie groups offer more structure than manifolds: the Lie algebra $\mathfrak{g}$ provides a canonical space to represent tangent vectors, and its dual $\mathfrak{g}^*$ provides a canonical space to represent the co-state. Similarly, canonical (exponential) charts offer a structure for integrating dynamic systems [41]. Frequently, dynamics on a Lie group induce dynamics on a manifold $M$: by means of an action

$$\Phi: G \times M \to M; \qquad (g,x) \mapsto \Phi(g,x), \tag{85}$$

evolutions $g(t)$ induce evolutions $x(t) = \Phi(g(t),x_0)$ on $M$. This makes neural ODEs on Lie groups interesting in their own right.
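A minimal sketch of such an induced evolution, assuming $G = SO(3)$ acting on $M = S^2$ by matrix-vector multiplication and a constant body velocity, so that $g(t)$ is a matrix exponential available in closed form through the Rodrigues formula. The helper names `hat`, `exp_so3`, and `action` are hypothetical.

```python
import numpy as np

def hat(w):
    """Map w in R^3 to the skew-symmetric matrix in so(3) with hat(w) v = w x v."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    """Matrix exponential of hat(w) via the Rodrigues formula."""
    angle = np.linalg.norm(w)
    W = hat(w)
    if angle < 1e-12:
        return np.eye(3) + W                  # first-order expansion near zero
    return (np.eye(3) + np.sin(angle) / angle * W
            + (1.0 - np.cos(angle)) / angle**2 * W @ W)

def action(g, x):
    """Group action Phi(g, x) = g x of SO(3) on the unit sphere."""
    return g @ x

omega = np.array([0.2, -0.5, 1.0])            # constant velocity in the Lie algebra
x0 = np.array([0.0, 0.0, 1.0])                # initial point on S^2
ts = np.linspace(0.0, 5.0, 11)
gs = [exp_so3(t * omega) for t in ts]         # g(t) with dg/dt = g hat(omega), g(0) = I
xs = np.array([action(g, x0) for g in gs])    # induced evolution x(t) on S^2
print(np.abs(np.linalg.norm(xs, axis=1) - 1.0).max())
```

Because every $g(t)$ is a rotation, the induced trajectory stays on the sphere without any projection step.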

In this section, we describe optimizing (41) for the cost

$$C^T_{f_\theta}(g_0,\theta) = F\big(\Psi^T_{f_\theta}(g_0),\theta\big) + \int_0^T r\big(\Psi^s_{f_\theta}(g_0),\theta,s\big)\,ds, \tag{86}$$

with a final cost term F and a running cost term r. We highlight the extrinsic approach and two intrinsic approaches, where one of the latter is particular to Lie groups.

4.1. Extrinsic Neural ODEs on Lie Groups

The extrinsic formulation of neural ODEs on Lie groups was first introduced by [20] and applies ideas of [54] (see also Section 3.1.1). Given $G \subseteq GL(m,\mathbb{R})$, this formulation treats the dynamic system (84) as a dynamic system on $\mathbb{R}^{m^2}$. Denote $\mathrm{vec}: \mathbb{R}^{m \times m} \to \mathbb{R}^{m^2}$ as an invertible map that stacks the components of an input matrix into a component vector (in canonical coordinates on $\mathbb{R}^{m \times m}$ and $\mathbb{R}^{m^2}$, though this choice is not required), and let $\mathrm{proj}_G: \mathbb{R}^{m \times m} \to G$ be a projection onto $G \subset \mathbb{R}^{m \times m}$. Further denote $A_y = \mathrm{vec}^{-1}(y)$ and $g_y = \mathrm{proj}_G(A_y)$. A lift $f_\theta(y,t)$ can then be defined as

$$f_\theta(y,t) = \mathrm{vec}\big(A_y\, g_y^{-1}\, f(g_y,\theta,t)\big). \tag{87}$$

As was the case for extrinsic neural ODEs on manifolds, the cost gradient resulting from this optimization is well-defined and equivalent to any intrinsically defined procedure. However, the dimension $m^2$ of the vectorization can be significantly larger than the intrinsic dimension of the Lie group.
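The lift in Equation (87) can be sketched for $G = SO(3)$, where $m^2 = 9$ while the group itself is three-dimensional. Two choices below are assumptions and not prescribed by the text: the projection is taken to be the nearest rotation matrix obtained from a singular value decomposition, and the vector field is the left-invariant toy field $f(g,\theta,t) = g\,\Lambda(\theta)$ with a constant $\theta \in \mathbb{R}^3$.

```python
import numpy as np

def hat(w):
    """Map w in R^3 to the corresponding skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def vec(A):
    """Stack the entries of a 3x3 matrix into a vector in R^9 (row-major)."""
    return A.reshape(-1)

def vec_inv(y):
    return y.reshape(3, 3)

def proj_SO3(A):
    """Nearest rotation matrix in the Frobenius norm (assumed projection)."""
    U, _, Vt = np.linalg.svd(A)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])   # enforce det = +1
    return U @ D @ Vt

def f(g, theta, t):
    """Toy left-invariant vector field on SO(3): dg/dt = g hat(theta)."""
    return g @ hat(theta)

def f_lift(y, theta, t):
    """Lifted vector field on R^9, cf. (87): vec(A_y g_y^{-1} f(g_y, theta, t))."""
    A_y = vec_inv(y)
    g_y = proj_SO3(A_y)
    return vec(A_y @ g_y.T @ f(g_y, theta, t))       # g^{-1} = g^T on SO(3)

theta = np.array([0.3, -0.1, 0.8])
g = proj_SO3(np.random.default_rng(2).normal(size=(3, 3)))   # a point on SO(3)
y_off = vec(g) + 0.05                                        # a point off the group
print(np.abs(f_lift(vec(g), theta, 0.0) - vec(f(g, theta, 0.0))).max())
print(f_lift(y_off, theta, 0.0).shape)
```

On the group the lift reproduces the original vector field, and off the group it remains well defined, which is what allows a standard solver on $\mathbb{R}^9$ to be used.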

4.2. Intrinsic Neural ODEs on Lie Groups

Theorem 2 directly applies to optimization of neural ODEs on Lie groups, given the local exponential charts (20) and (21) on $G$. However, this does not make full use of the available structure on Lie groups. Frequently, dynamical systems are of a left-invariant form (88) or a right-invariant form (89):

$$\dot g = g\,\Lambda\big(\rho^L_\theta(g,t)\big), \tag{88}$$
$$\dot g = \Lambda\big(\rho^R_\theta(g,t)\big)\,g. \tag{89}$$

Denote $K(q): T_q\mathbb{R}^n \to \mathbb{R}^n$ as the derivative of the exponential map (see [21] for details). Then the chart representatives $f^i_\theta$ in a local exponential chart $(U_h, Q_h)$ are

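$$f^{L,i}_\theta(q,t) = (K^{-1})^i_j(q)\,\rho^{L,j}\big(Q_h^{-1}(q)\big), \tag{90}$$
$$f^{R,i}_\theta(q,t) = (K^{-1})^i_j(q)\,\Big(\mathrm{Ad}_{Q_h^{-1}(q)^{-1}}\,\rho^{R}\big(Q_h^{-1}(q)\big)\Big)^j. \tag{91}$$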

Application of Theorem 2 then requires computing $\frac{\partial}{\partial q^j} f^{L,i}_\theta(q,t)$ or $\frac{\partial}{\partial q^j} f^{R,i}_\theta(q,t)$, but this leads to significant computational overhead due to differentiation of the terms $(K^{-1})^i_j(q)$ (see [21]). Instead of applying Theorem 2, i.e., expressing dynamics in local charts, the dynamics can also be expressed in the Lie algebra $\mathfrak{g}$. Theorem 1 has a Hamiltonian form, which can be directly transformed into Hamiltonian equations on a Lie group (see also Appendix A). Applying this reasoning to Theorem 2, we arrive at the following form, which forgoes differentiating $(K^{-1})^i_j(q)$:

Theorem 5

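(Left Generalized Adjoint Method on Matrix Lie Groups). Given the dynamics (88) and the cost (86), or the dynamics (89) with $\rho^L_\theta(g,t) = \mathrm{Ad}_{g^{-1}}\rho^R_\theta(g,t)$, the parameter gradient $\frac{\partial}{\partial\theta} C^T_{f_\theta}(g_0)$ of the cost is given by the integral equation

$$\frac{\partial}{\partial\theta} C^T_{f_\theta}(g_0) = \frac{\partial F}{\partial\theta}\big(g(T),\theta\big) + \int_0^T \frac{\partial}{\partial\theta}\Big(\lambda_g^\top \rho^L_\theta(g,s) + r(g,\theta,s)\Big)\,ds, \tag{92}$$

where the state $g(t) \in G$ and co-state $\lambda_g(t) \in \mathbb{R}^n$ are the solutions of the system of equations

$$\dot g = f_\theta(g,t), \qquad g(0) = g_0, \tag{93}$$
$$\dot\lambda_g = -d^L_g\Big(\lambda_g^\top \rho^L_\theta(g,s) + r(g,\theta,s)\Big) + \mathrm{ad}^\top_{\rho^L_\theta(g,t)}\lambda_g, \qquad \lambda_g(T) = d^L_g F\big(g(T),\theta\big). \tag{94}$$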

Proof. 

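This is proven in two steps. First, define the time- and parameter-dependent control Hamiltonian $H_c: T^*M \times \mathbb{R}^{n_\theta} \times \mathbb{R} \to \mathbb{R}$ as

$$H_c(x,\lambda,\theta,t) = \lambda\big(f_\theta(x,t)\big) + r(x,\theta,t) = \lambda_i f^i_\theta(q,t) + r(q,\theta,t). \tag{95}$$

The equations for the state and co-state dynamics (45) and (46), respectively, of Theorem 2 follow as the Hamiltonian equations on $T^*M$:

$$\dot q^j = \frac{\partial H_c}{\partial\lambda_j} = f^j_\theta(q,t), \tag{96}$$
$$\dot\lambda_i = -\frac{\partial H_c}{\partial q^i} = -\lambda_j \frac{\partial}{\partial q^i} f^j_\theta(q,t) - \frac{\partial r}{\partial q^i}. \tag{97}$$

And the integral Equation (44) reads

$$\frac{\partial}{\partial\theta} C^T_{f_\theta}\big((x_0,t_0),\theta\big) = \frac{\partial F}{\partial\theta}\big(x(T),\theta\big) + \int_0^T \frac{\partial H_c}{\partial\theta}\,dt. \tag{98}$$

Second, rewrite the control Hamiltonian (95) on a Lie group $G$, i.e., $H_c: T^*G \times \mathbb{R}^{n_\theta} \times \mathbb{R} \to \mathbb{R}$. By substituting $\lambda_g(t) = \Lambda^* L_g^* \lambda(t)$ (see also Equation (A6)), this induces $H_c: G \times \mathfrak{g}^* \times \mathbb{R}^{n_\theta} \times \mathbb{R} \to \mathbb{R}$,

$$H_c(g,\lambda_g,\theta,t) = \lambda_g^\top \rho^L_\theta(g,t) + r(g,\theta,t). \tag{99}$$

Finally, Hamilton’s equations (96) and (97) are rewritten in their form on a matrix Lie group by means of (A7) and (A8), which recovers Equations (93) and (94):

$$\dot g = g\,\Lambda\Big(\frac{\partial H_c}{\partial\lambda_g}\Big), \tag{100}$$
$$\dot\lambda_g = -d^L_g H_c + \mathrm{ad}^\top_{\frac{\partial H_c}{\partial\lambda_g}}\lambda_g. \tag{101}$$

To find the final condition for $\lambda_g$, use that $\lambda_g(t) = \Lambda^* L_g^* \lambda(t)$:

$$\lambda_g(T) = \Lambda^* L_g^* \lambda(T) = \Lambda^* L_g^*\, dF\big(g(T),\theta\big) = d^L_g F\big(g(T),\theta\big). \tag{102}$$

□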

Similar equations also hold on abstract (non-matrix) Lie groups, see [21]. Compared to the extrinsic method of Section 4.1, Theorem 5 has the advantage that the dimension of the co-state $\lambda_g$ is as low as possible. Compared to the chart-based approach on Lie groups, Theorem 5 forgoes differentiating through the terms $K^i_j(q)$, avoiding overhead. Compared to a chart-based approach on manifolds, the choice of charts is also canonical on Lie groups. Although the Lie group approach avoids many of the pitfalls of intrinsic neural ODEs on manifolds, implementation in existing neural ODE packages is currently cumbersome: the adjoint sensitivity equations (94) have a non-standard form, requiring adapted dynamics of the co-state $\lambda_g$, but these equations are rarely intended for modification in existing packages. Packages for geometry-preserving integrators on Lie groups, such as [41], are also not readily available for arbitrary Lie groups.
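To make the structure of Equations (92) to (94) concrete, the following minimal sketch applies them on $G = SO(3)$ and compares the result with finite differences. All specifics are assumptions for illustration: the velocity $\rho^L_\theta(g,t) = \theta \in \mathbb{R}^3$ is constant, there is no running cost, the final cost is $F(g) = -\mathrm{tr}(g)$, $\Lambda$ is the usual hat map so that the matrix of $\mathrm{ad}_\theta$ is `hat(theta)`, and the co-state is integrated with a hand-written RK4 scheme.

```python
import numpy as np

def hat(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    """Matrix exponential of hat(w) via the Rodrigues formula."""
    angle = np.linalg.norm(w)
    W = hat(w)
    if angle < 1e-12:
        return np.eye(3) + W
    return (np.eye(3) + np.sin(angle) / angle * W
            + (1.0 - np.cos(angle)) / angle**2 * W @ W)

def cost(theta, g0, T):
    """F(g(T)) = -tr(g(T)), where g(T) = g0 exp(T hat(theta)) solves (88)."""
    return -np.trace(g0 @ exp_so3(T * theta))

def dL_F(g):
    """Left-trivialized differential of F: entry i is d/ds F(g exp(s hat(e_i))) at s = 0."""
    return np.array([-np.trace(g @ hat(e)) for e in np.eye(3)])

def adjoint_gradient(theta, g0, T, n=400):
    ad_T = hat(theta).T                       # transpose of ad_theta, cf. (94)

    def rhs(z):
        lam = z[:3]
        # rho does not depend on g here, so the d_g^L term in (94) vanishes;
        # mu accumulates the integral in (92), since d(lam^T theta)/dtheta = lam.
        return np.concatenate([ad_T @ lam, -lam])

    z = np.concatenate([dL_F(g0 @ exp_so3(T * theta)), np.zeros(3)])
    h = -T / n                                # negative step: backwards in time
    for _ in range(n):
        k1 = rhs(z)
        k2 = rhs(z + h / 2 * k1)
        k3 = rhs(z + h / 2 * k2)
        k4 = rhs(z + h * k3)
        z = z + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return z[3:]

def finite_difference_gradient(theta, g0, T, eps=1e-6):
    return np.array([(cost(theta + eps * e, g0, T) - cost(theta - eps * e, g0, T))
                     / (2 * eps) for e in np.eye(3)])

g0 = exp_so3(np.array([0.3, -0.2, 0.5]))
theta = np.array([0.4, 0.1, -0.7])
print(adjoint_gradient(theta, g0, 1.5), finite_difference_gradient(theta, g0, 1.5))
```

The co-state lives in $\mathbb{R}^3$ throughout, and no derivative of the exponential map appears anywhere in the backward pass.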

4.3. Extensions

The proof of Theorem 5 relied on finding a control Hamiltonian formulation for Theorem 2. This approach generalizes to methods in Section 3.2, which rely on the use of Theorem 1. This is because Theorem 1 itself has a Hamiltonian form ([21,54]).

A further straightforward extension of the methods presented in this section are port-Hamiltonian neural ODEs on Lie groups [20]. In [20], these are systems with a configuration on a Lie group $G$ and momentum on $\mathfrak{g}^*$. In terms of the theory presented above, such port-Hamiltonian dynamics can be phrased as a dynamic system on a product Lie group $G \times \mathfrak{g}^*$ (taking vector addition as the group operation on $\mathfrak{g}^*$), recovering both extrinsic [20] and intrinsic [21] port-Hamiltonian neural ODEs on Lie groups.

5. Discussion

We discuss advantages and disadvantages of the main flavors of the presented formulations for manifold neural ODEs, expanding on the previous sections. We focus on extrinsic (embedding dynamics in $\mathbb{R}^N$) and intrinsic (integrating in local charts) formulations. The prior comments can be summarized as follows:

  • The extrinsic formulation is readily implemented if the low-dimensional manifold $M$ and an embedding into $\mathbb{R}^N$ are known. This comes at the possible cost of geometric inexactness and a higher dimension of the co-state and sensitivity equations.

  • The co-state in the intrinsic formulation has a generally lower dimension, which reduces the dimension of the sensitivity equations. The chart-based formulation also guarantees geometrically exact integration of dynamics. This comes at the mild cost of having to define local charts and chart transitions.

This dimensionality reduction is unlikely to have a high impact when the manifold $M$ is known and low-dimensional, e.g., for the sphere $M = S^2$ or similar manifolds. However, when applying the manifold hypothesis to high-dimensional data, there might be non-trivial latent manifolds for which the embedding is not immediate and where the latent manifold is of a much lower dimension than the embedding data manifold. Then the intrinsic method becomes difficult to avoid. If geometric exactness of the integration is desired, local charts also need to be defined for the extrinsic approach, in which case the intrinsic approach may offer further advantages.

In order to derive neural ODEs on Lie groups, three approaches are possible: the extrinsic and intrinsic formulations on manifolds directly carry over to matrix Lie groups, embedding $G \subseteq GL(m,\mathbb{R})$ in $\mathbb{R}^{m^2}$ or using local exponential charts, respectively. A third option is a novel intrinsic method for neural ODEs on matrix Lie groups, which makes full use of the Lie group structure by phrasing dynamics on $\mathfrak{g}$ (as is more common on Lie groups) and the co-state on $\mathfrak{g}^*$, avoiding difficulties of the chart-based formalism in differentiating extra terms.

Prior comments on advantages and disadvantages of these flavors can be summarized as follows:

  • The extrinsic formulation on matrix Lie groups can come at a much higher cost than that on manifolds, since the intrinsic dimension of $G$ can be much lower than $m^2$, which inflates the dimension of the co-state and sensitivity equations. Geometrically exact integration procedures are more readily available for matrix Lie groups, integrating $\dot g$ in local exponential charts.

  • The chart-based formulation on matrix Lie groups struggles when dynamics are not naturally phrased in local charts. This is common; dynamics are often more naturally phrased on $\mathfrak{g}$. This was alleviated by an algebra-based formulation on matrix Lie groups. Both are intrinsic approaches whose co-state dynamics have the lowest possible dimension. However, the algebra-based approach still lacks a readily available software implementation.

The authors believe that the algebra-based formulation is more convenient in principle and consider software implementations of the algebra-based approach as possible future work.

In summary, we presented a unified, geometric approach to extend various methods for neural ODEs on $\mathbb{R}^N$ to neural ODEs on manifolds and Lie groups. Optimization of neural ODEs on manifolds was based on the adjoint method on manifolds. Given a novel cost function $C$ and neural ODE architecture $f$, the strategy to present the results in a unified fashion was to identify a suitable augmented manifold $M_{\text{aug}}$, augmented dynamics $f_{\text{aug}} \in \mathfrak{X}(M_{\text{aug}})$, and cost $C_{\text{aug}}: M_{\text{aug}} \to \mathbb{R}$ such that the original cost function can be rephrased as $C = C_{\text{aug}} \circ \Psi^T_{f_{\text{aug}}}$. To further derive optimization of intrinsic neural ODEs on Lie groups, we found a Hamiltonian formulation of the adjoint method on manifolds and subsequently transformed it into Hamiltonian equations on a matrix Lie group.

Appendix A. Hamiltonian Dynamics on Lie Groups

We briefly review Hamiltonian systems on manifolds and matrix Lie groups (see also [21], App. A1).

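Given a manifold $Q$ with coordinate maps $Q^i: Q \to \mathbb{R}$ and $p_i$ in the basis $dQ^i$ on $T_q^*Q$, we define the symplectic form $\omega \in \Omega^2(T^*Q)$ as

$$\omega = -\,dp_i \wedge dQ^i. \tag{A1}$$

Let $Y \in \mathfrak{X}(T^*Q)$; then a Hamiltonian $H \in C^\infty(T^*Q,\mathbb{R})$ implicitly defines a unique vector field $X_H \in \mathfrak{X}(T^*Q)$ by

$$dH(Y) = \omega(X_H,Y). \tag{A2}$$

In coordinates, $X_H$ has the components

$$\dot q^i = \frac{\partial H}{\partial p_i}, \tag{A3}$$
$$\dot p_i = -\frac{\partial H}{\partial q^i}. \tag{A4}$$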

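On a Lie group $G$, the group structure allows the identification of $T^*G \cong G \times \mathfrak{g}^* \cong G \times \mathbb{R}^n$, e.g., using the pullback $L_g^*: T_g^*G \to \mathfrak{g}^*$ of the left-translation map $L_g: G \to G$, and $\Lambda^*: \mathfrak{g}^* \to \mathbb{R}^n$, to define $P_g \in \mathbb{R}^n$ as

$$P_g = \Lambda^* L_g^* P. \tag{A5}$$

Then the left Hamiltonian $H_L: G \times \mathfrak{g}^* \to \mathbb{R}$ is defined in terms of $H: T^*G \to \mathbb{R}$ as

$$H_L(g,P_g) = H(g,P). \tag{A6}$$

For a matrix Lie group, the left Hamiltonian equations read as follows:

$$\dot g = g\,\Lambda\Big(\frac{\partial H_L}{\partial P}\Big), \tag{A7}$$
$$\dot P = -d^L_g H_L + \mathrm{ad}^\top_{\frac{\partial H_L}{\partial P}} P, \tag{A8}$$

with $\Lambda: \mathbb{R}^n \to \mathfrak{g}$ as in (13) and $d^L_g H \in \mathbb{R}^n$ as in (23).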
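A familiar instance of Equations (A7) and (A8) is the free rigid body on $SO(3)$, sketched below under stated assumptions: the left Hamiltonian $H_L(g,P) = \frac{1}{2} P^\top \mathbb{I}^{-1} P$ does not depend on $g$, the inertia matrix is a hypothetical diagonal example, $\Lambda$ is the hat map, the momentum is integrated with RK4, and the configuration is advanced with a first-order exponential (Lie-Euler) update. With these choices (A8) reduces to Euler's equations $\dot P = P \times \Omega$.

```python
import numpy as np

def hat(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    """Matrix exponential of hat(w) via the Rodrigues formula."""
    angle = np.linalg.norm(w)
    W = hat(w)
    if angle < 1e-12:
        return np.eye(3) + W
    return (np.eye(3) + np.sin(angle) / angle * W
            + (1.0 - np.cos(angle)) / angle**2 * W @ W)

inertia = np.diag([1.0, 2.0, 3.0])            # hypothetical inertia matrix

def H_L(P):
    """Left Hamiltonian (kinetic energy); independent of g."""
    return 0.5 * P @ np.linalg.solve(inertia, P)

def dH_dP(P):
    """Body angular velocity Omega = dH_L/dP."""
    return np.linalg.solve(inertia, P)

def P_dot(P):
    """(A8) with d_g^L H_L = 0: transpose of ad_Omega applied to P, equal to P x Omega."""
    return hat(dH_dP(P)).T @ P

def simulate(g, P, h=1e-3, steps=2000):
    for _ in range(steps):
        g = g @ exp_so3(h * dH_dP(P))         # Lie-Euler step for (A7), stays on SO(3)
        k1 = P_dot(P)
        k2 = P_dot(P + h / 2 * k1)
        k3 = P_dot(P + h / 2 * k2)
        k4 = P_dot(P + h * k3)
        P = P + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return g, P

P0 = np.array([0.5, -1.0, 0.3])
g_end, P_end = simulate(np.eye(3), P0)
print(H_L(P_end) - H_L(P0), np.linalg.norm(P_end) - np.linalg.norm(P0))
```

Energy and the norm of the body momentum are conserved by the exact flow, and the exponential update keeps the configuration on the group up to round-off.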

Author Contributions

Conceptualization, Y.P.W.; methodology, Y.P.W.; software, Y.P.W.; validation, Y.P.W.; formal analysis, Y.P.W.; investigation, Y.P.W.; resources, S.S.; data curation, Y.P.W.; writing—original draft preparation, Y.P.W.; writing—review and editing, Y.P.W., F.C., and S.S.; visualization, Y.P.W.; supervision, F.C. and S.S.; project administration, S.S.; funding acquisition, S.S. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Funding Statement

This research received no external funding.


References

  • 1. Chen R.T.Q., Rubanova Y., Bettencourt J., Duvenaud D. Neural Ordinary Differential Equations; Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018); Montreal, QC, Canada. 3–8 December 2018; pp. 31–60. Available online: http://arxiv.org/abs/1806.07366 (accessed on 13 August 2025).
  • 2. Massaroli S., Poli M., Park J., Yamashita A., Asama H. Dissecting neural ODEs. Adv. Neural Inf. Process. Syst. 2020;2020:3952–3963. Available online: http://arxiv.org/abs/2002.08071 (accessed on 13 August 2025).
  • 3. Zakwan M., Natale L.D., Svetozarevic B., Heer P., Jones C., Trecate G.F. Physically Consistent Neural ODEs for Learning Multi-Physics Systems. IFAC-PapersOnLine. 2023;56:5855–5860. doi: 10.1016/j.ifacol.2023.10.079.
  • 4. Sholokhov A., Liu Y., Mansour H., Nabi S. Physics-informed neural ODE (PINODE): Embedding physics into models using collocation points. Sci. Rep. 2023;13:10166. doi: 10.1038/s41598-023-36799-6.
  • 5. Ghanem P., Demirkaya A., Imbiriba T., Ramezani A., Danziger Z., Erdogmus D. Learning Physics Informed Neural ODEs with Partial Measurements. Proc. AAAI Conf. Artif. Intell. 2024 AAAI-25 doi: 10.1609/aaai.v39i16.33846. Available online: http://arxiv.org/abs/2412.08681 (accessed on 13 August 2025).
  • 6. Massaroli S., Poli M., Califano F., Park J., Yamashita A., Asama H. Optimal Energy Shaping via Neural Approximators. SIAM J. Appl. Dyn. Syst. 2022;21:2126–2147. doi: 10.1137/21M1414279.
  • 7. Niu H., Zhou Y., Yan X., Wu J., Shen Y., Yi Z., Hu J. On the applications of neural ordinary differential equations in medical image analysis. Artif. Intell. Rev. 2024;57:236. doi: 10.1007/s10462-024-10894-0.
  • 8. Oh Y., Kam S., Lee J., Lim D.Y., Kim S., Bui A.A.T. Comprehensive Review of Neural Differential Equations for Time Series Analysis. arXiv. 2025. Available online: http://arxiv.org/abs/2502.09885 (accessed on 13 August 2025).
  • 9. Poli M., Massaroli S., Scimeca L., Chun S., Oh S.J., Yamashita A., Asama H., Park J., Garg A. Neural Hybrid Automata: Learning Dynamics with Multiple Modes and Stochastic Transitions; Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021); Online. 6–14 December 2021; pp. 9977–9989. Available online: http://arxiv.org/abs/2106.04165 (accessed on 13 August 2025).
  • 10. Chen R.T.Q., Amos B., Nickel M. Learning Neural Event Functions for Ordinary Differential Equations; Proceedings of the Ninth International Conference on Learning Representations (ICLR 2021); Virtual. 3–7 May 2021. Available online: http://arxiv.org/abs/2011.03902 (accessed on 13 August 2025).
  • 11. Davis J.Q., Choromanski K., Varley J., Lee H., Slotine J.J., Likhosterov V., Weller A., Makadia A., Sindhwani V. Time Dependence in Non-Autonomous Neural ODEs. arXiv. 2020. doi: 10.48550/arXiv.2005.01906. Available online: http://arxiv.org/abs/2005.01906 (accessed on 13 August 2025).
  • 12. Dupont E., Doucet A., Teh Y.W. Augmented Neural ODEs; Proceedings of the 33rd International Conference on Neural Information Processing Systems (NeurIPS 2019); Vancouver, BC, Canada. 8–14 December 2019. Available online: http://arxiv.org/abs/1904.01681 (accessed on 13 August 2025).
  • 13. Chu H., Miyatake Y., Cui W., Wei S., Furihata D. Structure-Preserving Physics-Informed Neural Networks with Energy or Lyapunov Structure; Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI 24); Jeju, Republic of Korea. 3–9 August 2024.
  • 14. Kütük M., Yücel H. Energy dissipation preserving physics informed neural network for Allen–Cahn equations. J. Comput. Sci. 2025;87:102577. doi: 10.1016/j.jocs.2025.102577.
  • 15. Bullo F., Murray R.M. Tracking for fully actuated mechanical systems: A geometric framework. Automatica. 1999;35:17–34. doi: 10.1016/S0005-1098(98)00119-8.
  • 16. Marsden J.E., Ratiu T.S. Introduction to Mechanics and Symmetry. Volume 17. Springer; New York, NY, USA: 1999.
  • 17. Whiteley N., Gray A., Rubin-Delanchy P. Statistical exploration of the Manifold Hypothesis. arXiv. 2025. Available online: http://arxiv.org/abs/2208.11665 (accessed on 13 August 2025).
  • 18. Lou A., Lim D., Katsman I., Huang L., Jiang Q., Lim S.N., De Sa C. Neural Manifold Ordinary Differential Equations; Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020); Online. 6–12 December 2020. Available online: http://arxiv.org/abs/2006.10254 (accessed on 13 August 2025).
  • 19. Falorsi L., Forré P. Neural Ordinary Differential Equations on Manifolds. arXiv. 2020. doi: 10.48550/arXiv.2006.06663. Available online: http://arxiv.org/abs/2006.06663 (accessed on 13 August 2025).
  • 20. Duong T., Altawaitan A., Stanley J., Atanasov N. Port-Hamiltonian Neural ODE Networks on Lie Groups for Robot Dynamics Learning and Control. IEEE Trans. Robot. 2024;40:3695–3715. doi: 10.1109/TRO.2024.3428433.
  • 21. Wotte Y.P., Califano F., Stramigioli S. Optimal potential shaping on SE(3) via neural ordinary differential equations on Lie groups. Int. J. Robot. Res. 2024;43:2221–2244. doi: 10.1177/02783649241256044.
  • 22. Floryan D., Graham M.D. Data-driven discovery of intrinsic dynamics. Nat. Mach. Intell. 2022;4:1113–1120. doi: 10.1038/s42256-022-00575-4.
  • 23. Andersdotter E., Persson D., Ohlsson F. Equivariant Manifold Neural ODEs and Differential Invariants. arXiv. 2024. doi: 10.48550/arXiv.2401.14131. Available online: http://arxiv.org/abs/2401.14131 (accessed on 13 August 2025).
  • 24. Wotte Y.P. Master’s Thesis. University of Twente; Enschede, The Netherlands: 2021. Optimal Potential Shaping on SE(3) via Neural Approximators.
  • 25. Kidger P. Ph.D. Thesis. Mathematical Institute, University of Oxford; Oxford, UK: 2022. On Neural Differential Equations.
  • 26. Gholami A., Keutzer K., Biros G. ANODE: Unconditionally Accurate Memory-Efficient Gradients for Neural ODEs; Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI 19); Macao. 10–16 August 2019. Available online: http://arxiv.org/abs/1902.10298 (accessed on 13 August 2025).
  • 27. Kidger P., Morrill J., Foster J., Lyons T.J. Neural Controlled Differential Equations for Irregular Time Series; Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020); Online. 6–12 December 2020. Available online: http://arxiv.org/abs/2005.08926 (accessed on 13 August 2025).
  • 28. Li X., Wong T.L., Chen R.T.Q., Duvenaud D. Scalable Gradients for Stochastic Differential Equations; Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020); Online. 26–28 August 2020. Available online: http://arxiv.org/abs/2001.01328 (accessed on 13 August 2025).
  • 29. Liu Y., Cheng J., Zhao H., Xu T., Zhao P., Tsung F., Li J., Rong Y. SEGNO: Generalizing Equivariant Graph Neural Networks with Physical Inductive Biases; Proceedings of the 12th International Conference on Learning Representations (ICLR 2024); Vienna, Austria. 7–11 May 2024. Available online: http://arxiv.org/abs/2308.13212 (accessed on 13 August 2025).
  • 30. Greydanus S., Dzamba M., Yosinski J. Hamiltonian Neural Networks; Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019); Vancouver, BC, Canada. 8–14 December 2019. Available online: http://arxiv.org/abs/1906.01563 (accessed on 13 August 2025).
  • 31. Finzi M., Wang K.A., Wilson A.G. Simplifying Hamiltonian and Lagrangian Neural Networks via Explicit Constraints; Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020); Online. 6–12 December 2020. Available online: http://arxiv.org/abs/2010.13581 (accessed on 13 August 2025).
  • 32. Cranmer M., Greydanus S., Hoyer S., Research G., Battaglia P., Spergel D., Ho S. Lagrangian Neural Networks; Proceedings of the ICLR 2020 Deep Differential Equations Workshop; Addis Ababa, Ethiopia. 26 April 2020. Available online: http://arxiv.org/abs/2003.04630 (accessed on 13 August 2025).
  • 33. Bhattoo R., Ranu S., Krishnan N.M. Learning the Dynamics of Particle-based Systems with Lagrangian Graph Neural Networks. Mach. Learn. Sci. Technol. 2023;4:015003. doi: 10.1088/2632-2153/acb03e.
  • 34. Xiao S., Zhang J., Tang Y. Generalized Lagrangian Neural Networks. arXiv. 2024. doi: 10.48550/arXiv.2401.03728. Available online: http://arxiv.org/abs/2401.03728 (accessed on 13 August 2025).
  • 35. Rettberg J., Kneifl J., Herb J., Buchfink P., Fehr J., Haasdonk B. Data-Driven Identification of Latent Port-Hamiltonian Systems. arXiv. 2024:37–99. Available online: http://arxiv.org/abs/2408.08185 (accessed on 13 August 2025).
  • 36. Duong T., Atanasov N. Hamiltonian-based Neural ODE Networks on the SE(3) Manifold For Dynamics Learning and Control; Proceedings of the Robotics: Science and Systems (RSS 2021); Online. 12–16 July 2021. Available online: http://arxiv.org/abs/2106.12782v3 (accessed on 13 August 2025).
  • 37. Fronk C., Petzold L. Training stiff neural ordinary differential equations with explicit exponential integration methods. Chaos. 2025;35:33154. doi: 10.1063/5.0251475.
  • 38. Kloberdanz E., Le W. Artificial Neural Networks and Machine Learning—ICANN 2023. Volume 14262. Springer; Cham, Switzerland: 2023. S-SOLVER: Numerically Stable Adaptive Step Size Solver for Neural ODEs; pp. 388–400. (Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
  • 39. Akhtar S.W. On Tuning Neural ODE for Stability, Consistency and Faster Convergence. SN Comput. Sci. 2025;6:318. doi: 10.1007/s42979-025-03832-6.
  • 40. Zhu A., Jin P., Zhu B., Tang Y. On Numerical Integration in Neural Ordinary Differential Equations; Proceedings of the 39th International Conference on Machine Learning (ICML 2022); Baltimore, MD, USA. 17–23 July 2022; Baltimore, MD, USA: ML Research Press; 2022. pp. 27527–27547. Available online: http://arxiv.org/abs/2206.07335 (accessed on 13 August 2025).
  • 41. Munthe-Kaas H. High order Runge-Kutta methods on manifolds. Appl. Numer. Math. 1999;29:115–127. doi: 10.1016/S0168-9274(98)00030-0.
  • 42. Celledoni E., Marthinsen H., Owren B. An introduction to Lie group integrators—Basics, New Developments and Applications. J. Comput. Phys. 2014;257:1040–1061. doi: 10.1016/j.jcp.2012.12.031.
  • 43. Ma Y., Dixit V., Innes M.J., Guo X., Rackauckas C. A Comparison of Automatic Differentiation and Continuous Sensitivity Analysis for Derivatives of Differential Equation Solutions; Proceedings of the 2021 IEEE High Performance Extreme Computing Conference (HPEC 2021); Online. 20–24 September 2021.
  • 44. Saemundsson S., Terenin A., Hofmann K., Deisenroth M.P. Variational Integrator Networks for Physically Structured Embeddings; Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (AISTATS 2020); Online. 26–28 August 2020; pp. 3078–3087. Available online: http://arxiv.org/abs/1910.09349 (accessed on 13 August 2025).
  • 45.Desai S.A., Mattheakis M., Roberts S.J. Variational integrator graph networks for learning energy-conserving dynamical systems. Phys. Rev. E. 2021;104:035310. doi: 10.1103/PhysRevE.104.035310. [DOI] [PubMed] [Google Scholar]
  • 46.Bobenko A.I., Suris Y.B. Discrete Time Lagrangian Mechanics on Lie Groups, with an Application to the Lagrange Top. Commun. Math. Phys. 1999;204:147–188. doi: 10.1007/s002200050642. [DOI] [Google Scholar]
  • 47.Marsden J.E., Pekarsky S., Shkoller S., West M. Variational Methods, Multisymplectic Geometry and Continuum Mechanics. J. Geom. Phys. 2001;38:253–284. doi: 10.1016/S0393-0440(00)00066-8. [DOI] [Google Scholar]
  • 48.Duruisseaux V., Duong T., Leok M., Atanasov N. Lie Group Forced Variational Integrator Networks for Learning and Control of Robot Systems; Proceedings of the 5th Annual Conference on Learning for Dynamics and Control; Philadelphia, PA, USA. 15–16 June 2023; [(accessed on 13 August 2025)]. pp. 1–21. Available online: http://arxiv.org/abs/2211.16006. [Google Scholar]
  • 49.Lee J.M. Introduction to Smooth Manifolds. 2nd ed. Springer; New York, NY, USA: 2012. [DOI] [Google Scholar]
  • 50.Hall B.C. Lie Groups, Lie Algebras, and Representations: An Elementary Introduction. Volume 222. Springer; Berlin/Heidelberg, Germany: 2015. Graduate Texts in Mathematics (GTM) [DOI] [Google Scholar]
  • 51.Solà J., Deray J., Atchuthan D. A micro Lie theory for state estimation in robotics. [(accessed on 13 August 2025)];arXiv. 2021 doi: 10.48550/arXiv.1812.01537. Available online: http://arxiv.org/abs/1812.01537. [DOI] [Google Scholar]
  • 52.Visser M., Stramigioli S., Heemskerk C. Cayley-Hamilton for roboticists. IEEE Int. Conf. Intell. Robot. Syst. 2006;1:4187–4192. doi: 10.1109/IROS.2006.281911. [DOI] [Google Scholar]
  • 53.Robbins H., Monro S. A Stochastic Approximation Method. Ann. Math. Stat. 1951;22:400–407. doi: 10.1214/aoms/1177729586. [DOI] [Google Scholar]
  • 54.Falorsi L., de Haan P., Davidson T.R., Forré P. Reparameterizing Distributions on Lie Groups; Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS 2019); Naha, Okinawa, Japan. 16–18 April 2019; [(accessed on 13 August 2025)]. Available online: http://arxiv.org/abs/1903.02958. [Google Scholar]
  • 55.White A., Kilbertus N., Gelbrecht M., Boers N. Stabilized Neural Differential Equations for Learning Dynamics with Explicit Constraints; Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023); New Orleans, LA, USA. 10–16 December 2023; [(accessed on 13 August 2025)]. Available online: http://arxiv.org/abs/2306.09739. [Google Scholar]
  • 56.Poli M., Massaroli S., Yamashita A., Asama H., Park J. TorchDyn: A Neural Differential Equations Library. [(accessed on 13 August 2025)];arXiv. 2020 doi: 10.48550/arXiv.2009.09346. Available online: http://arxiv.org/abs/2009.09346. [DOI] [Google Scholar]
  • 57.van der Schaft A., Jeltsema D. Port-Hamiltonian Systems Theory: An Introductory Overview. Volume 1. Now Publishers Inc.; Hanover, MA, USA: 2014. pp. 173–378. [DOI] [Google Scholar]
  • 58.Rashad R., Califano F., van der Schaft A.J., Stramigioli S. Twenty years of distributed port-Hamiltonian systems: A literature review. IMA J. Math. Control Inf. 2020;37:1400–1422. doi: 10.1093/imamci/dnaa018. [DOI] [Google Scholar]
  • 59.Wotte Y.P., Dummer S., Botteghi N., Brune C., Stramigioli S., Califano F. Discovering efficient periodic behaviors in mechanical systems via neural approximators. Optim. Control Appl. Methods. 2023;44:3052–3079. doi: 10.1002/oca.3025. [DOI] [Google Scholar]
  • 60.Yao W. A Singularity-Free Guiding Vector Field for Robot Navigation. Springer; Cham, Switzerland: 2023. pp. 159–190. [DOI] [Google Scholar]

Data Availability Statement

Not applicable.


Articles from Entropy are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)
