Abstract
Protein generation has numerous applications in designing therapeutic antibodies and creating new drugs. Still, it is a demanding task due to the inherent complexities of protein structures and the limitations of current generative models. Proteins possess intricate geometry, and sampling their conformational space is challenging due to its high dimensionality. This paper introduces novel Markovian and non-Markovian generative diffusion models based on fractional stochastic differential equations and the Lévy distribution, allowing for a more effective exploration of the conformational space. The approach is applied to a dataset of proteins and evaluated in terms of Fréchet distance, fidelity, and diversity, outperforming the state-of-the-art by 25.4%, 35.8%, and 11.8%, respectively.
Keywords: Diffusion, Fractional, Generative model, Lévy–Itō, Noising, Protein, Stable distribution, Stochastic differential equation, Score
1. Introduction
Proteins perform numerous cellular functions such as enzymatic activity, structural support, transport and storage, signalling, regulation of gene expression, immune response, and catalysis of biochemical reactions [1]. They consist of linear chains of amino acids. These sequences form three-dimensional structures known as conformations, which result from the interactions between the amino acids and their environment [2]. These conformations determine, in turn, the functionality of proteins [3]. Protein generative models hold immense potential in various fields, from medicine to materials science. Generative models can be employed to design new therapeutic proteins [4] and understand disease mechanisms and protein dysfunctions [5]. They may accelerate research through cost-effective rapid prototyping [6] and may be customised for specific functions [7] in addition to being able to generate proteins that do not exist in nature. In materials science, generated proteins may be employed to create new biomaterials and to form nanoscale structures [8].
Nonetheless, generating proteins is a challenging problem due to the complexity of their geometry [9], the limited amount of structural data for specific proteins, the conformational variability, and importantly, the high dimensionality of their conformational space, which makes sampling a daunting task [10]. This paper proposes novel Markovian and non-Markovian diffusion probabilistic models based on fractional stochastic differential equations (SDEs) and the Lévy distribution [11], which allow for a more effective exploration of the conformational space and more accurate generated results. The proposed approach replaces the Wiener process in the forward noising equation with a Lévy process. The Lévy distribution is heavy-tailed and, unlike the Gaussian distribution, entails large fluctuations. The process may be reversed with a backward denoising fractional SDE [12], which involves fractional derivatives. Instead of employing only one distribution for noising, a family of Lévy distributions is used. The Lévy distribution is annealed from large to small fluctuations (i.e. from a heavy-tailed distribution to a Gaussian distribution), in a process reminiscent of simulated annealing, to improve the convergence of the calculations. Fifteen Lévy–Itō models are evaluated against five non-fractional state-of-the-art models on a dataset of 40,000 proteins with three metrics: the Fréchet distance [13], the density, and the coverage [14].
The paper is organised as follows. Diffusion processes, SDEs, and score-matching techniques are introduced in Section 2.1. The solution of the backward equation with the exponential integrator method is presented in Section 2.2. Lévy–Itō diffusion models are introduced in Section 3, while the fractional Riesz derivative approximation and the stability index annealing are addressed in Section 4. The representation of proteins is discussed in Section 5. The implementation and the methodology appear in Section 6, and the experimental results and their discussion are reported in Section 7. Section 8 concludes the paper. All the mathematical symbols appearing in this article are defined in Table A.4.
2. Background
Score-based probabilistic diffusion models are a subclass of diffusion models that employ a score function to guide the generation process. This score function estimates the gradient of the log probability of the data, providing a way to progressively denoise a sample from a random distribution to a data-like distribution [15], [16]. The relevance of these models to protein generation and biological systems is manifold. Score-based models can generate high-quality samples of protein structures by learning the distribution of existing protein data. They can be used to predict the structure of proteins or to design new proteins with desired functions [17]. The iterative refinement process employed in these models mimics the natural folding of proteins, potentially providing insights into how amino acid sequences determine their 3-D structure [18]. The mathematical framework of score-based diffusion models, often involving stochastic differential equations (SDEs), captures the complex, high-dimensional relationships inherent in biological data. This may be crucial for understanding multifaceted biological systems in which interactions at the molecular level influence macroscopic behaviour. Protein space is vast and largely unexplored. Score-based diffusion models can efficiently explore this space by generating novel protein sequences and structures, facilitating the discovery of proteins with unique or enhanced functionalities for medical and industrial applications [19], [20].
The underlying SDEs in score-based diffusion models are similar to the dynamic processes observed in biological systems, in which deterministic and stochastic factors influence changes over time. This similarity may make these models particularly useful for simulating and understanding the dynamics of biological processes, such as enzymatic reactions, cellular signalling pathways or evolutionary changes [21]. Biological systems are often noisy and subject to multiple sources of uncertainty. Score-based diffusion models naturally incorporate noise and uncertainty, which may be advantageous for accurate modelling of biological phenomena that are inherently stochastic [16], [22], [23]. These models can integrate different types of biological data (e.g., genomic, proteomic, metabolomic) to generate comprehensive models of biological systems. This integrative approach may lead to a deeper understanding of the underlying mechanisms of complex biological processes and diseases. In summary, score-based probabilistic diffusion models, with their robust mathematical foundation and ability to model complex, noisy and high-dimensional data, have significant potential to advance our understanding and capabilities in protein generation and broader biological systems analysis. Their ability to generate new, plausible patterns from learned data distributions makes them a powerful tool for innovation in synthetic biology, drug design and systems biology.
2.1. Diffusion processes, SDEs, and score-matching techniques
A diffusion model consists of a fixed forward noising process that adds noise to the data and a learned backward denoising process that iteratively removes noise from them. The denoising process is trained to match the corresponding noising process at each iteration. Samples from the data probability distribution may be generated from random noise by simulating the backward diffusion process [15], [24]. The forward noising process may be represented by an SDE such as
$$\mathrm{d}x_t = F_t\,x_t\,\mathrm{d}t + G_t\,\mathrm{d}w_t \tag{1}$$

where $F_t$ is the drift matrix, $G_t$ is the diffusion matrix, and $w_t$ is a standard Wiener process [25]. This equation is linear in both the drift and the diffusion and corresponds to the noising process in which noise is gradually added to the data until white noise is obtained. As demonstrated by [26], [27], the reverse-time diffusion process has a closed-form solution:
$$\mathrm{d}x_t = \left[F_t\,x_t - G_t G_t^{\top}\,\nabla_{x_t}\log p_t(x_t)\right]\mathrm{d}t + G_t\,\mathrm{d}\bar{w}_t \tag{2}$$

where $p_t(x_t)$ is the data probability distribution at time $t$, $\nabla_{x_t}\log p_t(x_t)$ is its gradient, known as the score function, and $\bar{w}_t$ denotes a standard Wiener process in the reverse-time direction. This equation corresponds to the denoising process in which noise is progressively removed until a new datum, distributed according to $p_0$, is generated. The score function is unknown [15], [24] and must be approximated with a score-matching technique:
$$\theta^{*} = \arg\min_{\theta}\;\mathbb{E}\!\left[\left\lVert s_{\theta}(x_t, t) - \nabla_{x_t}\log p_t(x_t)\right\rVert_{\Lambda_t}^{2}\right] \tag{3}$$

where $s_{\theta}(x_t, t)$ is a time-dependent deep neural network called the score network, $\theta$ represents the network parameters, $\mathbb{E}$ is the mathematical expectation, and $\Lambda_t$ is a metric (a symmetric matrix). The mathematical expectation is defined as
$$\mathbb{E}\left[f(x_t)\right] = \int f(x_t)\,p_t(x_t)\,\mathrm{d}x_t \tag{4}$$
while the metric is associated with a quadratic form:
$$\left\lVert v \right\rVert_{\Lambda}^{2} = v^{\top}\Lambda\,v \tag{5}$$
The metric is determined by the data and the nature of the problem and is often approximated with the identity matrix.
Unfortunately, $\nabla_{x_t}\log p_t(x_t)$ does not have a closed-form solution and is intractable. Nonetheless, as demonstrated by [27], [28], the score function may be evaluated by employing the gradient of the noising distribution $p_{t\mid 0}(x_t \mid x_0)$:

$$\mathbb{E}\!\left[\left\lVert s_{\theta}(x_t,t) - \nabla_{x_t}\log p_t(x_t)\right\rVert_{\Lambda_t}^{2}\right] = \mathbb{E}\!\left[\left\lVert s_{\theta}(x_t,t) - \nabla_{x_t}\log p_{t\mid 0}(x_t \mid x_0)\right\rVert_{\Lambda_t}^{2}\right] + \Omega \tag{6}$$

where Ω is a constant, and $p_{t\mid 0}(x_t \mid x_0)$ is the probability, at time t, of having a noisy datum $x_t$ given an uncorrupted datum $x_0$. As opposed to the score function, the gradient of the noising distribution has a fixed closed form, which is often chosen to be Gaussian:
$$p_{t\mid 0}(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\; \mu_t\,x_0,\; \Sigma_t\right) \tag{7}$$

with mean $\mu_t\,x_0$ and covariance matrix $\Sigma_t$. It is convenient to express the covariance matrix in terms of its Cholesky decomposition, $\Sigma_t = L_t L_t^{\top}$, where $L_t$ is the Cholesky matrix [24]. This factorisation is the product of a lower triangular matrix and its transpose. The parameters of the score network may be learned by minimising the denoising score-matching loss [29], [30] with stochastic gradient descent optimisation techniques employing mini-batches, such as the adaptive moment estimation method (Adam) [31]:
$$\mathcal{L}(\theta) = \mathbb{E}_{t\sim\mathcal{U}(0,T)}\,\mathbb{E}_{x_0\sim p_0}\,\mathbb{E}_{x_t\sim p_{t\mid 0}}\!\left[\lambda_t\left\lVert s_{\theta}(x_t,t) - \nabla_{x_t}\log p_{t\mid 0}(x_t \mid x_0)\right\rVert_{\Lambda_t}^{2}\right] \tag{8}$$

where $\mathcal{L}(\theta)$ is the loss function, $\mathcal{U}(0,T)$ is the uniform distribution over the diffusion interval, and $\lambda_t$ is a positive weighting function. It is important to note that the time must be sampled randomly (from a uniform distribution) to avoid bias when training the score network. Therefore, $t$ must be time-encoded for the optimisation algorithm to identify the time correctly [32]. The time-dependent weight $\lambda_t$ [33], [27] determines the importance assigned to each time step and depends on the nature of the problem and the data. It may also be used as a forgetting mechanism.
The score tends to vary rapidly. As pointed out by [30], parametrising the score network in terms of the learned noise $\varepsilon_{\theta}(x_t, t)$, that is,

$$s_{\theta}(x_t, t) = -L_t^{-\top}\,\varepsilon_{\theta}(x_t, t) \tag{9}$$
may significantly improve the accuracy of the learning process. Therefore, when expressed in terms of the learned noise, the training loss function becomes:

$$\mathcal{L}(\theta) = \mathbb{E}_{t\sim\mathcal{U}(0,T)}\,\mathbb{E}_{x_0\sim p_0}\,\mathbb{E}_{\varepsilon\sim\mathcal{N}(0,I)}\!\left[\lambda_t\left\lVert \varepsilon_{\theta}\!\left(\mu_t x_0 + L_t\varepsilon,\, t\right) - \varepsilon\right\rVert_{\Lambda_t}^{2}\right] \tag{10}$$

This equation is employed to determine the parameters of the noise network $\varepsilon_{\theta}(x_t, t)$.
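As a concrete illustration, the noise-prediction objective can be sketched in a few lines of NumPy. This is a didactic sketch, not the paper's implementation (which is in Mathematica): it assumes a linear β schedule, a weight $\lambda_t = 1$, an identity metric, and a placeholder `eps_net` standing in for the U-Net.

```python
import numpy as np

def vpsde_coeffs(t, beta0=1e-4, beta1=2e-2, T=1.0):
    # Mean scaling mu_t and marginal noise scale sigma_t of the VPSDE
    # forward process, for the linear schedule beta_t = beta0 + (beta1 - beta0) t / T.
    ib = beta0 * t + 0.5 * (beta1 - beta0) * t**2 / T   # integral of beta over [0, t]
    return np.exp(-0.5 * ib), np.sqrt(1.0 - np.exp(-ib))

def noise_prediction_loss(eps_net, x0, t, rng):
    # Corrupt x0 to x_t = mu_t * x0 + sigma_t * eps, then regress the noise eps.
    eps = rng.standard_normal(x0.shape)
    mu, sigma = vpsde_coeffs(t)
    x_t = mu * x0 + sigma * eps
    return np.mean((eps_net(x_t, t) - eps) ** 2)
```

A network that always predicts zero incurs a loss close to $\mathbb{E}[\varepsilon^2] = 1$, which is the natural baseline for this objective.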
2.2. Solution of the backward equation with the exponential integrator method
Once the score network has been learned, it may be substituted in the backward diffusion equation in place of the score function:

$$\mathrm{d}x_t = \left[F_t\,x_t - \frac{1+\eta}{2}\,G_t G_t^{\top}\,s_{\theta}(x_t, t)\right]\mathrm{d}t + \sqrt{\eta}\;G_t\,\mathrm{d}\bar{w}_t \tag{11}$$
A parameter η is introduced here to further generalise the backward equation to SDEs and ordinary differential equations (ODEs): it is equal to 1 for SDEs and 0 for ODEs. This equation may be solved with an explicit numerical method, such as the Euler method [34]. Unfortunately, this approach results in low accuracy and is unstable when the step size is insufficiently small [27], [35]. Therefore, [24] proposed solving Eq. (11) with an exponential integrator (EI) to take advantage of the semi-linear structure of the reverse process, a technique known as diffusion exponential integrator sampler (DEIS):
$$x_{t-\Delta t} = \Psi(t-\Delta t, t)\,x_t + \frac{1+\eta}{2}\int_{t}^{t-\Delta t}\Psi(t-\Delta t, \tau)\,G_{\tau}G_{\tau}^{\top}L_{\tau}^{-\top}\,\mathrm{d}\tau\;\hat{\varepsilon}_{\theta}(x_t, t) + \sqrt{\eta}\int_{t}^{t-\Delta t}\Psi(t-\Delta t, \tau)\,G_{\tau}\,\mathrm{d}\bar{w}_{\tau} \tag{12}$$

where Δt is a small time interval, $\hat{\varepsilon}_{\theta}(x_t, t)$ is an estimate of the noise $\varepsilon$, and the transition matrix Ψ is given by

$$\Psi(s, t) = \exp\!\left(\int_{t}^{s} F_{\tau}\,\mathrm{d}\tau\right) \tag{13}$$
This solves the backward equation exactly if the noise $\hat{\varepsilon}_{\theta}$ is constant over the time interval $[t-\Delta t, t]$ [24]. The backward equation does not need to be trained for each value of η. Indeed, a Fokker–Planck–Kolmogorov (FPK) equation may be associated with the backward equation [27]:

$$\frac{\partial p_t(x)}{\partial t} = -\nabla_x\cdot\left(F_t\,x\,p_t(x)\right) + \frac{1}{2}\,\nabla_x\cdot\left(G_t G_t^{\top}\,\nabla_x p_t(x)\right) \tag{14}$$

This equation does not depend on η and, as demonstrated by [27], the score function is the same regardless of its value. As a result, the score network is trained only once, irrespective of η. The parameter η is only employed in the generative process to parametrise the various models. When $\eta = 1$, the model corresponds to the well-known Markovian denoising diffusion probabilistic model (DDPM), and when $\eta = 0$, it corresponds to the non-Markovian denoising diffusion implicit model (DDIM) [36]. These models do not result from an ad hoc procedure but follow naturally from the SDEs and the numerical method (DEIS) employed for solving them. In this work, the variance-preserving SDE (VPSDE) parametrisation is chosen for the drift matrix, the diffusion matrix, and the mean and the covariance of the noising distribution [30], as reported in Table 1.
Table 1.
VPSDE parametrisation.
| $F_t$ | $G_t$ | $\mu_t$ | $\Sigma_t$ |
|---|---|---|---|
| $-\frac{1}{2}\beta_t I$ | $\sqrt{\beta_t}\,I$ | $e^{-\frac{1}{2}\int_0^t\beta_s\,\mathrm{d}s}\,I$ | $\left(1-e^{-\int_0^t\beta_s\,\mathrm{d}s}\right) I$ |
In this work, the noising parameter $\beta_t$ is scheduled (parametrised) according to

$$\beta_t = \beta_0 + \left(\beta_T - \beta_0\right)\frac{t}{T} \tag{15}$$

with $\beta_0 = 10^{-4}$ and $\beta_T = 0.02$ [30]. With this parametrisation, the transition matrix becomes
$$\Psi(t-\Delta t, t) = \exp\!\left(\frac{1}{2}\int_{t-\Delta t}^{t}\beta_{\tau}\,\mathrm{d}\tau\right) I \tag{16}$$
while the solution of the backward equation with DEIS is given by

$$x_{t-\Delta t} = e^{\frac{1}{2}B_t}\,x_t - (1+\eta)\left(e^{\frac{1}{2}B_t} - 1\right)\frac{\hat{\varepsilon}_{\theta}(x_t, t)}{\sigma_t} + \sqrt{\eta\left(e^{B_t} - 1\right)}\;z, \qquad B_t = \int_{t-\Delta t}^{t}\beta_{\tau}\,\mathrm{d}\tau \tag{17}$$

where $\sigma_t = \sqrt{1 - e^{-\int_0^t\beta_s\,\mathrm{d}s}}$ and $z \sim \mathcal{N}(0, I)$. This equation implies that the solution is normally distributed with mean $m_t = e^{\frac{1}{2}B_t}\,x_t - (1+\eta)\left(e^{\frac{1}{2}B_t} - 1\right)\sigma_t^{-1}\,\hat{\varepsilon}_{\theta}(x_t, t)$ and variance $v_t = \eta\left(e^{B_t} - 1\right)$:

$$x_{t-\Delta t} \sim \mathcal{N}\!\left(m_t,\; v_t\,I\right), \qquad x_{t-\Delta t} = m_t + \sqrt{v_t}\,z \tag{18}$$

the latter result being obtained with the reparametrisation trick [37]. The following section extends the previous results to a non-Gaussian distribution, namely the Lévy or stable distribution.
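A single backward step of this kind can be sketched in NumPy. This is one plausible discretisation consistent with the exponential-integrator update described above, under the same linear β schedule; the constants, the name `eps_net`, and the coefficient grouping are illustrative assumptions rather than the paper's exact code.

```python
import numpy as np

BETA0, BETA1, T = 1e-4, 2e-2, 1.0

def beta(t):
    # Linear noising schedule beta_t.
    return BETA0 + (BETA1 - BETA0) * t / T

def backward_step(x, t, dt, eps_net, eta, rng):
    # One exponential-integrator step from t to t - dt.
    # eta = 1 gives a DDPM-like stochastic step, eta = 0 a DDIM/ODE-like one.
    b = beta(t) * dt                                    # approx. integral of beta over [t - dt, t]
    ib = BETA0 * t + 0.5 * (BETA1 - BETA0) * t**2 / T   # integral of beta over [0, t]
    sigma_t = np.sqrt(1.0 - np.exp(-ib))                # marginal noise scale
    psi = np.exp(0.5 * b)                               # scalar transition factor
    mean = psi * x - (1.0 + eta) * (psi - 1.0) * eps_net(x, t) / sigma_t
    return mean + np.sqrt(eta * (np.exp(b) - 1.0)) * rng.standard_normal(x.shape)
```

Setting `eta = 0` removes the stochastic term entirely, so repeated calls with the same inputs are deterministic, as expected of the ODE regime.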
3. Lévy–Itō diffusion models
The Lévy or stable distribution [11] is an extreme distribution [38] with a heavy tail. Unlike the Gaussian distribution, the suppression of large fluctuations is polynomial rather than exponential [11], meaning that very large values are far more likely to occur:

$$p(x) \sim \frac{C_{\alpha}}{|x|^{1+\alpha}} \quad \text{as } |x| \to \infty \tag{19}$$
where $C_{\alpha}$ is a constant and $\alpha \in (0, 2]$ is the stability index. The stable distribution does not have a closed form, but it may be expressed in terms of the Fourier transform of its characteristic function $\varphi(\omega)$:

$$p(x) = \mathcal{F}^{-1}\!\left[\varphi\right](x), \qquad \varphi(\omega) = \exp\!\left(i\mu\omega - \left|\sigma\omega\right|^{\alpha}\left(1 - i\beta\,\operatorname{sgn}(\omega)\tan\frac{\pi\alpha}{2}\right)\right), \quad \alpha \neq 1 \tag{20}$$

where $\beta$ is the skewness parameter, $\mu$ is the location parameter, and $\sigma$ is the scale parameter. Stable distributions with small stability indices have heavier tails and random walks characterised by larger steps. The skewness determines the distribution's degree of asymmetry (only symmetrical distributions are considered in this work, i.e. $\beta = 0$). The location parameter specifies the location of the distribution's maximum, while the scale parameter determines its spread. The following alternative notation is also employed for the distribution:

$$x \sim S_{\alpha}\!\left(\beta, \mu, \sigma\right) \tag{21}$$

with $\alpha = 2$ corresponding to the Gaussian distribution. Therefore, the Lévy distribution is a generalisation of the normal distribution.
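The heavy tails can be checked directly with SciPy's `levy_stable` distribution (the symmetric case β = 0; note that α = 2 recovers a Gaussian with variance 2σ²). The threshold of five scale units below is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.stats import levy_stable

# Symmetric alpha-stable samples: alpha = 2 is Gaussian, while smaller
# alpha produces polynomially suppressed (heavy) tails with rare huge jumps.
gauss = levy_stable.rvs(2.0, 0.0, size=10_000, random_state=1)
levy = levy_stable.rvs(1.7, 0.0, size=10_000, random_state=1)

# Fraction of samples further than 5 scale units from the origin.
tail_gauss = np.mean(np.abs(gauss) > 5.0)
tail_levy = np.mean(np.abs(levy) > 5.0)
```

For α = 1.7 the excursion probability is orders of magnitude larger than in the Gaussian case, which is precisely the exploratory behaviour exploited by the annealing scheme.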
The Lévy distribution is illustrated for various values of the stability index in Fig. 1, and the corresponding heavy tails are reported in Fig. 2. From a random walk perspective, the Lévy distribution allows transitions much larger than its Gaussian counterpart's [38]. These large steps are required to explore the solution space in its entirety, which may not be readily accessible to small Gaussian steps [38]. This phenomenon is illustrated in Fig. 3, which shows random walks of a hundred steps for various values of the stability index, demonstrating that smaller values of the index favour further exploration of the solution space.
Fig. 1.
Lévy distribution for various values of the stability index.
Fig. 2.
Lévy distribution heavy tails corresponding to Fig. 1
Fig. 3.
Lévy random walks corresponding to the distributions shown in Fig. 1
The forward SDE, as described by Eq. (1), may be generalised to include Lévy processes [39], [11]:

$$\mathrm{d}x_t = F_t\,x_t\,\mathrm{d}t + G_t\,\mathrm{d}w_t + \bar{G}_t\,\mathrm{d}L_t^{\alpha} \tag{22}$$

where $L_t^{\alpha}$ is a Lévy process with stability index α, and $\bar{G}_t$ is known as the fractional diffusion matrix. It should be noted that a one-dimensional Lévy distribution is associated with each dimension. Such a model is known as a fractional diffusion model (FDM) or a Lévy–Itō model (LIM).
As demonstrated by [40], the reverse-time fractional diffusion denoising process associated with this equation has a closed-form solution, given by

$$\mathrm{d}x_t = \left[F_t\,x_t - \alpha\,\bar{G}_t^{\alpha}\,\frac{D_x^{\alpha-2}\,\nabla_x p_t(x_t)}{p_t(x_t)}\right]\mathrm{d}t + \bar{G}_t\,\mathrm{d}\bar{L}_t^{\alpha} \tag{23}$$

where $D_x^{\alpha-2}$ is the fractional Riesz derivative of order $\alpha - 2$, and $\bar{L}_t^{\alpha}$ is a Lévy process in the reverse-time direction. The fractional exponentiation of a matrix H is evaluated using its spectral decomposition [41]:

$$H^{\gamma} = U\,\operatorname{diag}\!\left(\lambda_1^{\gamma}, \ldots, \lambda_n^{\gamma}\right) U^{-1} \tag{24}$$
where U is the eigenvector matrix associated with H and $\lambda_1, \ldots, \lambda_n$ are the corresponding eigenvalues. The fractional Riesz derivative, which is a generalisation of the standard derivative, may be evaluated based on the properties of the Fourier transform [42], [43]. Indeed, it is well known from Fourier analysis [44] that the fractional derivative may be obtained by raising the Fourier frequencies to the fractional exponent and then taking the inverse Fourier transform:

$$D_x^{\gamma} f(x) = \mathcal{F}^{-1}\!\left[\,\left|\omega\right|^{\gamma}\,\mathcal{F}[f](\omega)\right](x) \tag{25}$$

Here, $\mathcal{F}$ is the Fourier transform and $\mathcal{F}^{-1}$ is the corresponding inverse transform:

$$\mathcal{F}[f](\omega) = \int_{-\infty}^{\infty} f(x)\,e^{-i\omega x}\,\mathrm{d}x, \qquad \mathcal{F}^{-1}[g](x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} g(\omega)\,e^{i\omega x}\,\mathrm{d}\omega \tag{26}$$
where $i$ is the imaginary unit. Unlike the Wiener case, the backward equation involves not only the score function but also the data distribution and the fractional Riesz derivative of its gradient. In this work, only Lévy-driven SDEs are considered:

$$\mathrm{d}x_t = F_t\,x_t\,\mathrm{d}t + \bar{G}_t\,\mathrm{d}L_t^{\alpha} \tag{27}$$
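The Fourier recipe for the fractional derivative can be verified numerically on a periodic grid. Sign conventions for the Riesz derivative vary in the literature; the sketch below adopts the operator $-(-\Delta)^{\gamma/2}$ so that γ = 2 recovers the ordinary second derivative, which gives an easy sanity check.

```python
import numpy as np

def riesz_derivative(f, dx, gamma):
    # Fractional Riesz-type derivative on a uniform periodic grid:
    # scale the Fourier frequencies by |omega|**gamma, invert the transform,
    # and flip the sign so that gamma = 2 yields the ordinary second derivative.
    omega = 2.0 * np.pi * np.fft.fftfreq(f.size, d=dx)
    return -np.real(np.fft.ifft(np.abs(omega) ** gamma * np.fft.fft(f)))
```

Applied with γ = 2 to sin(x), the routine returns −sin(x), i.e. the usual second derivative, while non-integer γ interpolates between differentiation orders.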
As for the Wiener process, a fractional Fokker–Planck equation (FFPE) [40], [38] may be associated with the forward (noising) process:

$$\frac{\partial p_t(x)}{\partial t} = -\nabla_x\cdot\left(F_t\,x\,p_t(x)\right) + \bar{G}_t^{\alpha}\,\Delta_x^{\alpha/2}\,p_t(x) \tag{28}$$

where $\Delta_x^{\alpha/2} = -\left(-\Delta_x\right)^{\alpha/2}$ is the fractional Laplacian.
4. Approximation of the fractional Riesz derivative and annealing of the stability index
The backward fractional equation has a high computational complexity, which may impede the training of the score network. More importantly, unlike its gradient, the noising distribution cannot replace the data distribution in the score-matching technique. For this reason, and because the solution of the backward equation is an incremental process, the fractional Riesz derivative is expressed as a truncated series expansion. As proposed by [42], the fractional Riesz derivative may be expanded as

$$D_x^{\gamma} f(x) = -\lim_{h\to 0}\frac{1}{h^{\gamma}}\sum_{k=-\infty}^{\infty}\frac{(-1)^{k}\,\Gamma(\gamma+1)}{\Gamma\!\left(\frac{\gamma}{2}-k+1\right)\Gamma\!\left(\frac{\gamma}{2}+k+1\right)}\,f(x-kh) \tag{29}$$

where $h$ is a parameter. If the expansion is truncated to first order (the $k = 0$ term), one obtains

$$D_x^{\gamma} f(x) \approx -\frac{\Gamma(\gamma+1)}{h^{\gamma}\,\Gamma\!\left(\frac{\gamma}{2}+1\right)^{2}}\,f(x) \tag{30}$$
where $\Gamma$ is the gamma function. From this approximation of the fractional derivative, by employing the VPSDE parametrisation with $\bar{G}_t = \sqrt{\beta_t}\,I$ and solving the backward equation with the DEIS method, one obtains

$$x_{t-\Delta t} = e^{\frac{1}{2}B_t}\,x_t - (1+\eta)\left(e^{\frac{1}{2}B_t} - 1\right)C_{\alpha_t}\,\frac{\hat{\varepsilon}_{\theta}(x_t, t)}{\sigma_t} + \left[\eta\left(e^{B_t} - 1\right)\right]^{1/\alpha_t} l_t, \qquad l_t \sim S_{\alpha_t}\!\left(0, 0, 1\right) \tag{31}$$

where $B_t = \int_{t-\Delta t}^{t}\beta_{\tau}\,\mathrm{d}\tau$ and

$$C_{\alpha_t} = \frac{\Gamma(\alpha_t - 1)}{\Gamma\!\left(\frac{\alpha_t}{2}\right)^{2}} \tag{32}$$
This equation is isomorphic to its non-fractional counterpart, with the Wiener process replaced by the Lévy process, and with the introduction of a fractional exponent $1/\alpha_t$ for the noising parameter. It should be noted that the stability index is now a function of time. As for Eq. (11), the parameter η may be introduced, thus encompassing both Markovian and non-Markovian models. The loss function is similar to its non-fractional (Wiener-process) counterpart, with the important distinction that the noise is distributed according to a stable distribution instead of a Gaussian one:
$$\mathcal{L}(\theta) = \mathbb{E}_{t\sim\mathcal{U}(0,T)}\,\mathbb{E}_{x_0\sim p_0}\,\mathbb{E}_{l\sim S_{\alpha_t}(0,0,\sigma)}\!\left[\lambda_t\left\lVert \varepsilon_{\theta}\!\left(\mu_t x_0 + l,\, t\right) - l\right\rVert^{2}\right] \tag{33}$$
where σ is the scale parameter of the Lévy distribution (not to be confused with the standard deviation of the Gaussian distribution). Unfortunately, it is challenging to denoise the large divergent noise generated by the heavy tail [40]. This is in contrast with the Gaussian noise, which does not diverge. However, these large steps are required to explore the solution space fully, which may not be readily achievable with small Gaussian steps [38]. Therefore, with the aim of maximising benefits and minimising drawbacks, the stability index is annealed from a low value (large fluctuations) to $\alpha = 2$ (small fluctuations, Gaussian distribution):

$$\alpha_t = 2 - \left(2 - \alpha_{\min}\right)\frac{t}{T} \tag{34}$$

with $\alpha_0 = 2$ and $\alpha_T = \alpha_{\min} > 1$. Values less than one are not employed because the mean and the variance become infinite [45]. Invariant representations of proteins are addressed in the next section (also see Fig. 4 for the graphical summary of the proposed approach).
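A linear annealing schedule of this kind can be sketched as follows. The linear interpolation is consistent with the schedules evaluated in the experiments (1.7–2, 1.8–2, 1.9–2), although the paper's exact functional form may differ; the bounds on the stability index are enforced explicitly.

```python
import numpy as np

def alpha_schedule(t, alpha_min=1.7, T=1.0):
    # Linear stability-index annealing: alpha = alpha_min at t = T (heavy
    # tails, exploratory jumps) and alpha = 2 at t = 0 (Gaussian regime).
    # Indices at or below 1 are excluded (ill-defined moments).
    assert 1.0 < alpha_min <= 2.0
    return 2.0 - (2.0 - alpha_min) * t / T
```

During generation, time runs from T down to 0, so sampling starts in the heavy-tailed exploratory regime and finishes in the Gaussian regime.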
Fig. 4.
Graphical summary of the proposed approach.
5. Protein backbone representation
Proteins' tertiary structures, which include their complex three-dimensional arrangements, are influenced by the interactions of their amino acid side chains along with the backbone. For instance, Fig. 5 shows the backbone of entry 10GS of the Protein Data Bank (PDB), a database of three-dimensional structural data for large biological molecules, such as proteins and nucleic acids [46]. The backbone corresponds to the folding of the protein's amino acid sequence [47], [48]. Each amino acid sequence is oriented, begins with the N-terminus, and ends with the C-terminus [47], [48]. The folding results from interactions between amino acids and their interaction with the surrounding environment, which is essentially aqueous [47]. The resulting structure is called the native state [47], [48]. Proteins' backbones may have any arbitrary orientation. The neural network must learn proteins' backbones irrespective of orientation to generate new proteins. There are essentially two ways to achieve this objective: by employing equivariant neural networks (ENNs) [49] or a rotation- and translation-invariant representation [50], [51]. ENNs have an architecture that implements an equivariant map in which the same symmetry group acts on the domain and codomain. For the noise network $\varepsilon_{\theta}$, this means that if the coordinates are rotated and translated, the generated noise follows the same transformation:

$$\varepsilon_{\theta}\!\left(R\,x + t,\, \tau\right) = R\,\varepsilon_{\theta}\!\left(x,\, \tau\right) \tag{35}$$
where R is the rotation matrix and t is the translation. This is in contrast with an invariant map, for which

$$f\!\left(R\,x + t\right) = f\!\left(x\right) \tag{36}$$
This is the approach that has been adopted in this work. The three-dimensional structure of a backbone may be characterised by the position or coordinates of its α-carbons [47], [48], that is, the first carbon atom of each amino acid forming the sequence [47], [48]. The α-carbons may be ordered because, as stated earlier, their corresponding amino acid sequence is oriented. An invariant representation is obtained by evaluating the Euclidean distance between each pair of α-carbons [50] (Fig. 6 shows the distance matrix of protein 10GS). The distance matrix is defined as
$$d_{ij} = \left\lVert x_i - x_j \right\rVert_2 \tag{37}$$

where $x_i, x_j \in \mathbb{R}^3$ are the Cartesian coordinates of the α-carbons. This representation is unique because of the amino acid sequence ordering. The invariance comes from the Euclidean distance, which is invariant to translation and rotation. The coordinates of the α-carbons may be retrieved from the distance matrix with various techniques [51], among which the alternating direction method of multipliers (ADMM) is one of the most accurate [52]. This method is a combination of dual ascent with decomposition and the method of multipliers:
$$\mathcal{G}^{k+1} = \arg\min_{\mathcal{G}}\left(f(\mathcal{G}) + \frac{\rho}{2}\left\lVert \mathcal{G} - \eta^{k} + U^{k} \right\rVert_F^2\right), \qquad \eta^{k+1} = \Pi_{\mathbb{S}_+}\!\left(\mathcal{G}^{k+1} + U^{k}\right), \qquad U^{k+1} = U^{k} + \mathcal{G}^{k+1} - \eta^{k+1} \tag{38}$$

where $\mathcal{G}$ is the Gram matrix (containing the inner product of each pair of α-carbon coordinates), η is a slack matrix, ρ is an augmented Lagrangian penalty, $\Pi_{\mathbb{S}_+}$ is a projection in the symmetric positive semi-definite matrix space, U is an auxiliary matrix, and $f$ measures the misfit between the Gram matrix and the observed distances. After convergence, the coordinates of the α-carbons may be obtained with the singular value decomposition factorisation technique [52].
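The invariant representation and its inversion can be sketched as follows. Classical multidimensional scaling (MDS) is used here as a simple, non-iterative stand-in for the ADMM scheme: under the assumption of a noise-free Euclidean distance matrix, it recovers the coordinates exactly up to a rigid motion.

```python
import numpy as np

def distance_matrix(coords):
    # Pairwise Euclidean distances between alpha-carbon coordinates (Eq. 37);
    # invariant to global rotations and translations of the backbone.
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def coords_from_distances(D, dim=3):
    # Classical MDS: double-centre the squared distances to obtain the
    # Gram matrix, then factor it with its top eigenpairs.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    G = -0.5 * J @ (D ** 2) @ J              # Gram matrix
    w, V = np.linalg.eigh(G)
    idx = np.argsort(w)[::-1][:dim]          # top 'dim' eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
```

Round-tripping a coordinate set through `distance_matrix` and `coords_from_distances` reproduces the original distances, which is all that matters for an orientation-invariant representation.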
Fig. 5.

Backbone of protein 10GS from the PDB
Fig. 6.

Distance matrix of protein 10GS from the PDB (first 120 amino acids).
6. Implementation and methodology
The noise network $\varepsilon_{\theta}$ is implemented with a standard U-Net [53] in which, to improve performance, the original ReLU activation function is replaced with a Swish function with hyperparameter Λ [54]:

$$\operatorname{swish}(x) = \frac{x}{1 + e^{-\Lambda x}} \tag{39}$$

The network comprises a contracting and an expansive path [53]. The contracting path involves multiple convolution kernels, Swish functions, and a maximum pooling for downsampling. At each downsampling step, the number of feature channels is doubled. The expansive path consists of an upsampling, multiple convolutions that halve the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and Swish functions. The network also implements position embedding [32] for time encoding, as described by Eq. (8). The neural network architecture is outlined in Fig. 7, with the full details shown in Fig. 8.
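The two small ingredients of the network, the Swish activation and the sinusoidal time embedding, can be sketched in NumPy. The transformer-style embedding below is a common choice; its use here, and the embedding dimension of 16 (matching the encoded time step in Table 2), are assumptions on our part.

```python
import numpy as np

def swish(x, lam=1.0):
    # Swish activation: x * sigmoid(lam * x); lam = 1 gives SiLU.
    return x / (1.0 + np.exp(-lam * x))

def time_embedding(t, dim=16):
    # Sinusoidal embedding of the diffusion time step so the network
    # can condition on t (cf. transformer positional encodings).
    half = dim // 2
    freqs = np.exp(-np.log(10_000.0) * np.arange(half) / half)
    return np.concatenate([np.sin(t * freqs), np.cos(t * freqs)])
```

Swish behaves like the identity for large positive inputs and vanishes at zero, giving a smooth, non-monotone alternative to ReLU.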
Fig. 7.
Outline of the architecture of the U-Net.
Fig. 8.
Detailed architecture of the U-Net.
The loss function of the noise network, which corresponds to Eq. (33), is optimised with the Adam stochastic optimisation method [31]. This algorithm evaluates and accumulates, for each iteration, the expectations of both the gradient and its second moment. The accumulated expectations are rescaled as time passes. Finally, a correction is applied to the learnable parameters. The stability index of the multidimensional Lévy distribution is annealed according to Eq. (34). The U-Net neural network and the Lévy–Itō diffusion model were implemented in the Mathematica language, version 13.3. The calculations were performed on a computing node with two 16-core Intel Xeon Gold 6130 CPUs clocked at 2.1 GHz, 192 GB of RAM, and two NVidia V100 GPUs with 32 GB of RAM each. Both GPUs were used concurrently for the calculations. The hyperparameters of the network were determined by grid search and are reported in Table 2.
Table 2.
Hyperparameters of the U-Net.
| Hyperparameter | Value |
|---|---|
| Number of steps used to corrupt the distance matrices | 200 |
| Base channel size of the U-Net | 32 |
| Encoded time step for the corruption of the distance matrices | 16 |
| Depth of the U-Net | 3 |
| Mini-batch size for training | 1024 |
| Initial noising parameter $\beta_0$ | 0.0001 |
| Final noising parameter $\beta_T$ | 0.02 |
The results were evaluated with three metrics: the Fréchet distance, the density, and the coverage. The Fréchet distance [13] measures the similarity between the real and generated data distributions:

$$d_F^2 = \left\lVert \mu_r - \mu_g \right\rVert_2^2 + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r\Sigma_g\right)^{1/2}\right) \tag{40}$$

where $\mu_r$ and $\Sigma_r$ are the mean and covariance of the real data, while $\mu_g$ and $\Sigma_g$ are the mean and covariance of the generated data. The square root is evaluated according to Eq. (24) [41]. The density metric measures the fidelity, the degree to which the generated samples resemble the real ones [14]. It is defined as
$$\text{density} = \frac{1}{kM}\sum_{j=1}^{M}\sum_{i=1}^{N}\mathbb{1}\!\left[\,y_j \in B\!\left(x_i,\, \mathrm{NND}_k(x_i)\right)\right] \tag{41}$$

where $\mathbb{1}$ is a binary (0 or 1) indicator, $\mathrm{NND}_k(x_i)$ is the Euclidean distance between $x_i$ and its k-th nearest neighbour, $B(x_i, r)$ is the hypersphere of radius r centred on $x_i$, $\{x_i\}_{i=1}^{N}$ are the real data, and $\{y_j\}_{j=1}^{M}$ are the generated data. The density is not bounded: the larger the value, the better the fidelity. The coverage metric, on the other hand, measures the diversity, the degree to which the generated samples cover the full variability of the real samples [14]. It is defined as
$$\text{coverage} = \frac{1}{N}\sum_{i=1}^{N}\mathbb{1}\!\left[\,\exists\, j : y_j \in B\!\left(x_i,\, \mathrm{NND}_k(x_i)\right)\right] \tag{42}$$
Unlike the density, the coverage is bounded and normalised. The density and coverage have been proposed as more robust alternatives to the precision and recall as defined in [55], which tend to overestimate the true manifolds around outliers [14].
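The three metrics can be sketched with NumPy and SciPy. The nearest-neighbour count k = 5 is an illustrative choice (the paper does not state its value), and the brute-force pairwise distances are fine for small sample sizes.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(real, gen):
    # Frechet distance between Gaussian fits of the two point sets.
    mu_r, mu_g = real.mean(axis=0), gen.mean(axis=0)
    cov_r = np.cov(real, rowvar=False)
    cov_g = np.cov(gen, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g).real          # matrix square root
    return float(((mu_r - mu_g) ** 2).sum()
                 + np.trace(cov_r + cov_g - 2.0 * covmean))

def density_coverage(real, gen, k=5):
    # Density and coverage: count generated samples falling inside
    # hyperspheres of k-NN radius around each real sample.
    d_rr = np.sqrt(((real[:, None] - real[None, :]) ** 2).sum(-1))
    radii = np.sort(d_rr, axis=1)[:, k]          # column 0 is the self-distance
    d_rg = np.sqrt(((real[:, None] - gen[None, :]) ** 2).sum(-1))
    inside = d_rg < radii[:, None]
    density = inside.sum() / (k * gen.shape[0])
    coverage = inside.any(axis=1).mean()
    return density, coverage
```

Comparing a sample with itself gives a Fréchet distance of zero and full coverage, a useful sanity check before applying the metrics to generated data.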
7. Experimental results and discussion
The dataset was generated automatically by retrieving Homo sapiens proteins from the PDB. For each protein downloaded, the coordinates of the α-carbons were extracted, and the corresponding distance matrices were evaluated for the first 32 amino acids with the help of Eq. (37). During the training phase, the dataset was divided into a training (90%) and a validation set (10%). A total of twenty Lévy–Itō models were trained. The models were parametrised by their stability index annealing schedule , defined by Eq. (34), and by their model hyperparameter η (Markovian versus non-Markovian). For each model, the quality of the generated distance matrices (one thousand generated distance matrices for each model) was assessed with three metrics, namely the Fréchet distance, the density, and the coverage, which are defined by Eq. (40), (41) and (42), respectively. The results are reported in Table 3.
Table 3.
Evaluation of the Lévy–Itō models for various stability index annealing schedules and model hyperparameters.
| α schedule | η | Fréchet distance | Density | Coverage |
|---|---|---|---|---|
| 1.7–2 | 0 (DDIM) | 3356.07 | 151.116 | 0.480562 |
| 1.8–2 | 0 (DDIM) | 1461.18 | 722.888 | 0.731619 |
| 1.9–2 | 0 (DDIM) | 132.6 | 2220.11 | 0.87636 |
| 1.7–2 | 0.2 | 3002.92 | 377.008 | 0.641818 |
| 1.8–2 | 0.2 | 1303.88 | 1384.61 | 0.852051 |
| 1.9–2 | 0.2 | 115.51 | 2964.98 | 0.929732 |
| 1.7–2 | 0.5 | 4300.96 | 3167.42 | 0.96647 |
| 1.8–2 | 0.5 | 741.59 | 4134.47 | 0.969488 |
| 1.9–2 | 0.5 | 62.74 | 4528.85 | 0.980538 |
| 1.7–2 | 0.8 | 586.77 | 5139.79 | 0.978993 |
| 1.8–2 | 0.8 | 276.59 | 4614.55 | 0.962739 |
| 1.9–2 | 0.8 | **29.48** | 5393.87 | **0.982296** |
| 1.7–2 | 1 (DDPM) | 227.61 | **5471.74** | 0.974455 |
| 1.8–2 | 1 (DDPM) | 46.41 | 5129.79 | 0.951333 |
| 1.9–2 | 1 (DDPM) | 31.74 | 5145.97 | 0.923697 |
| 2 (Gaussian) | 0 (DDIM) | 38.82 | 407.418 | 0.633382 |
| 2 (Gaussian) | 0.2 | 29.86 | 2133.49 | 0.886127 |
| 2 (Gaussian) | 0.5 | *39.53* | *4029.47* | *0.878998* |
| 2 (Gaussian) | 0.8 | 39.9 | 4541.96 | 0.767193 |
| 2 (Gaussian) | 1 (DDPM) | 40.46 | 4716.76 | 0.663181 |
To facilitate graphical comparison, 3-D surface plots for the Fréchet distance, the density and the coverage are provided in Fig. 9, Fig. 10, Fig. 11, respectively.
Fig. 9.

A 3-D surface plot of the dependence of the Fréchet distance on the stability index and the parameter η. The surface has been interpolated to order three.
Fig. 10.

A 3-D surface plot of the dependence of the density on the stability index and the parameter η. The surface has been interpolated to order three.
Fig. 11.

A 3-D surface plot of the dependence of the coverage on the stability index and the parameter η. The surface has been interpolated to order three.
The best results appear in bold. The best Fréchet distance and coverage were obtained with a stability index annealing schedule of 1.9–2 and a model parameter of $\eta = 0.8$. The best density was obtained with a stability index annealing schedule of 1.7–2 and a model parameter of $\eta = 1$. As shown in the table, Markovian models tend to outperform non-Markovian models when Lévy distributions are employed. Pure non-Markovian (DDIM, $\eta = 0$) models have poor performance irrespective of the stability index annealing schedule. The best performances were obtained for Markovian (DDPM, $\eta = 1$) and quasi-Markovian ($\eta = 0.8$) models. The annealed Lévy distribution improves all metrics, particularly the coverage, which attained values as high as 0.98. Compared with the best Gaussian model, which appears in italics, the best Lévy–Itō model improved the Fréchet distance by 25.4%, the density by 35.8%, and the coverage by 11.8%. Therefore, the Lévy–Itō model outperforms the Gaussian model in both the Markovian and non-Markovian regimes.
8. Conclusions and future work
A new diffusion model for data generation has been proposed based on fractional SDEs, the exponential integrator method, and the annealed stable distribution. The model outperforms Markovian and non-Markovian Gaussian diffusion models when evaluated according to the Fréchet distance, density, and coverage when applied to distance matrices. This improved performance originates from the Lévy distribution. As shown in Fig. 3, which represents Lévy random walks for various values of the stability index, Lévy distributions tend to explore the solution space further than their Gaussian counterparts. This may be explained by their heavy tails, which generate larger steps than normal distributions. These large steps make it possible to reach areas that would otherwise remain inaccessible with a Gaussian random walk [38]. Initially, the stability index is low to promote solution space exploration. Then, it is progressively increased (annealed) to converge to a particular solution. This approach is reminiscent of simulated annealing [56], an optimisation method in which the temperature is initially high to allow the algorithm to escape local minima and is then progressively annealed to lower values to converge to the optimal solution. In future work, the proposed method will be applied to more sophisticated diffusion models, such as the generalised denoising diffusion implicit models [24]. Asymmetrical Lévy distributions and smaller stability indices (heavier tails) could also be employed. Currently, the multidimensional Lévy distribution is diagonal, which means that it is implicitly assumed that there is little or no covariation [57] between the dimensions, covariation being a generalisation of the notion of covariance for stable distributions. Therefore, it is proposed to employ non-diagonal stable distributions [45], [58], which are defined as
$$\mathbb{E}\!\left[\exp\!\left(i\left\langle \boldsymbol{\theta},\mathbf{x}\right\rangle\right)\right]=\exp\!\left(-\int_{\mathbb{S}^{d-1}}\psi_{\alpha}\!\left(\left\langle \boldsymbol{\theta},\mathbf{s}\right\rangle\right)\Gamma(d\mathbf{s})+i\left\langle \boldsymbol{\theta},\boldsymbol{\mu}\right\rangle\right) \tag{43}$$
where $\mathbb{E}[\,\cdot\,]$ is the mathematical expectation, $\langle\cdot,\cdot\rangle$ is the inner product, and
$$\psi_{\alpha}(u)=\left|u\right|^{\alpha}\left(1-i\operatorname{sign}(u)\tan\frac{\pi\alpha}{2}\right),\qquad\alpha\neq 1. \tag{44}$$
Unlike the univariate case, only two parameters are required: a stability index $\alpha$ and a localisation vector $\boldsymbol{\mu}$. The information about the scale and the asymmetry is encapsulated in $\Gamma$, which is a measure, or partition, defined on the unit hypersphere $\mathbb{S}^{d-1}$. The application of the multidimensional Lévy distribution for protein generation shall, therefore, be the foundation of our future work.
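In one dimension, the symmetric member of this family can be sampled directly with the classical Chambers–Mantegna–Stuck transform, which makes it easy to illustrate the annealing schedule discussed above. The following sketch is illustrative only; the function names and the linear schedule are assumptions, not the paper's implementation.

```python
import numpy as np

def sample_symmetric_stable(alpha, rng, size=1):
    """Chambers-Mantegna-Stuck sampler for standard symmetric alpha-stable
    variates. For alpha = 2 the transform reduces to a Gaussian draw
    (with scale sqrt(2))."""
    u = rng.uniform(-np.pi / 2.0, np.pi / 2.0, size)  # uniform angle
    w = rng.exponential(1.0, size)                    # unit exponential
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def annealed_levy_walk(alpha_start=1.7, alpha_end=2.0, steps=1_000, seed=0):
    """Random walk whose stability index is annealed linearly from a
    heavy-tailed, exploratory regime towards the Gaussian regime."""
    rng = np.random.default_rng(seed)
    alphas = np.linspace(alpha_start, alpha_end, steps)
    increments = np.concatenate([sample_symmetric_stable(a, rng) for a in alphas])
    return np.cumsum(increments)
```

Because the stable law is heavy-tailed for α < 2, the early increments occasionally take very large steps (exploration), while the late, near-Gaussian increments refine the trajectory locally, mirroring the simulated-annealing analogy drawn in the conclusions.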
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
The authors would like to thank the Artificial Intelligence for Design Challenge program from the National Research Council Canada for funding this project.
Contributor Information
Eric Paquet, Email: Eric.Paquet@nrc-cnrc.gc.ca.
Farzan Soleymani, Email: fsoleyma@uottawa.ca.
Herna Lydia Viktor, Email: hviktor@uottawa.ca.
Wojtek Michalowski, Email: wojtek@telfer.uottawa.ca.
Appendix A.
Table A.4.
Definitions of the mathematical symbols used in this work.
References
- 1. Whitford D. Proteins: structure and function. John Wiley & Sons; 2013.
- 2. Dill K.A., MacCallum J.L. The protein-folding problem, 50 years on. Science. 2012;338(6110):1042–1046. doi: 10.1126/science.1219021.
- 3. Luo Y. Sensing the shape of functional proteins with topology. Nat Comput Sci. 2023;3(2):124–125. doi: 10.1038/s43588-023-00404-7.
- 4. Wang L., Wang N., Zhang W., Cheng X., Yan Z., Shao G., et al. Therapeutic peptides: current applications and future directions. Signal Transduct Targeted Ther. 2022;7(1):48. doi: 10.1038/s41392-022-00904-4.
- 5. Valastyan J.S., Lindquist S. Mechanisms of protein-folding diseases at a glance. Dis Models Mech. 2014;7(1):9–14. doi: 10.1242/dmm.013474.
- 6. Dopp J.L., Rothstein S.M., Mansell T.J., Reuel N.F. Rapid prototyping of proteins: mail order gene fragments to assayable proteins within 24 hours. Biotechnol Bioeng. 2019;116(3):667–676. doi: 10.1002/bit.26912.
- 7. Van Landuyt L., Lonigro C., Meuris L., Callewaert N. Customized protein glycosylation to improve biopharmaceutical function and targeting. Curr Opin Biotechnol. 2019;60:17–28. doi: 10.1016/j.copbio.2018.11.017.
- 8. Gagner J.E., Kim W., Chaikof E.L. Designing protein-based biomaterials for medical applications. Acta Biomater. 2014;10(4):1542–1557. doi: 10.1016/j.actbio.2013.10.001.
- 9. Banavar J.R., Giacometti A., Hoang T.X., Maritan A., Škrbić T. A geometrical framework for thinking about proteins. Proteins: Struct Funct Bioinform.
- 10. Hatfield M., Lovas S. Conformational sampling techniques. Curr Pharm Des. 20. doi: 10.2174/13816128113199990603.
- 11. Lévy P. Sur les intégrales dont les éléments sont des variables aléatoires indépendantes. Ann Sc Norm Super Pisa, Cl Sci. 1934;3(3–4):337–366.
- 12. West B.J., Bologna M., Grigolini P. Failure of traditional models. Phys Fractal Oper. 2003:37–75.
- 13. Dowson D., Landau B. The Fréchet distance between multivariate normal distributions. J Multivar Anal. 1982;12(3):450–455.
- 14. Naeem M.F., Oh S.J., Uh Y., Choi Y., Yoo J. Reliable fidelity and diversity metrics for generative models. In: International conference on machine learning, PMLR; 2020. pp. 7176–7185.
- 15. Song Y., Sohl-Dickstein J., Kingma D.P., Kumar A., Ermon S., Poole B. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456.
- 16. Song Y., Durkan C., Murray I., Ermon S. Maximum likelihood training of score-based diffusion models. Adv Neural Inf Process Syst. 2021;34:1415–1428.
- 17. Lee J.S., Kim J., Kim P.M. Proteinsgm: score-based generative modeling for de novo protein design. BioRxiv. 2022.
- 18. Wu K.E., Yang K.K., Berg R.v.d., Zou J.Y., Lu A.X., Amini A.P. Protein structure generation via folding diffusion. arXiv preprint arXiv:2209.15611.
- 19. Watson J.L., Juergens D., Bennett N.R., Trippe B.L., Yim J., Eisenach H.E., et al. Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models. BioRxiv. 2022.
- 20. Tang X., Dai H., Knight E., Wu F., Li Y., Li T., et al. A survey of generative ai for de novo drug design: new frontiers in molecule and protein generation. arXiv preprint arXiv:2402.08703.
- 21. Guo Z., Liu J., Wang Y., Chen M., Wang D., Xu D., et al. Diffusion models in bioinformatics: a new wave of deep learning revolution in action. arXiv preprint arXiv:2302.10907.
- 22. Ilan Y. Making use of noise in biological systems. Prog Biophys Mol Biol. 2023;178:83–90. doi: 10.1016/j.pbiomolbio.2023.01.001.
- 23. Sagarin R.D., Taylor T. Natural security: how biological systems use information to adapt in an unpredictable world. Secur Inform. 2012;1:1–9.
- 24. Zhang Q., Chen Y. Fast sampling of diffusion models with exponential integrator. arXiv preprint arXiv:2204.13902.
- 25. Särkkä S., Solin A. Applied stochastic differential equations, vol. 10. Cambridge University Press; 2019.
- 26. Anderson B.D. Reverse-time diffusion equation models. Stoch Process Appl. 1982;12(3):313–326.
- 27. Vincent P. A connection between score matching and denoising autoencoders. Neural Comput. 2011;23(7):1661–1674. doi: 10.1162/NECO_a_00142.
- 28. Jo J., Lee S., Hwang S.J. Score-based generative modeling of graphs via the system of stochastic differential equations. In: International conference on machine learning, PMLR; 2022. pp. 10362–10383.
- 29. Gehring J., Auli M., Grangier D., Yarats D., Dauphin Y.N. Convolutional sequence to sequence learning. In: International conference on machine learning, PMLR; 2017. pp. 1243–1252.
- 30. Ho J., Jain A., Abbeel P. Denoising diffusion probabilistic models. Adv Neural Inf Process Syst. 2020;33:6840–6851.
- 31. Kingma D.P., Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- 32. Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., et al. Attention is all you need. Adv Neural Inf Process Syst. 30.
- 33. Hyvärinen A., Dayan P. Estimation of non-normalized statistical models by score matching. J Mach Learn Res. 6(4).
- 34. Mannella R. Numerical integration of stochastic differential equations. arXiv preprint arXiv:cond-mat/9709326.
- 35. Dockhorn T., Vahdat A., Kreis K. Score-based generative modeling with critically-damped Langevin diffusion. arXiv preprint arXiv:2112.07068.
- 36. Song J., Meng C., Ermon S. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502.
- 37. Maddison C.J., Mnih A., Teh Y.W. The concrete distribution: a continuous relaxation of discrete random variables. arXiv preprint arXiv:1611.00712.
- 38. Paquet E., Viktor H.L., Madi K., Wu J. Deformable protein shape classification based on deep learning, and the fractional Fokker–Planck and Kähler–Dirac equations. IEEE Trans Pattern Anal Mach Intell. 2022;45(1):391–407. doi: 10.1109/TPAMI.2022.3146796.
- 39. Applebaum D. Lévy processes and stochastic calculus. Cambridge University Press; 2009.
- 40. Yoon E., Park K., Kim J., Lim S. Score-based generative models with Lévy processes. In: NeurIPS 2022 workshop on score-based methods; 2022.
- 41. Strang G. Introduction to linear algebra. SIAM; 2022.
- 42. Ortigueira M.D., et al. Riesz potential operators and inverses via fractional centred derivatives. Int J Math Math Sci. 2006.
- 43. Şimşekli U. Fractional Langevin Monte Carlo: exploring Lévy driven stochastic differential equations for Markov chain Monte Carlo. In: International conference on machine learning, PMLR; 2017. pp. 3200–3209.
- 44. Bronstein M.M., Bruna J., LeCun Y., Szlam A., Vandergheynst P. Geometric deep learning: going beyond Euclidean data. IEEE Signal Process Mag. 2017;34(4):18–42.
- 45. Samorodnitsky G., Taqqu M.S., Linde R. Stable non-Gaussian random processes: stochastic models with infinite variance. Bull Lond Math Soc. 1996;28(134):554–555.
- 46. Burley S.K., Berman H.M., Bhikadiya C., Bi C., Chen L., Di Costanzo L., et al. Rcsb protein data bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy. Nucleic Acids Res. 2019;47(D1):D464–D474. doi: 10.1093/nar/gky1004.
- 47. Dill K.A., Ozkan S.B., Shell M.S., Weikl T.R. The protein folding problem. Annu Rev Biophys. 2008;37:289–316. doi: 10.1146/annurev.biophys.37.092707.153558.
- 48. Frauenfelder H. The physics of proteins: an introduction to biological physics and molecular biophysics. Springer Science & Business Media; 2010.
- 49. Bao F., Zhao M., Hao Z., Li P., Li C., Zhu J. Equivariant energy-guided sde for inverse molecular design. arXiv preprint arXiv:2209.15408.
- 50. Hoffmann M., Noé F. Generating valid Euclidean distance matrices. arXiv preprint arXiv:1910.03131.
- 51. Kloczkowski A., Jernigan R.L., Wu Z., Song G., Yang L., Kolinski A., et al. Distance matrix-based approach to protein structure prediction. J Struct Funct Genomics. 2009;10(1):67–81. doi: 10.1007/s10969-009-9062-2.
- 52. Anand N., Huang P. Generative modeling for protein structures. Adv Neural Inf Process Syst. 31.
- 53. Ronneberger O., Fischer P., Brox T. U-net: convolutional networks for biomedical image segmentation. In: Medical image computing and computer-assisted intervention – MICCAI 2015: 18th international conference, Munich, Germany, October 5–9, 2015. Springer; 2015. pp. 234–241.
- 54. Ramachandran P., Zoph B., Le Q.V. Searching for activation functions. arXiv preprint arXiv:1710.05941.
- 55. Kynkäänniemi T., Karras T., Laine S., Lehtinen J., Aila T. Improved precision and recall metric for assessing generative models. Adv Neural Inf Process Syst. 32.
- 56. Guilmeau T., Chouzenoux E., Elvira V. Simulated annealing: a review and a new scheme. In: 2021 IEEE statistical signal processing workshop (SSP). IEEE; 2021. pp. 101–105.
- 57. Cheng B., Rachev S.T. Multivariate stable futures prices. Math Finance. 1995;5(2):133–153.
- 58. Paquet E., Viktor H.L., Guo H. Learning in the presence of large fluctuations: a study of aggregation and correlation. In: New frontiers in mining complex patterns: first international workshop, NFMCP 2012, held in conjunction with ECML/PKDD 2012, Bristol, UK, September 24, 2012. Springer; 2013. pp. 49–63.