Computational and Structural Biotechnology Journal
. 2025 Aug 5;27:3464–3480. doi: 10.1016/j.csbj.2025.07.045

ProT-GFDM: A generative fractional diffusion model for protein generation

Xiao Liang a, Wentao Ma b, Eric Paquet c,d,, Herna Viktor d, Wojtek Michalowski a
PMCID: PMC12345337  PMID: 40808803

Abstract

This work introduces the generative fractional diffusion model for protein generation (ProT-GFDM), a novel generative framework that employs fractional stochastic dynamics for protein backbone structure modeling. This approach builds on the continuous-time score-based generative diffusion modeling paradigm, where data are progressively transformed into noise via a stochastic differential equation and reversed to generate structured samples. Unlike classical methods that rely on standard Brownian motion, ProT-GFDM employs a fractional stochastic process with superdiffusive properties to improve the capture of long-range dependencies in protein structures. By integrating fractional dynamics with computationally efficient sampling, the proposed framework advances generative modeling for structured biological data, with implications for protein design and computational drug discovery.

Keywords: Generative diffusion model, Protein generation, Fractional Brownian motion, Stochastic differential equation, Ordinary differential equation

Graphical abstract


Highlights

  • Artificial intelligence (AI)-driven generative models have the capacity to generate de novo protein sequences and structures.

  • The following paper introduces ProT-GFDM, a novel generative model for protein design based on fractional diffusion.

  • Enhanced sample diversity and fidelity are achieved through fBm-driven stochastic processes.

  • Compatibility with both SDE and ODE solvers is demonstrated.

1. Introduction

Proteins are essential macromolecules whose unique amino acid sequences fold into specific three-dimensional structures that determine their biological function, specificity, and stability within the cell [1], [2]. De novo [3] protein generation aims to produce new protein sequences that fold into stable and functional structures, creating novel biomolecules for applications in medicine, biotechnology, and synthetic biology. Traditionally, protein design follows a two-step process: determining a backbone structure that meets the desired structural and biochemical properties and designing an amino acid sequence to fit that backbone. Often reliant on template-based fragment sampling and expert-defined topologies, this approach has limitations, including that the backbone may not be optimally designable or the sequence may fail to conform to the intended structure. These problems can lead to a relatively low success rate in computational protein design [5]. During the past decades, the exponential growth of protein sequence data has driven the development of myriad computational methods for de novo protein design [4] and protein function prediction [5]. A systematic review and comparison of methods is provided in [6].

Recent progress in artificial intelligence (AI) models has substantially changed the field of protein design. Generative models such as generative adversarial networks (GANs) [7], variational autoencoders (VAEs) [8], and flow-based models [9] have been widely applied to protein generation, enabling the design of novel proteins with desired properties and facilitating the exploration of vast combinatorial spaces. For example, ProteinVAE [11] applies ProtBERT [12] to convert raw protein sequences into latent representations, using an encoder–decoder framework enhanced with positionwise multihead self-attention to capture long-range sequence dependencies. In contrast, ProT-VAE [13] employs a different pretrained language model, ProtT5NV, and incorporates an internal, family-specific encoder–decoder layer to learn parameters tailored to individual protein families. Conversely, ProteinGAN [14] employs a GAN architecture to generate protein sequences, and its capabilities are illustrated through the example of malate dehydrogenase, demonstrating its potential to produce fully functional enzymes.

However, Strokach et al. [15] noted the disadvantages of generative models for protein design. GANs can experience unstable training compared to other model types and often generate samples with low diversity due to mode collapse; they also lack mechanisms for mapping existing data to latent spaces or for calculating log-likelihoods. In contrast, VAEs are typically less effective at modeling data with a fixed dimensionality and often generate lower-resolution samples than GANs. Other deep learning approaches have also played a significant role in advancing protein design (e.g., [16], [17], [18], [19]).

Probabilistic diffusion models have recently demonstrated significant potential in bioinformatics, particularly in such applications as protein generation. These models learn to approximate complex data distributions by sequentially corrupting and denoising data samples, enabling the synthesis of high-quality, realistic output [20]. Denoising diffusion probabilistic models (DDPMs) [21] define a discrete-time Markov chain that progressively adds Gaussian noise to data, transforming data into a nearly isotropic Gaussian distribution. Formally, for each training sample \(x_0 \sim p_{\text{data}}(x)\), the forward process follows a predefined variance schedule \(0 < \beta_1 < \beta_2 < \cdots < \beta_T < 1\) and constructs a sequence \(\{x_0, x_1, \dots, x_T\}\), where the transition at each step is given by the following:

\[ p(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\, \sqrt{1-\beta_t}\, x_{t-1},\, \beta_t I\big). \]

Rewriting this transition in sampling form yields the recurrence from \(x_{t-1}\) to \(x_t\):

\[ x_t = \sqrt{1-\beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \varepsilon_{t-1}, \quad t = 1, \dots, T, \tag{1} \]

where \(\varepsilon_t \sim \mathcal{N}(0, I)\). Applying the Markov property, we can marginalize over all intermediate steps and derive the closed-form distribution of \(x_t\), given the original data sample:

\[ p(x_t \mid x_0) = \mathcal{N}\big(x_t;\, \sqrt{\alpha_t}\, x_0,\, (1-\alpha_t) I\big), \]

where \(\alpha_t = \prod_{s=1}^{t} (1-\beta_s)\). The noise schedule is predefined such that \(\alpha_T \approx 0\), ensuring that \(x_T\) asymptotically approaches a standard Gaussian distribution (i.e., \(p(x_T) \approx \mathcal{N}(0, I)\)). The reverse process, which transforms Gaussian noise back into structured data, is modeled as another Markov chain, parameterized as follows:

\[ p(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\, \mu_\theta(x_t, t),\, \Sigma(t)\big), \]

where the mean function \(\mu_\theta(x_t, t)\) is learned using a neural network \(\mathrm{NN}_\theta(x_t, t)\). Once the model is trained, new samples can be generated by first drawing \(x_T \sim \mathcal{N}(0, I)\) and iteratively updating:

\[ x_{t-1} = \frac{1}{\sqrt{1-\beta_t}} \big( x_t + \beta_t\, \mathrm{NN}_\theta(x_t, t) \big) + \sqrt{\beta_t}\, z_t, \quad t = T, T-1, \dots, 1, \]

where \(\theta\) denotes the optimized parameters of the trained network \(\mathrm{NN}_\theta\). The DDPM effectively recovers structured data from noise via this iterative denoising process. An alternative approach to generative modeling is score-matching Langevin dynamics (SMLD) [22], a method initially developed in physics to describe the stochastic motion of particles under deterministic and random forces. Langevin dynamics facilitates sampling from a target distribution \(p(x)\) using an iterative update rule:

\[ x_t = x_{t-1} + \tau\, \nabla_x \log p(x_{t-1}) + \sqrt{2\tau}\, z, \quad z \sim \mathcal{N}(0, I), \tag{2} \]

where \(\tau\) denotes the step size, and \(x_0\) is initialized from white noise. The term \(\nabla_x \log p(x)\) represents the score function, also known as Stein's score function, which determines the optimal update direction for the sample. The score function \(\nabla_x \log p(x)\) must be learned using neural networks to apply Langevin dynamics for generative modeling. Once trained, samples can be generated by iteratively refining noisy input via Langevin dynamics, guiding them toward the target data distribution. Yang et al. referred to these two model classes, DDPMs and SMLDs, together as score-based generative models. Both DDPMs and SMLDs can be extended to scenarios encompassing an infinite number of time steps or noise levels, where perturbation and denoising procedures are characterized as solutions to stochastic differential equations (SDEs). This extended framework is referred to as a score-based SDE model (ScoreSDE) [20]. By interpreting the forward diffusion processes (1) and (2) as discretizations of an underlying SDE, we express the continuous-time limit as follows:

\[ dx_t = f(x_t, t)\, dt + g(t)\, dw_t, \tag{3} \]

where \(f(x, t)\) represents the drift term, governing the deterministic evolution of the process; \(g(t)\) denotes the diffusion coefficient, often defined in terms of the noise schedule; and \(dw_t\) indicates a Wiener process, modeling stochastic perturbations. This generative framework generalizes previous score-based generative models by incorporating continuous-time SDEs and provides a mathematically flexible method to describe diffusion processes and generative modeling in continuous time.
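As a concrete illustration of the DDPM forward process above, the following NumPy sketch (an illustrative toy, not the paper's implementation; the linear schedule values are assumptions) runs the recurrence (1) on one-dimensional data and checks it against the closed-form marginal \(p(x_T \mid x_0)\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed linear schedule; beta_t plays the role of the transition variances in (1).
N = 1000
beta = np.linspace(1e-4, 0.02, N)
alpha = np.cumprod(1.0 - beta)            # alpha_t = prod_{s<=t} (1 - beta_s)

x0 = rng.normal(2.0, 0.5, size=50_000)    # toy 1-D "data" distribution

# Forward recurrence: x_t = sqrt(1 - beta_t) x_{t-1} + sqrt(beta_t) eps_{t-1}
x = x0.copy()
for t in range(N):
    x = np.sqrt(1.0 - beta[t]) * x + np.sqrt(beta[t]) * rng.normal(size=x.shape)

# Closed-form marginal: p(x_T | x_0) = N(sqrt(alpha_T) x_0, (1 - alpha_T) I)
x_closed = np.sqrt(alpha[-1]) * x0 + np.sqrt(1.0 - alpha[-1]) * rng.normal(size=x0.shape)

print(alpha[-1])                          # ~0, so x_T is nearly standard normal
print(x.mean(), x.std(), x_closed.mean(), x_closed.std())
```

Both routes produce samples that are statistically indistinguishable from a standard Gaussian, as the schedule guarantees \(\alpha_T \approx 0\).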

In [23], [27], the authors explored the limitations and shortcomings of the classical ScoreSDE models equipped with Brownian motion (BM). When training data have unequal representation across modes, traditional diffusion models may fail to generate samples that properly represent all modes, focusing instead on dominant modes. In addition, generated samples might lack variety, producing outputs that are too similar or that fail to explore the data distribution fully. These limitations might arise from the nature of BM, which has a light-tailed distribution and independent increments. Moreover, BM relies on Gaussian noise; hence, it may struggle to model data distributions with heavy tails effectively (e.g., outliers or rare, extreme variations), whereas the independence assumption limits the ability to encode dependencies or correlations across time steps, which may be necessary to model complex, multimodal distributions accurately. To overcome these limitations, researchers have investigated Lévy processes and fractional BM (fBm) as potential alternatives to BM.

Yoon et al. [23] extended classical ScoreSDE models by replacing the underlying BM with a Lévy process, a stochastic process characterized by independent and stationary increments (i.e., the differential \(dw_t\) in Equation (3) is replaced by \(dL_t^\alpha\), where \(L_t^\alpha\) is an \(\alpha\)-stable Lévy process). Stable processes have a stability index \(\alpha\) that lies in the range \((0, 2]\). When \(\alpha = 2\), the process \(L_t^\alpha\) reduces to the classical BM and, therefore, necessarily has continuous paths. These models enable more efficient exploration of the conformational space and are applied to protein generation. Motivated by [23], the authors of [24] presented innovative Lévy–Itō diffusion models that integrate Markovian and non-Markovian dynamics, incorporating fractional SDEs and Lévy distributions, with applications in protein generation.

An alternative for the driving process is fBm, a generalization of the standard BM that introduces long-range dependencies and controlled roughness. The authors of [25] presented the first continuous-time score-based generative model that employs fractional diffusion processes to govern its dynamics (i.e., the differential \(dw_t\) in Eq. (3) is replaced by \(dB_t^H\), where \(B_t^H\) is an fBm, which features correlated increments and is characterized by the Hurst index \(H \in (0, 1)\), with \(H = 1/2\) corresponding to classical BM). Its precise definition is given in Definition 4.1. Moreover, BM and Lévy processes are semimartingales, meaning they can be decomposed into a sum of a local martingale and a finite-variation process, enabling Itō calculus for stochastic integration [26]. A challenge in dealing with fBm is its nonsemimartingale nature, which invalidates the use of classical Itō integrals. To maintain tractable inference and learning, the authors used a recently popularized Markov approximation of fBm (MA-fBm). They derived its reverse-time model, leading to the development of generative fractional diffusion models (GFDMs). In particular, SDEs driven by fBm are well suited for capturing systems with temporal dependencies because they account for memory effects and correlations over time. Additional studies focusing on fBm are found in [26], [27].

This work proposes a novel fractional diffusion-based generative model for protein design, ProT-GFDM, employing the mathematical framework of MA-fBm. Unlike traditional diffusion models that rely on BM, the proposed approach incorporates long-range dependencies and super-diffusive behavior, enabling more expressive and controllable generative dynamics. Compared to prior diffusion models [20], the proposed method achieves superior sample diversity and fidelity. This work also explores the effect of various noise schedules, including an alternative cosine noise schedule [28] that provides a distinct approach to noise control and influences generative behavior. Furthermore, this work systematically evaluates the adaptability of the proposed model to stochastic and deterministic solvers, demonstrating that fractional diffusion models can be effectively integrated with SDE solvers and ordinary differential equation (ODE) solvers, expanding their applicability in generative modeling. These contributions establish the approach as a flexible and robust framework for advancing generative protein design.

The paper is organized as follows. Section 2 discusses the background of ScoreSDE, laying the theoretical foundation and presenting the relevant prior work. Next, Section 3 explores protein-structure image generation, where the protein backbone is represented by an α-carbon distance map. Section 4 focuses on the core contribution of this work and is subdivided into parts that explore the fractional driving noise and its Markov approximation in subsection 4.1, generative fractional diffusion models in 4.2, and augmented score-matching techniques in 4.3. This section also includes a detailed explanation of sampling methods in 4.4, presenting multiple approaches, such as SDE and ODE solvers, for efficient sampling. Following this, Section 5 presents the experimental results, highlighting the performance and validation of the proposed methods. Section 6 presents the conclusions and future directions for research. Finally, Section 7 provides the notational conventions, clarifying the symbols and terminology used throughout the paper.

2. Background of the ScoreSDE framework

To establish the connection between score-based generative models and SDEs, we examine the discrete-time formulation of DDPMs, as given in (1). We define a discrete step size \(\Delta t = \frac{1}{N}\), where \(N\) represents the total number of discrete steps in the diffusion process. To facilitate the transition to a continuous formulation, we introduce an auxiliary noise schedule \(\{\bar\beta_i\}_{i=1}^{N}\), where \(\beta_i = \frac{\bar\beta_i}{N}\). By expressing \(\beta_i\) in terms of a continuous function, we obtain the following:

\[ \beta_i = \frac{\bar\beta_i}{N} = \beta\!\left(\frac{i}{N}\right) \frac{1}{N} = \beta(t + \Delta t)\, \Delta t, \]

where this work assumes that \(\bar\beta_i \to \beta(t)\) as \(N \to \infty\), a continuous-time function for \(0 \le t \le 1\). We let

\[ x_i = x\!\left(\frac{i}{N}\right) = x(t + \Delta t), \qquad \varepsilon_i = \varepsilon\!\left(\frac{i}{N}\right) = \varepsilon(t + \Delta t). \]

Hence, the first-order Taylor expansion of \(\sqrt{1 - x}\) for approximation yields

\[\begin{aligned} x_i &= \sqrt{1-\beta_i}\, x_{i-1} + \sqrt{\beta_i}\, \varepsilon_{i-1} = \sqrt{1 - \tfrac{\bar\beta_i}{N}}\, x_{i-1} + \sqrt{\tfrac{\bar\beta_i}{N}}\, \varepsilon_{i-1}, \\ x(t+\Delta t) &= \sqrt{1 - \beta(t+\Delta t)\Delta t}\, x(t) + \sqrt{\beta(t+\Delta t)\Delta t}\, \varepsilon(t) \\ &\approx \left(1 - \tfrac{1}{2}\beta(t+\Delta t)\Delta t\right) x(t) + \sqrt{\beta(t+\Delta t)\Delta t}\, \varepsilon(t) \\ &\approx x(t) - \tfrac{1}{2}\beta(t)\Delta t\, x(t) + \sqrt{\beta(t)\Delta t}\, \varepsilon(t). \end{aligned}\]

As \(N \to \infty\), the discrete Markov chain in Eq. (1) converges to the following SDE:

\[ dx_t = -\tfrac{1}{2}\beta(t)\, x_t\, dt + \sqrt{\beta(t)}\, dw_t. \tag{4} \]

Therefore, the DDPM forward update iteration can be written equivalently as an SDE. The variance of the process remains bounded as \(t \to \infty\) because \(\beta(t)\) is constrained, ensuring a controlled noise increase that prevents the data from being completely overwhelmed. This bounded variance is why Eq. (4) is referred to as the variance-preserving (VP) SDE.
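The variance-preserving property can be observed directly with a minimal Euler–Maruyama simulation of Eq. (4); the linear \(\beta(t)\) schedule below is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed schedule: beta(t) linear in t on [0, 1].
beta_min, beta_max = 0.1, 20.0
beta = lambda t: beta_min + t * (beta_max - beta_min)

n_steps, n_paths = 1000, 20_000
dt = 1.0 / n_steps
x = rng.normal(3.0, 2.0, size=n_paths)    # data with mean 3 and variance 4

# Euler-Maruyama for dx = -0.5 beta(t) x dt + sqrt(beta(t)) dw
for i in range(n_steps):
    t = i * dt
    x = x - 0.5 * beta(t) * x * dt + np.sqrt(beta(t) * dt) * rng.normal(size=n_paths)

print(x.mean(), x.var())                  # mean shrinks to ~0, variance settles near 1
```

Despite the non-unit initial variance, the terminal variance stays near 1, illustrating why the schedule keeps the noise bounded.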

The SMLD (2) framework has no explicit forward diffusion process as in DDPMs. However, we can approximate it using N discrete noise scales. Under this formulation, the recursive updates naturally follow a Markov chain:

\[ x_i = x_{i-1} + \sqrt{\sigma_i^2 - \sigma_{i-1}^2}\, z_{i-1}, \quad i = 1, \dots, N. \]

Assuming that, in the limit, \(\{\sigma_i\}_{i=1}^{N}\) becomes the continuous-time function \(\sigma(t)\) for \(t \in [0, 1]\), \(z_i\) becomes \(z(t)\), and \(\{x_i\}_{i=1}^{N}\) becomes \(x_t\), where \(x_i = x\!\left(\frac{i}{N}\right)\), we obtain the following:

\[ x(t+\Delta t) = x(t) + \sqrt{\sigma^2(t+\Delta t) - \sigma^2(t)}\, z(t) \approx x(t) + \sqrt{\frac{d\,\sigma^2(t)}{dt}\, \Delta t}\; z(t). \]

In the limit as \(\Delta t \to 0\), the equation converges to

\[ dx = \sqrt{\frac{d\,\sigma^2(t)}{dt}}\, dw. \tag{5} \]

Unlike the VP SDE, where noise remains bounded, the diffusion term in (5) grows exponentially with time. Hence, the variance of the data distribution diverges as \(t \to \infty\), which is why this formulation is called the variance-exploding (VE) SDE.
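The variance-exploding behavior is easy to verify numerically; the geometric noise scales below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed geometric noise scales sigma_1 < ... < sigma_N, as commonly used with SMLD.
N = 500
sigmas = np.logspace(-2, 1, N)            # 0.01 up to 10

x0 = rng.normal(0.0, 0.5, size=100_000)   # Var(x_0) = 0.25
x = x0 + sigmas[0] * rng.normal(size=x0.shape)
for i in range(1, N):
    x = x + np.sqrt(sigmas[i] ** 2 - sigmas[i - 1] ** 2) * rng.normal(size=x.shape)

print(x.var())                            # ~ Var(x_0) + sigma_N^2 = 0.25 + 100
```

The terminal variance is the initial variance plus \(\sigma_N^2\), so it grows without bound as the largest noise scale increases, in contrast with the VP case.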

Although SMLD and DDPM have demonstrated strong generative capabilities, they suffer from limitations, including slow sampling due to their iterative denoising steps and difficulties in likelihood evaluation. To address these challenges, the ScoreSDE model extends diffusion models to a continuous-time framework, offering critical advantages, including efficient likelihood computation via its connection to a probability-flow ODE (PF-ODE) and greater flexibility in sampling methods. The model is formulated as a stochastic process governed by a pair of SDEs: the forward and reverse-time SDEs.

Forward SDE (data-to-noise): A general framework for many score-based generative models is one where noise is injected into the data distribution \(p_{\text{data}} \equiv p_0\) via a forward SDE with more general drift and diffusion coefficients:

\[ dx_t = f(x_t, t)\, dt + g(x_t, t)\, dw_t, \qquad x_0 \sim p_0, \tag{6} \]

where \(t \in [0, T]\) resides in the continuous-time domain and \(w_t\) is the standard Wiener process (a.k.a. BM). The drift coefficient \(f(\cdot, t): \mathbb{R}^d \to \mathbb{R}^d\) suggests how the diffusion particle should flow to the lowest energy state, whereas the diffusion coefficient \(g(\cdot, t): \mathbb{R}^d \to \mathbb{R}^{d \times d}\) describes how the particle would randomly walk from one position to another, determining the strength of the random movement (i.e., the intensity of the random fluctuation).

Reverse-time SDE (noise-to-data): To generate new samples, the reverse-time SDE inverts this process, starting from noise and gradually reconstructing structured data by removing perturbations. A crucial result from [29] reveals that the reverse of a diffusion process is a diffusion process, running backward in time and governed by the reverse-time SDE:

\[ dx_t = \left\{ f(x_t, t) - g(x_t, t)^2\, \nabla_x \log p_t(x_t) \right\} dt + g(x_t, t)\, d\bar w_t, \tag{7} \]

where \(\bar w_t\) denotes the standard Wiener process in reverse time (i.e., time flows backward from \(T\) to 0). When the reverse-time SDE is well constructed, we can simulate it with numerical approaches to generate new samples from \(p_0\).

When discretizing SDEs, the step size Δt is limited by the randomness of the stochastic process [30]. A large step size (consequently, a small number of steps) often causes nonconvergence, especially in high-dimensional spaces, and the numerical scheme might fail to capture the correct dynamics, leading to a poor approximation of the solution's distribution and negatively affecting the capability of high-quality sample generation. However, when the step size is small, the iterative process that generates high-quality samples, often involving hundreds or thousands of denoising steps, makes the generation process computationally expensive and slow.

Additionally, SDE solvers do not offer a method to compute the exact log-likelihood of score-based generative models. To address these limitations, Song et al. [20] introduced a sampler based on ODEs, called the PF-ODE, by converting any SDE into an ODE without changing its marginal distributions \(\{p_t(x)\}_{t \in [0, T]}\). Song et al. suggested that each SDE has a corresponding PF-ODE that produces deterministic processes, sampling from the same distribution as the SDE at each timestep.

Probability-Flow ODE: The PF-ODE corresponding to Eq. (6) follows the general form:

\[ dx_t = \left\{ f(x, t) - \tfrac{1}{2} \nabla_x \cdot \left[ G(x, t) G(x, t)^{\top} \right] - \tfrac{1}{2}\, G(x, t) G(x, t)^{\top} \nabla_x \log p_t(x_t) \right\} dt. \tag{8} \]

Appendix D.1 of [20] provides a detailed derivation of the transformation of the reverse SDE into the PF-ODE. The proof is based on the Fokker–Planck (or Kolmogorov forward) equation. Trajectories obtained by solving the PF-ODE (8) have the same marginal distributions as the SDE trajectories (7). The critical component in (8) is the score function \(\nabla_x \log p_t(x_t)\), which is approximated by neural networks. This approximation creates a parallel with neural ODEs [31], inherits all properties of neural ODEs, and enables sampling via ODE solvers and the precise computation of log-likelihoods. The PF-ODE also eliminates the stochasticity from the reverse process and allows for faster sampling in generative diffusion models while maintaining high-quality output [20]. Although the reverse process becomes deterministic, multiple diverse outputs can still be generated by starting from different random noise vectors because the ODE's initial conditions vary accordingly.
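For intuition, the PF-ODE can be integrated exactly in a toy setting where the score is known in closed form. The sketch below (one-dimensional Gaussian data under the VP SDE; all schedule and distribution values are assumptions for illustration) integrates the ODE backward with Euler steps and deterministically recovers the data distribution:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed VP schedule beta(t) = beta_min + t*(beta_max - beta_min) on [0, 1].
beta_min, beta_max = 0.1, 20.0
beta = lambda t: beta_min + t * (beta_max - beta_min)
B = lambda t: beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2  # int_0^t beta(s) ds

mu0, var0 = 2.0, 0.25           # Gaussian "data" distribution N(mu0, var0)

def score(x, t):
    # Analytic score of the VP marginal p_t when p_0 = N(mu0, var0).
    m = np.exp(-0.5 * B(t))
    s = np.exp(-B(t)) * var0 + 1.0 - np.exp(-B(t))
    return -(x - m * mu0) / s

# PF-ODE for the VP SDE: dx/dt = -0.5*beta(t)*(x + score(x, t)).
# Integrate backward from t = 1 (prior ~ N(0, 1)) down to t = 0 with Euler steps.
n_steps, n_samples = 2000, 50_000
dt = 1.0 / n_steps
x = rng.normal(size=n_samples)
for i in range(n_steps):
    t = 1.0 - i * dt
    x = x + 0.5 * beta(t) * (x + score(x, t)) * dt

print(x.mean(), x.std())        # close to (mu0, sqrt(var0)) = (2.0, 0.5)
```

Each noise vector maps deterministically to one sample, yet the ensemble still matches the data distribution, which is exactly the marginal-preservation property of the PF-ODE.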

Remark 2.1

The equations (SDEs or ODEs) above describe how a system changes or evolves over time. Numerical solvers are algorithms that take these equations and produce a series of points that form a “trajectory” or “path” of how the system evolves by breaking time into small steps and applying approximate formulas. This trajectory is an approximation rather than an exact representation; however, it is typically sufficient to capture and understand the behavior of a dynamic system. In generative modeling, we generate new images by numerically solving Eqs. (7) and (8).

3. Protein backbone representation: α-carbon distance map

Proteins are macromolecules comprising linear chains of amino acids. Each amino acid contributes three principal backbone atoms: nitrogen (N), alpha carbon (Cα), and carbonyl carbon (C). Although the carbonyl oxygen (O) is bonded to the backbone, it is not considered part of the backbone itself. These repeating units form the polypeptide backbone (N–Cα–C), with side chains branching from the Cα atom. Various computational methods have been developed to predict protein function. These methods can be categorized into four groups based on the information they employ (there is overlap and correlation between them). Sequence-based methods include BLAST [32], DEEPred [33], and DeepGOPlus [34], among others. The 3D structure-based methods include DeepFRI [35], LSTM-LM [36], and HEAL [37]. In addition, the PPI network-based methods include GeneMANIA [38] and deepNF [39], among others. Finally, the hybrid information-based methods include the CAFA challenge [40]. A comprehensive review and comparison of those methods is presented in [41]. Diffusion models have recently been adopted into protein engineering applications and have demonstrated strong performance in generating novel protein structures and sequences [42], [43], [44].

The dataset was derived from the Protein Data Bank (PDB) [45], a globally recognized repository archiving detailed 3D structural biomolecule data, including proteins, nucleic acids, and complex assemblies. This benchmark protein dataset was established in 1971, serving as a vital resource for researchers in structural biology, bioinformatics, and related fields and facilitating advances in drug discovery, molecular biology, and biotechnology. The PDB offers free access to the structural data submitted by scientists worldwide in a standardized format that supports computational analysis and visualization.

Fig. 1 illustrates the data preparation process, beginning with extracting atomic coordinates from a protein structure. The 3D model on the left represents the atomic arrangement of 101M sperm whale myoglobin obtained from the PDB. The table in the center displays a segment of the PDB file detailing atomic positions in a structured format. Using the extracted (X,Y,Z) coordinates, the Euclidean distances between pairs of Cα atoms are computed to construct the distance matrix, visualized as a heatmap (right). The color gradient (yellow to blue) encodes distance magnitudes, with the diagonal representing zero distance, corresponding to self-comparisons. An invariant representation is necessary to analyze protein structures independent of their spatial orientation. Proteins can adopt arbitrary poses in 3D space; hence, direct coordinate-based representations are sensitive to translation and rotation. Computing pairwise distances between Cα atoms yields a structured representation, the distance matrix, which remains invariant to such transformations while preserving critical structural information. This representation is useful because it retains the sequential ordering of amino acids, defined by the N-terminus to C-terminus directionality, ensuring uniqueness, and it preserves sufficient information to recover the structure [10].
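The distance-map construction and its invariance to rigid motions can be sketched in a few lines of NumPy (the random-walk Cα trace below is a toy stand-in for real PDB coordinates, with the canonical ~3.8 Å consecutive-residue spacing):

```python
import numpy as np

rng = np.random.default_rng(4)

def ca_distance_map(coords):
    """Pairwise Euclidean distance matrix for an (L, 3) array of C-alpha coordinates."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

# Toy C-alpha trace: consecutive residues 3.8 Angstroms apart (random walk, illustration only).
L = 64
steps = rng.normal(size=(L, 3))
steps = 3.8 * steps / np.linalg.norm(steps, axis=1, keepdims=True)
coords = np.cumsum(steps, axis=0)

D = ca_distance_map(coords)

# Apply a random rotation + translation: the distance map is unchanged.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))          # random orthogonal matrix
moved = coords @ Q.T + np.array([10.0, -5.0, 3.0])
D_moved = ca_distance_map(moved)

print(np.abs(D - D_moved).max())   # ~0 up to floating-point error
```

The matrix is symmetric with a zero diagonal, and it is identical before and after the rigid motion, which is precisely the invariance property that makes it a convenient learning target.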

Fig. 1.

Fig. 1

Protein structure representation and distance matrix.

Following the generation of the Cα distance map using a trained model, the 3D structure of the protein backbone can be reconstructed from the resulting image. Recovering or folding the 3D structure of a protein from the α-carbon distance matrix is a classical inverse problem in structural biology, which many studies have investigated. In the existing literature on 3D protein structure recovery, notable methods include the hidden Markov model (HMM)–based approaches FB5-HMM [46] and TorusDBN [47], the multiscale torsion angle GAN, 3DGAN [48], and a full-atom GAN.

Moreover, Anand et al. [10] introduced a GAN-based approach to generate sequence-sensitive, fixed-length protein structure fragments. Using these GANs, they produced Cα pairwise distance maps and used the alternating direction method of multipliers (ADMM) alongside Rosetta minimization [49] to reconstruct the full protein structures from the derived distance constraints. The ADMM is an algorithm proposed in [50] that solves convex optimization problems by breaking a complicated one into smaller pieces, each of which is easier to handle. The ADMM follows a decomposition-coordination strategy, in which solutions to smaller local subproblems are synchronized to solve a broader global problem. Building on [10], Anand et al. [51] further refined the method by training deep neural networks to recover and refine full-atom pairwise distance matrices accurately for fixed-length fragments.
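The decomposition-coordination strategy of ADMM can be illustrated on a small lasso problem (a generic example, not the distance-map reconstruction of [10]; all problem sizes and parameters are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)

# ADMM for the lasso: minimize 0.5*||Ax - b||^2 + lam*||z||_1  subject to  x = z.
m, n, lam, rho = 50, 20, 0.1, 1.0
A = rng.normal(size=(m, n))
x_true = np.zeros(n)
x_true[:3] = [1.5, -2.0, 0.7]
b = A @ x_true + 0.01 * rng.normal(size=m)

x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
AtA, Atb = A.T @ A, A.T @ b
M = np.linalg.inv(AtA + rho * np.eye(n))              # cached solve for the x-update
for _ in range(200):
    x = M @ (Atb + rho * (z - u))                     # local quadratic subproblem
    z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)  # soft threshold
    u = u + x - z                                     # dual (coordination) update

print(np.round(z, 2))
```

Each iteration solves two easy local subproblems (a linear solve and a soft threshold) while the dual variable coordinates them toward the global solution, mirroring the decomposition-coordination description above.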

4. Generative fractional diffusion models

This section introduces the driving process, the fBm, along with its Markov approximation. Next, the generative fractional diffusion models are presented, followed by score-matching techniques designed to estimate the corresponding score function. Finally, this section discusses the SDE and ODE solvers used to model the data generation and sampling process in detail.

4.1. Fractional driving noise and its Markov approximation

Definition 4.1 Fractional Brownian Motion (Types I and II) —

The fBm family of continuous-time Gaussian processes is parameterized by the Hurst index \(H \in (0, 1)\), which controls the smoothness and correlation of the process. It generalizes the standard BM and exhibits self-similarity and stationary increments. Two common types of fBm exist:

  • Type I: Denoted by \(W_t^H\), it has the covariance function
    \[ \mathbb{E}\big[W_t^H W_s^H\big] = \tfrac{1}{2}\left( |t|^{2H} + |s|^{2H} - |t-s|^{2H} \right). \]
  • Type II: Denoted by \(B_t^H\), its covariance function is given by
    \[ \mathbb{E}\big[B_t^H B_s^H\big] = \frac{1}{\Gamma\!\left(H + \frac{1}{2}\right)^{2}} \int_0^s \big( (t-u)(s-u) \big)^{H - \frac{1}{2}}\, du, \quad s < t, \]

where \(\Gamma(\cdot)\) denotes the Gamma function. Type II is often referred to as the Riemann–Liouville Volterra process, which is commonly applied in financial mathematics. The Hurst index \(H\) governs the process behavior:

  • \(H = 1/2\) corresponds to the standard BM.

  • If \(H > 1/2\), the process increments are positively correlated.

  • If \(H < 1/2\), the process increments are negatively correlated.

Fig. 2 illustrates how the Hurst index influences the smoothness of the fBm paths. When the Hurst index H>1/2, the increments of the fBm are positively correlated, making its trajectory smoother than the standard BM. Conversely, when H<1/2, the increments are negatively correlated, resulting in a rougher path compared to the BM.

Fig. 2.

Fig. 2

Sample paths of fractional Brownian motion by Hurst index values ([52]).

The fBm differs from standard BM in that it has long-range dependence, and its increments are not independent, making it non-Markovian. More precisely, the Markov property implies that future states depend only on the present, whereas fBm incorporates memory effects, where past values influence future behavior via the Hurst parameter. Additionally, fBm is not a semimartingale because it lacks the necessary decomposition as a local martingale and a finite variation process, restricting it from being applied directly in stochastic calculus like standard BM. This nonsemimartingale property limits its compatibility with Itō calculus.
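A standard way to draw exact Type I fBm paths on a grid, consistent with the covariance in Definition 4.1, is a Cholesky factorization of the covariance matrix; the sketch below (grid size and path count are illustrative assumptions) checks the self-similarity property \(\operatorname{Var}(W_t^H) = t^{2H}\):

```python
import numpy as np

rng = np.random.default_rng(6)

def fbm_paths(H, n_steps=200, n_paths=20_000, T=1.0):
    """Sample Type I fBm on a grid via Cholesky of its covariance matrix."""
    t = np.linspace(T / n_steps, T, n_steps)
    # Covariance E[W_t W_s] = 0.5*(t^{2H} + s^{2H} - |t-s|^{2H})
    cov = 0.5 * (t[:, None] ** (2 * H) + t[None, :] ** (2 * H)
                 - np.abs(t[:, None] - t[None, :]) ** (2 * H))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n_steps))  # jitter for stability
    return t, rng.normal(size=(n_paths, n_steps)) @ L.T

H = 0.7
t, paths = fbm_paths(H)
emp_var = paths.var(axis=0)
print(np.abs(emp_var - t ** (2 * H)).max())   # small: Var(W_t^H) = t^{2H}
```

Setting \(H = 0.5\) in the same routine reproduces standard BM, while \(H > 1/2\) yields the smoother, positively correlated paths discussed above.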

The nature of fBm poses a challenge for score-based generative models, which typically rely on Markovian SDEs where the future state depends only on the present. In [26], the authors developed a framework for approximating fBm using Markov processes called MA-fBm. Their critical contribution is constructing finite-dimensional, affine Markov processes approximating the behavior of fBm, which lacks the Markov property due to its long-range dependence. This approximation is crucial for the efficient simulation of fractional diffusion processes and ensures compatibility with score-based generative modeling. This approach enables the application of existing score-based generative models and well-established methods in the standard diffusion framework, facilitating the training and sampling of models driven by fractional stochastic processes.

Definition 4.2 Markov Approximation of fBm —

Choose \(K \in \mathbb{N}\) Ornstein–Uhlenbeck (OU) processes with speeds of mean reversion \(\gamma_1, \dots, \gamma_K\) and dynamics \(dY_t^k = -\gamma_k Y_t^k\, dt + dB_t\):

\[ Y_t^k = Y_0^k e^{-\gamma_k t} + \int_0^t e^{-\gamma_k (t-s)}\, dB_s, \quad t \ge 0, \]

where \(Y_0^k = \int_{-\infty}^{0} e^{\gamma_k s}\, dB_s\) (Type I) and \(Y_0^k = 0\) (Type II). Given a Hurst index \(H \in (0, 1)\) and a geometric grid \(\gamma_k = r^{k-n}\) with \(r > 1\) and \(n = \frac{K+1}{2}\), the process

\[ \hat B_t^H := \sum_{k=1}^{K} \omega_k \left( Y_t^k - Y_0^k \right), \quad H \in (0, 1),\ t \ge 0, \]

is the MA-fBm with approximation coefficients \(\omega_1, \dots, \omega_K \in \mathbb{R}\).

To approximate fBm \(B_t^H\) with minimal error, we choose a geometrically spaced grid

\[ (\gamma_1, \gamma_2, \dots, \gamma_K) = \left( r^{1-n}, r^{2-n}, \dots, r^{K-n} \right), \]

where \(n = \frac{K+1}{2}\) and \(r > 1\). The optimal approximation coefficients \(\omega = (\omega_1, \dots, \omega_K)^{\top} \in \mathbb{R}^K\), for a given Hurst index \(H \in (0, 1)\) and terminal time \(T > 0\), are obtained by minimizing the \(L^2(\mathbb{P})\)-error:

\[ \varepsilon(\omega) := \int_0^T \mathbb{E}\left[ \big( B_t^H - \hat B_t^H \big)^2 \right] dt. \]

The optimal coefficients satisfy the following closed-form linear system:

\[ A \omega = b, \]

where matrix A and vector b are given for the Type I fBm approximation by

Ai,j:=2T+eγiT1γi+eγjT1γjγi+γj,
bk:=2TγkH+12TH+12γkΓ(H+32)+eγkTQ(H+12,γkT)eγkTγkH+32,

where \(Q(z, x)\) represents the regularized upper incomplete gamma function:

\[ Q(z, x) = \frac{1}{\Gamma(z)} \int_x^{\infty} t^{z-1} e^{-t}\, dt. \]

For the Type II fBm approximation, the corresponding expressions for \(A\) and \(b\) are

\[ A_{i,j} := \frac{T + \frac{e^{-(\gamma_i + \gamma_j) T} - 1}{\gamma_i + \gamma_j}}{\gamma_i + \gamma_j}, \qquad b_k := \frac{T}{\gamma_k^{H + \frac{1}{2}}}\, P\!\left(H + \tfrac{1}{2},\, \gamma_k T\right) - \frac{H + \frac{1}{2}}{\gamma_k^{H + \frac{3}{2}}}\, P\!\left(H + \tfrac{3}{2},\, \gamma_k T\right), \]

where \(P(z, x)\) denotes the regularized lower incomplete gamma function:

\[ P(z, x) = \frac{1}{\Gamma(z)} \int_0^{x} t^{z-1} e^{-t}\, dt. \]

For further discussion on practical considerations in selecting geometric sequences γk and the time horizon for optimizing the coefficients ω, see [27]. The experiments in Section 5 focus on the Type II fBm approximation.
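A direct implementation of the Type II system \(A\omega = b\) is a few lines of NumPy; the helper `reg_lower_gamma` below evaluates \(P(z, x)\) via its standard power series, and the grid parameters (\(H = 0.7\), \(K = 5\), \(r = 2\), \(T = 1\)) are illustrative assumptions:

```python
import numpy as np
from math import gamma as Gamma, exp

def reg_lower_gamma(z, x, terms=80):
    """Regularized lower incomplete gamma P(z, x), via the standard power series."""
    if x <= 0.0:
        return 0.0
    # gamma(z, x) = x^z e^{-x} sum_{n>=0} x^n / (z (z+1) ... (z+n))
    total, term = 0.0, 1.0 / z
    for n in range(1, terms):
        total += term
        term *= x / (z + n)
    total += term
    return (x ** z) * exp(-x) * total / Gamma(z)

def type2_coefficients(H, K, r=2.0, T=1.0):
    """Solve A w = b for the Type II MA-fBm approximation coefficients."""
    n = (K + 1) / 2.0
    gam = r ** (np.arange(1, K + 1) - n)          # geometric grid gamma_k = r^(k - n)
    gsum = gam[:, None] + gam[None, :]
    A = (T + (np.exp(-gsum * T) - 1.0) / gsum) / gsum
    z = H + 0.5
    P1 = np.array([reg_lower_gamma(z, g * T) for g in gam])
    P2 = np.array([reg_lower_gamma(z + 1.0, g * T) for g in gam])
    b = (T / gam ** z) * P1 - (z / gam ** (z + 1.0)) * P2
    return gam, np.linalg.solve(A, b)

gam, w = type2_coefficients(H=0.7, K=5)
print(gam)
print(w)
```

For small \(K\) the linear system is well-conditioned and can be solved directly; SciPy users can substitute `scipy.special.gammainc` for the series helper, since it uses the same regularized lower convention.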

4.2. Fractional noise-driven generative diffusion model

The framework is based on a simplified SDE in which the diffusion coefficient does not depend on \(X_t\); that is, the diffusion scale \(g\) in (6) is only time-dependent rather than time- and state-dependent:

\[ dX_t = u(t) X_t\, dt + g(t)\, d\hat B_t^H, \qquad X_0 \sim p_0. \tag{9} \]

The OU processes \(Y_t^k\), \(k = 1, \dots, K\), approximating \(\hat B_t^H\) satisfy the dynamics \(dY_t^k = -\gamma_k Y_t^k\, dt + dB_t\); thus, we have the following:

\[ d\hat B_t^H = -\sum_{k=1}^{K} \omega_k \gamma_k Y_t^k\, dt + \sum_{k=1}^{K} \omega_k\, dB_t. \tag{10} \]

Rewriting the dynamics (9) with (10) yields

\[ dX_t = \left[ u(t) X_t - g(t) \sum_{k=1}^{K} \omega_k \gamma_k Y_t^k \right] dt + \sum_{k=1}^{K} \omega_k\, g(t)\, dB_t. \tag{11} \]

Considering the forward and OU processes defining the driving noise \(\hat B_t^H\), we obtain an augmented vector of the correlated processes \(Z \equiv (X, Y^1, \dots, Y^K) = (Z_t)_{t \in [0, T]}\), driven by the same BM:

\[ dZ_t = F(t) Z_t\, dt + G(t)\, dB_t, \quad t \in [0, T], \tag{12} \]

where

\[ F(t) = \begin{pmatrix} u(t) & -g(t)\,\omega_1 \gamma_1 & -g(t)\,\omega_2 \gamma_2 & \cdots & -g(t)\,\omega_K \gamma_K \\ 0 & -\gamma_1 & 0 & \cdots & 0 \\ 0 & 0 & -\gamma_2 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & -\gamma_K \end{pmatrix} \in \mathbb{R}^{(K+1) \times (K+1)} \]

and

\[ G(t) = \begin{pmatrix} \sum_{k=1}^{K} \omega_k\, g(t) & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \in \mathbb{R}^{(K+1) \times (K+1)}. \]

The reverse-time model of the GFDM (12) is given by the backward dynamics

\[ dZ_t = \left\{ F(t) Z_t - G(t) G(t)^{\top} \nabla_z \log p_t(Z_t) \right\} dt + G(t)\, d\bar B_t, \quad t \in [0, T], \tag{13} \]

and its PF-ODE is given by

\[ dz_t = \left\{ F(t) z_t - \tfrac{1}{2}\, G(t) G(t)^{\top} \nabla_z \log p_t(z_t) \right\} dt, \quad t \in [0, T]. \tag{14} \]
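To make the augmented construction concrete, the following sketch assembles \(F(t)\) and \(G(t)\) for a small \(K\) (all numerical values are arbitrary illustrations) and confirms that the first row of \(F(t)Z\) reproduces the drift of the \(X\) component in (11):

```python
import numpy as np

def make_FG(u_t, g_t, omega, gamma):
    """Assemble F(t), G(t) for the augmented state Z = (X, Y^1, ..., Y^K)."""
    K = len(omega)
    F = np.zeros((K + 1, K + 1))
    F[0, 0] = u_t
    F[0, 1:] = -g_t * omega * gamma                   # coupling of X to the OU states
    F[np.arange(1, K + 1), np.arange(1, K + 1)] = -gamma
    G = np.eye(K + 1)
    G[0, 0] = omega.sum() * g_t                       # noise scale on the X component
    return F, G

omega = np.array([0.5, 0.3, 0.2])
gamma = np.array([0.5, 1.0, 2.0])
F, G = make_FG(u_t=-0.4, g_t=1.2, omega=omega, gamma=gamma)

# The first row of F(t) Z reproduces the X-drift of (11):
Z = np.array([1.0, 0.2, -0.1, 0.3])
drift_X = F[0] @ Z
expected = -0.4 * Z[0] - 1.2 * np.sum(omega * gamma * Z[1:])
print(drift_X, expected)
```

The remaining rows simply encode the independent OU relaxations \(dY^k = -\gamma_k Y^k\, dt + dB_t\), so the whole augmented system stays linear and Markovian.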

Fig. 3 breaks down the process of the ScoreSDE model, illustrating its critical components and their interactions. The ScoreSDE employs SDEs to model data generation as a diffusion process. The forward SDE progressively perturbs actual data into a Gaussian noise distribution. In contrast, the reverse SDE inverts this process by iteratively denoising the samples, guided by a learned score function \(\nabla_{z_t} \log p_t(z_t)\). Alternatively, a reverse ODE can be applied for a deterministic sampling approach, improving stability and efficiency. A U-Net architecture is commonly employed to parameterize the score function \(\nabla_{z_t} \log p_t(z_t)\), facilitating precise noise estimation and denoising.

Fig. 3.

Fig. 3

Overview of score-based generative modeling through stochastic and ordinary differential equations (SDEs and ODEs).

4.3. Augmented score-matching technique

The score function (the gradient of the log-probability density), $\nabla_x\log p_t(x_t)$, which appears in (7) or (8), guides the transformation of pure noise into coherent samples by supplying directional information on how to increase the data likelihood at each step of the reverse diffusion process. Score matching is a nonlikelihood-based method for sampling from an unknown data distribution and aims to address many of the limitations of likelihood-based methods (e.g., VAEs and autoregressive models) and adversarial methods (GANs). A good estimate of the score function is crucial because it directly determines the quality, realism, and fidelity of the data generated by score-based generative models. In contrast, an inaccurate score estimate can lead to poor generation results, including artifacts, mode collapse, or failure to capture the data distribution accurately.

In practice, the score function is learned using a neural network sθ(x) parameterized by θ, obtained by minimizing the Fisher divergence between the true score function and the network:

$J(\theta)=\mathbb{E}_{p_{\mathrm{data}}(x)}\big[\|s_\theta(x)-\nabla_x\log p_{\mathrm{data}}(x)\|_2^2\big].$ (15)

The problem is that pdata is never accessible. Various methods can estimate the score function without knowledge of the ground-truth data score.

Implicit score matching (ISM) Hyvärinen and Dayan [53] introduced the implicit score matching method and demonstrated its equivalence to $J(\theta)$ in (15), up to a constant, under mild regularity conditions:

$J_{\mathrm{ISM}}(\theta)=\mathbb{E}_{p_{\mathrm{data}}(x)}\big[\tfrac{1}{2}\|s_\theta(x)\|_2^2+\operatorname{tr}(\nabla_x s_\theta(x))\big]+\text{constant},$ (16)

where $\operatorname{tr}(\nabla_x s_\theta(x))$ denotes the trace of the Jacobian of $s_\theta(x)$ with respect to $x$ (the Hessian of the log-probability when $s_\theta$ equals the true score). Equation (16) is derived by simplifying (15) using multidimensional integration by parts. Minimizing $J_{\mathrm{ISM}}(\theta)$ does not require the true target scores $\nabla_x\log p_{\mathrm{data}}(x)$; we only need to compute an expectation with respect to the data distribution, which can be estimated by Monte Carlo using finite samples from the dataset.
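For intuition, the ISM objective can be evaluated for a toy linear score model $s_\theta(x)=-a\,x$ on standard-normal data, where $\operatorname{tr}(\nabla_x s_\theta)=-a\,d$ is available analytically. This is a hypothetical sketch, not the paper's training setup; the loss is minimized at $a=1$, where $s_\theta$ equals the true score $-x$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 100_000
x = rng.standard_normal((n, d))  # samples from p_data = N(0, I)

def ism_objective(a):
    """ISM loss (16) for the linear score model s_theta(x) = -a * x.
    The trace term tr(grad_x s_theta) = -a * d is known in closed form here."""
    return np.mean(0.5 * np.sum((a * x) ** 2, axis=1)) + (-a * d)

# The true score of N(0, I) is -x (i.e. a = 1); ISM is minimized there,
# even though the true score was never used in the objective.
losses = {a: ism_objective(a) for a in [0.5, 1.0, 1.5]}
```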

Denoising score matching (DSM) An alternative approach is DSM, proposed by Vincent [56]. The DSM approach perturbs the data $x$ with noise, creating a smoothed version $\tilde x$ of the data distribution. Instead of directly modeling the original distribution, the model learns the score function of the corrupted distribution. Given noisy data $\tilde x=x+\sigma\epsilon$, where $\sigma$ controls the noise intensity, the model minimizes the following DSM objective:

$J_{\mathrm{DSM}}(\theta)=\mathbb{E}_{q_\sigma(\tilde x,x)}\big[\tfrac{1}{2}\|s_\theta(\tilde x)-\nabla_{\tilde x}\log q_\sigma(\tilde x\mid x)\|_2^2\big]=\mathbb{E}_{q_\sigma(\tilde x,x)}\Big[\tfrac{1}{2}\Big\|s_\theta(\tilde x)-\frac{x-\tilde x}{\sigma^2}\Big\|_2^2\Big]$

over the joint density on original–corrupted data pairs $(\tilde x,x)$, which is $q_\sigma(\tilde x,x)=q_\sigma(\tilde x\mid x)\,p_{\mathrm{data}}(x)$. Moreover, $\nabla_{\tilde x}\log q_\sigma(\tilde x\mid x)$ is not the data score, but it indicates the direction of moving from $\tilde x$ back to the original $x$ (which is why the method is called “denoising”).
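The denoising target can be illustrated on one-dimensional Gaussian data, where the minimizer of the DSM loss is known in closed form. This is a toy sketch: the linear model $s_\theta(\tilde x)=-a\,\tilde x$ is a stand-in for the neural network.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, n = 0.5, 100_000
x = rng.standard_normal(n)                     # clean data from N(0, 1)
x_tilde = x + sigma * rng.standard_normal(n)   # corrupted data from q_sigma(x_tilde | x)
target = (x - x_tilde) / sigma**2              # grad_{x_tilde} log q_sigma(x_tilde | x)

def dsm_objective(a):
    """DSM loss for a linear score model s_theta(x_tilde) = -a * x_tilde."""
    return np.mean(0.5 * (-a * x_tilde - target) ** 2)

# The smoothed marginal is N(0, 1 + sigma^2), whose score is -x / (1 + sigma^2),
# so the loss should be minimized near a = 1 / (1 + sigma^2) = 0.8.
best_a = min(np.linspace(0.5, 1.1, 61), key=dsm_objective)
```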

Sliced score matching (SSM) The loss function in (16) requires computing a trace, which remains intractable in high-dimensional spaces, as analyzed in [54]. Song et al. proposed SSM [22], [55] to scale up score matching considerably. They project the scores onto random directions, so that the vector fields of the scores of the data and model distributions become scalar fields, motivated by the fact that a one-dimensional distribution is much easier to handle for score matching. They then analyze the scalar fields to assess the disparity between the model and data distributions. The tractable training objective can be considered a randomly projected version of the Fisher divergence in (15):

$J_{\mathrm{SSM}}(\theta)=\mathbb{E}_{p_v}\mathbb{E}_{p_{\mathrm{data}}(x)}\big[\tfrac{1}{2}\|s_\theta(x)\|_2^2+v^{\mathsf T}\nabla_x s_\theta(x)\,v\big]+\text{constant},$

where $v\sim\mathcal{N}(0,I)$ represents the random projection direction, and $p_v$ denotes its distribution. The SSM approach is a consistent and computationally efficient alternative to DSM.
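The random projection underlying SSM is an instance of the Hutchinson trace estimator: for $v\sim\mathcal N(0,I)$, $\mathbb{E}_v[v^{\mathsf T}Av]=\operatorname{tr}(A)$, so the intractable trace in (16) can be replaced by cheap vector products. A small numerical check, with a random matrix standing in for the score Jacobian:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5
A = rng.standard_normal((d, d))  # stand-in for the Jacobian grad_x s_theta(x)

# Unbiased estimate of tr(A) from random Gaussian projections v^T A v.
vs = rng.standard_normal((200_000, d))
estimate = np.mean(np.einsum('ni,ij,nj->n', vs, A, vs))
```

In SSM the quadratic form is computed per sample as `v @ jacobian_vector_product`, so the full Jacobian is never materialized.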

Noise-conditioned score network In the previous DSM approach, the noise strength $\sigma$ determines how well the corrupted data distribution $q_\sigma(\tilde x)$ aligns with the original distribution $p_{\mathrm{data}}(x)$. A trade-off occurs in selecting $\sigma$. A larger $\sigma$ helps capture low-density regions of the data distribution, improving the score estimation; however, if $\sigma$ is too large, the data distribution becomes excessively corrupted, making learning challenging. Conversely, a smaller $\sigma$ preserves the original data distribution more accurately but fails to perturb the data sufficiently, poorly covering low-density regions. To address this problem, Song and Ermon introduced a multiscale noise perturbation approach [22], [57], applying noise at multiple levels simultaneously. Specifically, they defined $L$ noise-perturbed data distributions, each associated with a different noise scale $\sigma_i$, ordered such that

$\sigma_1>\sigma_2>\cdots>\sigma_L,$

with higher indices corresponding to lower noise levels. The core idea behind the noise-conditioned score network method is to perturb the data distribution with isotropic Gaussian noise $\mathcal{N}(0,\sigma_i^2 I)$, $i=1,\dots,L$, to obtain the noise-perturbed distributions:

$p_{\sigma_i}(x)=\int p(y)\,\mathcal{N}(x;y,\sigma_i^2 I)\,dy.$

The next step is to train the noise-conditioned score-based model $s_\theta(x,i)$ to estimate the score function of each noise-perturbed distribution, for $i=1,\dots,L$:

$s_\theta(x,i)\approx\nabla_x\log p_{\sigma_i}(x).$

The training objective is the weighted sum of Fisher divergences for all noise scales:

$\sum_{i=1}^L\lambda(i)\,\mathbb{E}_{p_{\sigma_i}(x)}\big[\|s_\theta(x,i)-\nabla_x\log p_{\sigma_i}(x)\|_2^2\big],$

where $\lambda(i)\in\mathbb{R}_{>0}$ denotes a positive weighting function, often set to $\lambda(i)=\sigma_i^2$.
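A common concrete choice is a geometric sequence of noise scales together with the weighting $\lambda(i)=\sigma_i^2$, sketched below; the endpoint values and number of scales are illustrative, not the settings of any particular paper.

```python
import numpy as np

def noise_scales(sigma_max=10.0, sigma_min=0.01, L=10):
    """Geometric sequence sigma_1 > sigma_2 > ... > sigma_L of perturbation scales."""
    return np.geomspace(sigma_max, sigma_min, L)

sigmas = noise_scales()
weights = sigmas ** 2  # lambda(i) = sigma_i^2 balances the per-scale losses
```

With $\lambda(i)=\sigma_i^2$, each weighted term reduces to an unweighted regression on the rescaled target $\sigma_i\,\nabla_x\log p_{\sigma_i}(x)$, which keeps the per-scale losses on a comparable magnitude.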

Remark 4.3

In diffusion generative models, two primary approaches are commonly employed to define the model task: data prediction models and noise prediction models. Data prediction models are trained to directly predict the original data (e.g., the clean image) $x_0$ from the noisy data $x_t$; at each step of the reverse process, the model estimates the clean data conditioned on the noisy input. Noise prediction models are trained to predict the added noise $\varepsilon$ in the noisy data $x_t=\alpha_t x_0+\sigma_t\varepsilon$, focusing on estimating the noise rather than the clean data.

This work applies augmented score-matching techniques to learn the score function $\nabla_z\log p_t$, which appears in (13). The authors in [25] suggested that one can condition the score function $\nabla_z\log p_t(Z_t)$ on a data sample $X_0\sim p_0$ and on the states of the stacked vector $Y_t^{[K]}:=(Y_t^1,\dots,Y_t^K)$ of augmenting processes. A time-dependent score model $s_\theta$ can be trained by minimizing the following augmented score-matching loss:

$\mathcal{L}(\theta):=\mathbb{E}_{t\sim\mathcal{U}[0,T]}\Big\{\mathbb{E}_{X_0,Y_t^{[K]}}\,\mathbb{E}_{X_t\mid Y_t^{[K]},X_0}\Big\|s_\theta\Big(X_t-\sum_{k=1}^K\eta_t^k Y_t^k,\,t\Big)-\nabla_x\log p_{0t}(X_t\mid Y_t^{[K]},X_0)\Big\|_2^2\Big\}.$

The weights $\eta_t^1,\dots,\eta_t^K$ arise from conditioning $Z_t$ on $Y_t^{[K]}$. We assume that $s_\theta$ is optimal with respect to the augmented score-matching loss $\mathcal{L}$. The score model

$S_\theta(Z_t,t):=\Big(s_\theta\Big(X_t-\sum_{k=1}^K\eta_t^k Y_t^k,t\Big),\,\eta_t^1\,s_\theta\Big(X_t-\sum_{k=1}^K\eta_t^k Y_t^k,t\Big),\dots,\eta_t^K\,s_\theta\Big(X_t-\sum_{k=1}^K\eta_t^k Y_t^k,t\Big)\Big)$

yields the optimal $L^2$ approximation of $\nabla_z\log p_t(Z_t)$ via

$\nabla_z\log p_t(Z_t)\approx S_\theta(Z_t,t)+\nabla_Y\log q_t(Y_t^{[K]}).$ (17)

The random vector $Y_t^{[K]}$ is centered Gaussian with covariance matrix $\Sigma_t^y\in\mathbb{R}^{K\times K}$, where $[\Sigma_t^y]_{k,l}=\mathbb{E}[Y_t^kY_t^l]$. We can directly calculate the related score function using

$\nabla_Y\log q_t(Y_t^{[K]})=-(\Sigma_t^y)^{-1}Y_t^{[K]}.$
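Because $Y_t^{[K]}$ is centered Gaussian, its score is linear in the state. A minimal sketch, with an arbitrary positive-definite matrix standing in for $\Sigma_t^y$:

```python
import numpy as np

rng = np.random.default_rng(3)
K = 3
# Positive-definite stand-in for the OU covariance Sigma_t^y (illustrative values).
A = rng.standard_normal((K, K))
Sigma = A @ A.T + K * np.eye(K)

def gaussian_score(y, Sigma):
    """Score of a centered Gaussian N(0, Sigma): grad_y log q(y) = -Sigma^{-1} y.
    Solving the linear system avoids forming the explicit inverse."""
    return -np.linalg.solve(Sigma, y)

y = rng.standard_normal(K)
score = gaussian_score(y, Sigma)
```

In practice $\Sigma_t^y$ is available in closed form from the OU dynamics, so this term of (17) needs no learning.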

After training the score model Sθ(Zt,t) using augmented score matching, we can simulate the reverse-time model by running it backward and generate samples by discretizing the following reverse-time SDE (augmented reverse-time SDE) from T to 0:

$dZ_t=\big\{F(t)Z_t-G(t)G(t)^{\mathsf T}\big[S_\theta(Z_t,t)+\nabla_Y\log q_t(Y_t^{[K]})\big]\big\}\,dt+G(t)\,d\bar{B}_t,$ (18)

or the corresponding augmented PF-ODE:

$dz_t=\big\{F(t)z_t-\tfrac{1}{2}G(t)G(t)^{\mathsf T}\big[S_\theta(z_t,t)+\nabla_y\log q_t(y_t^{[K]})\big]\big\}\,dt.$ (19)

Commonly, (18) and (19) are called the diffusion SDE and diffusion ODE, respectively. Because diffusion ODEs have no random term, one can obtain an exact solution $x_0$, given $x_T$, by solving the diffusion ODE with an appropriate numerical solver.

4.4. Sampling methods

In generative modeling, “sampling” refers to the process of generating new data points from the learned model, creating output that resembles the training data. Efficient sampling is critical for practically deploying these models in real-world applications, such as protein, image, and audio generation, where computational resources and time are often limited. For instance, although the DDPM generates high-fidelity samples, its practical utility is limited by slow sampling speed.

Researchers have significantly advanced the study of sampling theory with diffusion models. For example, in [58], the sampling approach was extended to the denoising diffusion implicit model (DDIM) scheme. Unlike DDPMs, which rely on a stochastic reverse diffusion process, DDIMs use a deterministic sampling process and reduce the number of sampling steps from hundreds (in DDPMs) to as few as 10 to 50 steps without sacrificing image quality.

In the ScoreSDE model, the generation process is governed by simulating the trajectories of differential equations. Samplers for diffusion models typically discretize either the reverse-time SDE (Eq. (7)) or the PF-ODE (Eq. (8)). Numerical solvers for SDEs or ODEs are employed to transform random noise iteratively into realistic samples. The importance of sampling methods in diffusion models cannot be overstated, and these methods often trade between speed (sample efficiency) and accuracy (sample quality).

SDE solver When numerically simulating an SDE, one typically employs various methods, such as the Euler–Maruyama scheme, stochastic Runge–Kutta (RK) method [59] or more sophisticated schemes, such as Milstein's method [60]. Numerical solvers display varying behaviors in terms of approximation errors. Directly applying a standard numerical solver to an SDE may introduce varying degrees of error.

Song et al. [20] introduced a hybrid sampling algorithm, refining the Euler–Maruyama method using a predictor-corrector (PC) sampler, as outlined in Algorithm 1. This approach is inspired by PC methods, a class of numerical techniques commonly used for solving systems of equations [61]. The framework was designed to reduce discretization errors in reverse-time SDEs by introducing corrective steps during sample generation. The method operates in two stages per iteration:

  • 1.

    Prediction – A numerical solver (e.g., Euler–Maruyama) estimates the sample at the next time step, serving as a “predictor.”

  • 2.

    Correction – A score-based Markov chain Monte Carlo method adjusts the estimated sample's marginal distribution, acting as a “corrector.”

By iteratively refining the sample at each time step, the PC sampler improves stability and accuracy in the generative process.
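The two-stage loop can be sketched on a toy one-dimensional VP SDE whose score is known analytically, so the learned network $s_\theta$ is replaced by its exact value. This is illustrative only; the constant $\beta$, step counts, and signal-to-noise ratio are placeholders, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(4)
beta = 1.0  # constant noise rate: dX = -0.5*beta*X dt + sqrt(beta) dB (toy VP SDE)

def score(x, t):
    """Analytic score for p_0 = N(0, 1): the VP marginal stays N(0, 1),
    so grad_x log p_t(x) = -x. This stands in for the learned s_theta."""
    return -x

def pc_sample(n_samples=20_000, n_steps=500, snr=0.1):
    dt = 1.0 / n_steps
    x = rng.standard_normal(n_samples)  # start from the prior at t = T = 1
    for i in range(n_steps):
        t = 1.0 - i * dt
        # Predictor: Euler-Maruyama step of the reverse-time SDE.
        drift = -0.5 * beta * x - beta * score(x, t)
        x = x - drift * dt + np.sqrt(beta * dt) * rng.standard_normal(n_samples)
        # Corrector: one Langevin step guided by the same score.
        eps = snr**2
        x = x + eps * score(x, t) + np.sqrt(2 * eps) * rng.standard_normal(n_samples)
    return x

samples = pc_sample()  # should be approximately N(0, 1)
```

The corrector step is plain Langevin dynamics, so it only needs the score, matching the description above.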

Algorithm 1.

Algorithm 1

Predictor-corrector (PC) sampling.

In general, the predictor can be any numerical solver for the reverse-time SDE with a fixed discretization strategy (e.g., a reverse diffusion sampler (Eq. (46) in [20]) or ancestral sampling (Eq. (4) in [20])). The corrector is typically Markov chain Monte Carlo based, such as Langevin dynamics or Hamiltonian Monte Carlo, and relies solely on a score function. For example, Algorithm 2 illustrates that the reverse diffusion SDE solver can be set as the predictor and annealed Langevin dynamics as the corrector, where $\theta^\ast$ denotes the optimal parameters of the network $s_\theta$ and $\{\epsilon_i\}_{i=0}^{N-1}$ denotes the step sizes for Langevin dynamics.

Algorithm 2.

Algorithm 2

PC sampling (VP SDE).

ODE solver The sampling of a continuous-time diffusion model can also be performed by solving the corresponding PF-ODE of the generative model, because this ODE has the remarkable property that its solution at each time $t$ shares the same marginal distribution as the SDE solution at that time. Because the PF-ODE is deterministic, it often enables more sophisticated and adaptive ODE solvers. These solvers can take larger steps or apply adaptive step sizing without needing to capture random increments at each step, which can yield faster numerical methods with better convergence properties, especially when scaling up to high-dimensional problems. One potential advantage of working with the PF-ODE is therefore faster sampling. Moreover, Chen et al. [62] demonstrated polynomial-time convergence guarantees for PF-ODEs in the context of score-based generative models, emphasizing their theoretical robustness and efficiency.
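A deterministic PF-ODE sampler can be sketched on a toy problem with data distribution $\mathcal N(0,4)$ under constant-$\beta$ VP dynamics, where the time-dependent marginal variance, and hence the exact score, is known analytically and stands in for $s_\theta$ (all values illustrative).

```python
import numpy as np

rng = np.random.default_rng(5)
beta, s2 = 1.0, 4.0  # VP rate and data variance: p_0 = N(0, 4) (toy example)

def var_t(t):
    """Marginal variance of the VP process started from N(0, s2)."""
    a2 = np.exp(-beta * t)      # alpha_t^2 for constant beta
    return a2 * s2 + 1.0 - a2

def pf_ode_sample(n=50_000, n_steps=1000):
    dt = 1.0 / n_steps
    x = np.sqrt(var_t(1.0)) * rng.standard_normal(n)  # prior at t = 1
    for i in range(n_steps):
        t = 1.0 - i * dt
        s = -x / var_t(t)                        # analytic stand-in for s_theta
        dxdt = -0.5 * beta * x - 0.5 * beta * s  # probability flow drift
        x = x - dxdt * dt                        # deterministic Euler step, no noise
    return x

samples = pf_ode_sample()  # should be approximately N(0, 4)
```

Unlike the PC sampler, no noise is injected along the trajectory, so the map from $x_T$ to $x_0$ is deterministic and amenable to higher-order or adaptive solvers.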

5. Experimental setup and results

Explicit fractional forward dynamics The framework presented in [25] is not limited to a specific choice of drift and diffusion coefficients. The experiments focus on fractional VP (FVP) dynamics with different noise schedule values, which are given by

$dX_t=-\tfrac{1}{2}\beta(t)X_t\,dt+\sqrt{\beta(t)}\,d\tilde{B}_t^H,\qquad t\in[0,T].$

Noise schedule Noise schedules (i.e., $f$ and $G$ in Eq. (6)) in diffusion models determine the addition and removal of noise during the diffusion process, significantly influencing sampling quality, training stability, and convergence. The correct selection of the noise schedule and steps is critical to optimizing model performance. In [21], the authors introduced linear and quadratic schedules because they are simple and intuitive to implement. However, some authors [28], [65] have criticized this approach, emphasizing that the steep decline in the initial time steps creates challenges for the neural network model during generation. Alternative noise scheduling functions with a more gradual decay have been proposed to address this problem. In particular, a cosine noise schedule [28] improves log-likelihoods compared with the conventional linear noise schedule in diffusion models. Unlike the linear schedule, the cosine schedule ensures a smoother progression of noise levels, introducing noise more gradually at the beginning and end while accelerating the transition in the middle stages. This adaptive noise scaling preserves structural information early on while maintaining efficient denoising in later steps to improve sample quality. Table 1 summarizes these two types of noise schedules, with hyperparameters $(\beta_{\min},\beta_{\max})=(0.1,20)$, consistent with the settings in [21], [25].

Table 1.

Noise schedule.

Noise schedule	Mathematical expression
Linear	$\beta(t)=\beta_{\min}+(\beta_{\max}-\beta_{\min})\,t$
Cosine	$\beta(t)=\beta_{\min}+(\beta_{\max}-\beta_{\min})\cdot\tfrac{1}{2}\big(1-\cos(\pi t)\big)$
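The two schedules in Table 1 translate directly into code, with $(\beta_{\min},\beta_{\max})=(0.1,20)$ as in the experiments:

```python
import numpy as np

beta_min, beta_max = 0.1, 20.0  # (beta_min, beta_max) = (0.1, 20)

def beta_linear(t):
    """Linear schedule: constant-rate increase from beta_min to beta_max."""
    return beta_min + (beta_max - beta_min) * t

def beta_cosine(t):
    """Cosine schedule: rises slowly near t = 0 and t = 1, fastest in the middle."""
    return beta_min + (beta_max - beta_min) * 0.5 * (1.0 - np.cos(np.pi * t))
```

Both schedules share the same endpoints and agree at $t=0.5$; they differ only in how the noise rate is distributed over the trajectory.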

Architecture details for the neural networks All experiments employed the U-Net [66] architecture to parameterize the score function, together with an Adam optimizer and PyTorch's OneCycle learning rate scheduler. The U-Net architecture is structured with multiple stages, each containing three residual blocks. The architecture applies a channel multiplication strategy defined by the sequence [1,2,2,2,2], indicating that the number of channels increases progressively from the first stage, with the following stages doubling the number of feature channels, enhancing the ability of the network to capture increasingly complex and abstract features at deeper layers. The models were trained with a maximal learning rate of $10^{-4}$ for 50k iterations and a batch size of 1,024. The training was conducted on a single Nvidia A6000 GPU, requiring approximately 17 h per training session. Table 2 summarizes the training hyperparameters.

Table 2.

Training hyperparameter summary.

Hyperparameter Value
Model architecture Conditional U-Net
Optimizer Adam
Learning rate 10^-4
Batch size 1024
Training steps 50,000
Residual blocks 3
Channel multiplication [1,2,2,2,2]
Input size 32 × 32
Attention resolution [4,2]
Dropout 0.1

Evaluation metrics The Fréchet inception distance (FID) is a widely used metric in generative modeling to evaluate the quality of generated samples. The FID quantifies the similarity between the distribution of the generated data and that of the real data by computing the Fréchet distance between their feature representations. The mathematical formula for the FID is provided in Eq. (20):

$\mathrm{FID}:=\|\mu_r-\mu_g\|_2^2+\operatorname{Tr}\big(\Sigma_r+\Sigma_g-2(\Sigma_r\Sigma_g)^{1/2}\big),$ (20)

where $\mu_g$ and $\Sigma_g$ represent the mean and covariance of the generated distribution, and $\mu_r$ and $\Sigma_r$ represent the mean and covariance of the real data distribution.

A lower FID score signifies a higher similarity between the generated and real data, indicating better generative quality. The FID provides a useful and widely adopted measure of generative model quality by capturing differences in distributional statistics. However, as a single scalar value, this metric condenses the comparison between two distributions, making it less informative regarding distinct aspects such as fidelity (realism of generated samples) and diversity (variety of generated samples) [67]. Although the FID remains a valuable benchmark metric, it should be complemented by other evaluation metrics because it does not explicitly identify one-to-one matches between distributions, can be sensitive to outliers, and depends on the specific evaluation settings [69]. Combining the FID with additional measures provides a more comprehensive assessment of generative model performance. The precision and recall metric [67] and its improved version [68] were introduced as measures of fidelity and diversity. Nevertheless, the authors in [69] noted that these two evaluation techniques are unsuitable for practical application because they fail to fulfill the essential criteria for effective evaluation metrics.
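For two Gaussian summaries, Eq. (20) can be evaluated directly. The sketch below computes $\operatorname{Tr}((\Sigma_r\Sigma_g)^{1/2})$ through the equivalent symmetric form $\Sigma_r^{1/2}\Sigma_g\Sigma_r^{1/2}$, which has the same eigenvalues and admits a stable eigendecomposition:

```python
import numpy as np

def _sqrtm_psd(M):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def fid(mu_r, Sigma_r, mu_g, Sigma_g):
    """Frechet distance between N(mu_r, Sigma_r) and N(mu_g, Sigma_g), Eq. (20)."""
    s = _sqrtm_psd(Sigma_r)
    covmean = _sqrtm_psd(s @ Sigma_g @ s)  # same trace as (Sigma_r Sigma_g)^{1/2}
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(Sigma_r + Sigma_g - 2.0 * covmean))

# Identical Gaussians have FID 0; shifting the mean by delta adds ||delta||^2.
d = 3
mu, Sigma = np.zeros(d), np.eye(d)
```

In practice $\mu$ and $\Sigma$ are estimated from feature embeddings of the real and generated samples.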

To overcome the shortcomings of the FID in distinguishing between the distinct aspects of generative quality (fidelity and diversity), the authors in [69] presented the density and coverage metrics, specifically designed to provide a more detailed assessment of generative performance. By incorporating a simple yet carefully designed manifold estimation procedure, this approach enhances the empirical reliability of fidelity-diversity metrics and provides a foundation for theoretical analysis. Notice that, under the assumptions in [69], it is possible to draw samples $\{X_i\}$ from a real distribution $P(X)$ and $\{Y_j\}$ from a generative model $Q(Y)$, respectively. Let $B(x,r)$ be the ball in $\mathbb{R}^D$ around $x$ with radius $r$, and let $N$ and $M$ be the numbers of real and generated samples. The real-data manifold is estimated as $\mathcal{M}(X_1,\dots,X_N):=\bigcup_{i=1}^N B(X_i,\mathrm{NND}_k(X_i))$, where $\mathrm{NND}_k(X_i)$ denotes the distance from $X_i$ to its $k$th nearest neighbor among $\{X_i\}$, excluding itself. Table 3 presents a concise overview of the listed metrics, whereas the full details of the density and coverage metrics are omitted for brevity.

Table 3.

Comparison of generative model evaluation metrics.

Metric Mathematical expression Explanation
Precision [68]	$\frac{1}{M}\sum_{j=1}^{M}1_{Y_j\in\mathcal{M}(X_1,\dots,X_N)}$	Measures the proportion of generated samples $Y_j$ falling in the real data manifold $\mathcal{M}(X_1,\dots,X_N)$. Higher precision means generated samples are realistic but does not guarantee diversity.
Recall [68]	$\frac{1}{N}\sum_{i=1}^{N}1_{X_i\in\mathcal{M}(Y_1,\dots,Y_M)}$	Evaluates how well the generator covers the real data distribution by checking whether real samples $X_i$ lie in the generated data manifold $\mathcal{M}(Y_1,\dots,Y_M)$. Higher recall implies better diversity but does not ensure realism.
Density [69]	$\frac{1}{kM}\sum_{j=1}^{M}\sum_{i=1}^{N}1_{Y_j\in B(X_i,\mathrm{NND}_k(X_i))}$	Quantifies how densely generated samples $Y_j$ populate the real data space by counting how many real-sample neighborhoods contain $Y_j$. Higher density indicates greater sample concentration.
Coverage [69]	$\frac{1}{N}\sum_{i=1}^{N}1_{\exists j:\,Y_j\in B(X_i,\mathrm{NND}_k(X_i))}$	Measures the proportion of real samples $X_i$ that have at least one generated sample $Y_j$ in their neighborhood. Unlike recall, coverage is less sensitive to outliers, offering a more robust assessment of generative diversity.

When evaluating generative models, the range of values that density and coverage can take must be considered. Density is not bounded above by 1 because multiple generated samples can fall within the neighborhood of a single real data point; a higher density indicates a greater concentration of generated samples near the real data. Coverage is bounded between 0 and 1, with higher values being better: the maximum value of 1 indicates that every real sample has at least one generated counterpart in its neighborhood, meaning the generated data distribution better represents the actual data. (See [68], [69] and the references therein for a comprehensive understanding of these metrics, including their mathematical definitions and implications.)
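A brute-force sketch of the density and coverage computations on small point sets follows; pairwise distances are computed naively here, whereas practical implementations use k-d trees or batched GPU code.

```python
import numpy as np

def knn_radii(X, k):
    """Distance from each X_i to its k-th nearest neighbor among X, excluding itself."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    return np.sort(D, axis=1)[:, k - 1]

def density_coverage(real, fake, k=5):
    r = knn_radii(real, k)  # per-real-sample neighborhood radii NND_k(X_i)
    D = np.linalg.norm(fake[:, None, :] - real[None, :, :], axis=-1)  # (M, N)
    inside = D <= r[None, :]                   # fake j inside the ball around real i
    density = inside.sum() / (k * len(fake))   # not bounded above by 1
    coverage = inside.any(axis=0).mean()       # fraction of real balls hit; in [0, 1]
    return density, coverage

# Sanity check: real and fake drawn from the same distribution should give
# density near 1 and high coverage.
rng = np.random.default_rng(6)
real = rng.standard_normal((500, 2))
fake = rng.standard_normal((500, 2))
den, cov = density_coverage(real, fake)
```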

It is imperative that the evaluation metrics accurately reflect the biological relevance of the generated data. In this study, we employ FID, density and coverage metrics frequently linked to image generation but apply them to distance matrices representing protein backbone structures. It is important to acknowledge that these matrices are not arbitrary images. Instead, they are rotation- and translation-invariant representations that preserve the geometric and topological information of a protein's three-dimensional structure. In the field of structural biology, an alpha-carbon (Cα) distance matrix is a comprehensive data structure that encompasses all the information necessary for the reconstruction of a protein's backbone, with the exception of rigid transformations. Consequently, the evaluation of the fidelity and diversity of these matrices constitutes a valid method of assessing the quality of protein backbone generation [70].

The FID is a measure of the global similarity between real and generated distance matrices. A low FID suggests that the generated structures have similar overall statistics to real proteins, such as spatial distributions and folding patterns. Density measures how closely the generated samples resemble the real data. High density means that the generated structures are realistic and fall within known structural regions. Coverage measures how much of the real structural space is captured by the generated samples. High coverage indicates that the model generates a diverse range of protein-like structures. Together, these metrics evaluate both fidelity (realism) and diversity (completeness), which are both crucial for effectively generating protein backbones.

Conventional metrics, including template modeling score (TM-score) and root mean square deviation (RMSD), are well-suited for the assessment of the similarity between predicted and reference 3D structures. Nevertheless, they are less appropriate for the evaluation of generative models that operate within the domain of distance matrices, as is the focus of the present study. The application of TM-score or RMSD in this context necessitates the resolution of an inverse problem, namely the reconstruction of 3D coordinates from generated distance matrices. This process is complex and distinct from the generative task under consideration. Furthermore, TM-score and RMSD are inherently pairwise metrics, evaluating the proximity to a single reference structure rather than measuring the extent to which a model approximates the full distribution of plausible protein conformations. Conversely, we adopt FID, density, and coverage as distributional metrics that directly quantify the fidelity and diversity of generated samples without requiring reconstruction. It is argued that these metrics are more aligned with the objectives of probabilistic protein generation. Nevertheless, incorporating 3D reconstruction followed by pairwise structural evaluation is an important direction for future work.

The Hurst parameter controls the temporal or spatial correlation structure of the noise. In our setting, it has a direct influence on the long-range dependencies learned in protein backbone generation. Such dependencies are common in proteins. For example, β-strands form sheets with distant partners and loop closures involve distant residues. Higher values (e.g. 0.8) introduce temporal and spatial correlations that enable the model to capture these global dependencies. Our results support this, showing that increasing the Hurst index improves coverage and density. This suggests that the generated structures more accurately reflect the global organization and diversity observed in natural proteins. Therefore, the Hurst parameter acts as a biologically meaningful regulator of the complexity of the generated backbones.

Although the present study does not comprise explicitly labeled ablation studies, targeted evaluations have been included that isolate the contributions of key components in the model. Specifically, the Hurst index is compared for different values (e.g., H=0.5 vs. H=0.8) in order to demonstrate the benefits of incorporating fractional Brownian motion. The impact of the MA-fBm approximation's resolution is also assessed by varying the number of underlying Ornstein–Uhlenbeck processes (e.g., K=2 and K=3). Furthermore, an exploration of different noise schedules (linear vs. cosine) and sampling strategies (Euler, Euler–Maruyama, and predictor-corrector) is undertaken in order to evaluate the solver's effect on generation quality. The experiments collectively highlight the importance of fractional dynamics, approximation granularity, and solver design. It is important to note that the MA-fBm construction is essential for enabling the use of fractional noise within SDE-based generative frameworks, as it recovers the Markovian structure necessary for score-based training. Conversely, other elements such as the U-Net backbone and score matching objective are standard and not directly associated with our novel contributions. Consequently, these elements are not the focus of component-isolation analysis.

Experimental results We downloaded the corresponding protein chains from http://www.pdb.org and extracted the 3D coordinates of the α-carbon (Cα) atoms for each protein. The number of amino acids considered in the distance matrix is a critical parameter in the experiments and influences the resolution and effectiveness of structural modeling. For computational efficiency, we consider only the first 32 amino acids. There are no fundamental restrictions on the number of amino acids; however, training the model on longer sequences requires more powerful GPUs with additional memory, and larger datasets are necessary because longer sequences are more complex than shorter ones. Moreover, using fragments consisting of a small number of residues (e.g., 32) is a common strategy in protein structure modeling [71], [72]. Indeed, proteins are modular and composed of secondary structure elements, such as α-helices, β-strands, and loops, which often span 10–40 residues. Modeling these fragments enables the identification of high-quality local geometries that can be used to build longer proteins. The training and sampling processes do not currently consider amino acid type, but this will be incorporated into future work.
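The Cα distance-matrix representation used as model input can be computed as follows; the random coordinates are a synthetic stand-in for a parsed PDB chain, and the rotation/translation invariance noted earlier holds by construction.

```python
import numpy as np

def ca_distance_matrix(coords, n_residues=32):
    """Pairwise C-alpha distance matrix for the first n_residues residues.
    coords: (L, 3) array of C-alpha coordinates with L >= n_residues.
    The result is invariant to rigid transformations of the backbone."""
    c = np.asarray(coords, dtype=float)[:n_residues]
    return np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)

# Toy check on a synthetic backbone (random coordinates, not a real PDB chain).
rng = np.random.default_rng(7)
coords = rng.standard_normal((40, 3))
D = ca_distance_matrix(coords)  # (32, 32), symmetric, zero diagonal
```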

The dataset contains 60,000 proteins and is divided into training (80%) and testing (20%) sets. During each experiment, 12,000 samples were generated to compute the metrics against the testing set. The models were evaluated based on density, coverage, and the FID; arrows (↑ for higher is better, ↓ for lower is better) indicate the direction of improvement. The best results are in bold.

Importance of modeling This section presents quantitative comparisons of model performance under various Hurst parameters $H\in\{0.2,0.4,0.5,0.6,0.8\}$, numbers of OU processes used to approximate fractional Brownian motion $K\in\{2,3\}$, and noise schedules (linear and cosine).

Initially, we evaluated the generative performance of the FVP model under a linear noise schedule and compared it with the VP-SDE baseline. To this end, we employed the classical Euler–Maruyama scheme to solve the corresponding reverse-time SDE numerically for all models, ensuring consistency across experiments. Table 4 provides a comprehensive summary of the models' performance across various configurations.

Table 4.

Quantitative results for fractional variance-preserving dynamics with various H and K under the linear noise schedules.

FVP (Linear)	H = 0.2	H = 0.4	H = 0.5	H = 0.6	H = 0.8
	(values reported as Density ↑ / Coverage ↑ / FID ↓)
VP (baseline)	–	–	1.043 / 0.884 / 75.368	–	–
K = 2	0.942 / 0.858 / 77.219	0.960 / 0.937 / 77.188	0.965 / 0.896 / 75.508	1.066 / 0.972 / 77.073	1.014 / 0.922 / 74.899
K = 3	0.932 / 0.840 / 77.174	0.953 / 0.918 / 77.450	1.028 / 0.894 / 75.934	1.032 / 0.962 / 77.015	1.118 / 0.934 / 74.614

The results show that as the Hurst parameter H increases, density and coverage generally improve for K=2 and K=3, with greater density gains observed for larger K. The best density (1.118) is achieved for K=3 at H=0.8, which highlights the enhanced sample quality under stronger fractional dependence. Coverage also improves with larger H. Furthermore, the FID consistently decreases with increasing H, achieving the lowest score of 74.614 for K=3 at H=0.8, reflecting superior generative fidelity.

To facilitate visualization, bar charts and radar plots of the density, coverage, and FID are presented in Fig. 4 and Fig. 5, respectively. A radar plot is a graphical representation of multivariate data: each variable is represented by an axis radiating from a central point, and the data points are connected to form a polygon whose enclosed area indicates overall performance. Presenting the results for both K=2 and K=3 in each chart facilitates a comparative analysis of their influence on the model.

Fig. 4.

Fig. 4

Bar plots comparing the performance of FVP models with different K across different H under the linear noise schedule.

Fig. 5.

Fig. 5

Radar plots showing a joint representation of the density, coverage, and FID for various values of K and H under the linear noise schedule.

Importance of noise scheduler The existing literature has predominantly centered on experiments with linear noise schedules. In contrast, the present study explores an alternative nonlinear noise schedule, the cosine noise schedule, to investigate the effect of the noise scheduler on model performance. To this end, we have conducted additional experiments using a cosine noise schedule. The quantitative results corresponding to the aforementioned variables are summarized in Table 5.

Table 5.

Quantitative results for fractional variance-preserving dynamics with various H and K under the cosine noise schedules.

FVP (Cosine)	H = 0.2	H = 0.4	H = 0.5	H = 0.6	H = 0.8
	(values reported as Density ↑ / Coverage ↑ / FID ↓)
K = 2	0.960 / 0.936 / 74.367	1.024 / 0.958 / 76.567	0.799 / 0.853 / 78.907	1.028 / 0.965 / 76.960	1.040 / 0.954 / 75.160
K = 3	0.960 / 0.923 / 73.452	1.031 / 0.958 / 76.411	0.832 / 0.861 / 79.305	1.066 / 0.963 / 76.739	1.011 / 0.941 / 75.390

Under the cosine noise schedule, the best FID (73.452) is attained at $H=0.2$ with $K=3$, indicating that lower Hurst values are conducive to generative fidelity under cosine noise. Density and coverage generally exhibit moderate improvements as $H$ increases: the highest observed density (1.066) is attained at $H=0.6$, $K=3$, and the highest observed coverage (0.965) at $H=0.6$, $K=2$. The effect of increasing $K$ from 2 to 3 on the FID varies with $H$, and the impact of $K$ on density and coverage is less pronounced than under the linear noise schedule. These results suggest that while fractional dynamics still provide benefits under the cosine schedule, the optimal behavior differs from the linear case, favoring smaller $H$ for improved FID.

To facilitate the visualization of the results, bar charts and radar plots were generated for the density, coverage, and FID, as illustrated in Fig. 6, Fig. 7, respectively.

Fig. 6. Bar plots comparing the performance of FVP models with different K across different H under the cosine noise schedule.

Fig. 7. Radar plots showing a joint representation of the density, coverage, and FID for various values of K and H under the cosine noise schedule.

To provide a comprehensive comparison of the two noise schedules, the generative performance is visualized using line plots for density, coverage, and FID across all values of H and K.

As Fig. 8 shows, the linear and cosine noise schedules exhibit distinct behaviors. With the linear schedule, increasing H generally results in steady improvements across all three metrics, especially at higher H values. This indicates the effective use of long-range temporal correlations. In contrast, the cosine noise schedule achieves the best FID at lower H values, particularly at H=0.2. Higher H values do not consistently yield further improvements. Trends in density and coverage under the cosine noise schedule are less monotonic and exhibit fluctuations across H, suggesting that the cosine schedule may be less effective at leveraging the full advantages of fractional dynamics.

Fig. 8. Comparison of generative performance under linear and cosine noise schedules across Hurst values.

Importance of sampling methods This section evaluates the influence of various SDE and ODE solvers on generative performance. The number of score function evaluations (NFEs) is employed to assess computational efficiency, whereas density, coverage, and the FID measure generative quality. We examined solver performance with 1,000 and 2,000 discretization steps, analyzing how the step count influences sample fidelity and diversity.
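Density and coverage can be computed with the k-nearest-neighbor estimators of Naeem et al. [69]. The sketch below is a minimal NumPy/SciPy version operating on generic feature vectors; the neighborhood size k is chosen arbitrarily for illustration and is not a value reported in this work.

```python
import numpy as np
from scipy.spatial.distance import cdist

def density_coverage(real, fake, k=5):
    """k-NN density and coverage (Naeem et al. [69]) for feature arrays
    of shape (n_samples, n_features)."""
    d_real = cdist(real, real)
    np.fill_diagonal(d_real, np.inf)           # exclude self-distances
    radii = np.sort(d_real, axis=1)[:, k - 1]  # k-NN radius of each real point
    d_cross = cdist(real, fake)                # shape (n_real, n_fake)
    inside = d_cross < radii[:, None]          # fake j inside ball of real i
    density = inside.sum() / (k * fake.shape[0])
    coverage = inside.any(axis=1).mean()       # real points with >=1 fake nearby
    return density, coverage
```

A density above 1 indicates that generated samples concentrate in regions where real samples are dense; coverage near 1 indicates that most real modes are reached by at least one generated sample.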

First, we considered the SDE solvers, using the classical Euler ODE solver as a baseline, along with the Euler–Maruyama method and the predictor–corrector (PC) sampler. We applied the reverse diffusion SDE as the predictor and Langevin dynamics as the corrector. Table 6 presents the results.

Table 6.

Density and coverage and the FID for stochastic differential equation (SDE) solvers.

| SDE solver | Iterations | Density ↑ | Coverage ↑ | FID ↓ |
| --- | --- | --- | --- | --- |
| Euler (baseline) | 1000 | 0.908 | 0.847 | 74.395 |
| Euler (baseline) | 2000 | 0.907 | 0.859 | 74.490 |
| Euler–Maruyama | 1000 | 1.118 | 0.934 | 74.614 |
| Euler–Maruyama | 2000 | 1.035 | 0.949 | 75.003 |
| PC sampler | 1000 | 1.224 | 0.965 | 74.432 |
| PC sampler | 2000 | 1.170 | 0.938 | 74.884 |

The classical Euler ODE baseline provides a reference point but lacks the stochastic flexibility for optimal generative modeling. The PC sampler at 1,000 steps performs best overall, with the highest density (1.224) and coverage (0.965) and the lowest FID (74.432), indicating superior generative quality. Although increasing the discretization steps to 2,000 marginally improves the coverage, it does not consistently enhance the density or FID, particularly for the PC sampler, where a slight performance degradation occurs. The Euler–Maruyama method remains computationally efficient but is outperformed by the PC method, which yields better sample diversity and fidelity at a similar computational cost. Fig. 9 displays the comparative bar plots for the density, coverage, and FID metrics across SDE solvers (Euler, Euler–Maruyama, and PC sampler) at 1,000 and 2,000 iterations.

Fig. 9. Comparison of density, coverage, and FID metrics for the Euler ordinary differential equation solver and two stochastic differential equation solvers at two iteration counts.
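The predictor–corrector scheme can be sketched for a variance-preserving diffusion as follows. This is a generic illustration, not the paper's implementation: `score` stands for any trained (or analytic) score function, and the signal-to-noise ratio used to set the Langevin step size is a common default from the score-based literature rather than a value reported here.

```python
import numpy as np

def pc_sample(score, x, ts, beta, snr=0.16, n_corrector=1, rng=None):
    """Predictor-corrector sampling for a VP diffusion: a reverse-diffusion
    (Euler-Maruyama) predictor followed by Langevin-dynamics corrector steps.
    ts is a decreasing array of times from T down to ~0."""
    rng = np.random.default_rng() if rng is None else rng
    for i in range(len(ts) - 1):
        t, dt = ts[i], ts[i] - ts[i + 1]
        # Predictor: one Euler-Maruyama step of the reverse-time SDE
        drift = -0.5 * beta(t) * x - beta(t) * score(x, t)
        x = x - drift * dt + np.sqrt(beta(t) * dt) * rng.standard_normal(x.shape)
        # Corrector: Langevin MCMC at the new time level
        for _ in range(n_corrector):
            g = score(x, ts[i + 1])
            noise = rng.standard_normal(x.shape)
            # step size chosen to match a target signal-to-noise ratio
            eps = 2.0 * (snr * np.linalg.norm(noise) / (np.linalg.norm(g) + 1e-12)) ** 2
            x = x + eps * g + np.sqrt(2.0 * eps) * noise
    return x
```

With the analytic score of a standard normal, s(x, t) = -x, the sampler leaves the N(0, I) prior approximately invariant, which provides a convenient correctness check.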

After examining the influence of various SDE solvers, this work focuses on evaluating the effects of ODE solvers in the generative modeling process. We investigated how several classical ODE solvers influence sample quality, diversity, and computational efficiency. This experiment considers three numerical solvers: the Euler method, which provides a reference for comparison; the commonly applied fixed-step fourth-order Runge–Kutta solver (RK4), known for improved accuracy; and the adaptive-step Runge–Kutta 4(5) solver (RK45), which dynamically adjusts the step size for better precision and efficiency. Table 7 summarizes the quantitative results: the RK45 solver at 1,000 steps performs best, achieving the highest density (1.00) and coverage (0.96) and the lowest FID (73.8), indicating superior sample fidelity and diversity. RK4 displays degraded FID performance, and the Euler method, while computationally simple, is outperformed by RK45. Increasing the discretization steps marginally improves coverage but does not significantly enhance the FID, suggesting that adaptive solvers such as RK45 balance efficiency and accuracy better than fixed-step methods. The bar plots in Fig. 10 compare the density, coverage, and FID metrics for the ODE solvers (Euler, RK4, and RK45) evaluated at 1,000 and 2,000 iterations.

Table 7.

Density and coverage and the FID for ordinary differential equation (ODE) solvers.

| ODE solver | Iterations | Density ↑ | Coverage ↑ | FID ↓ |
| --- | --- | --- | --- | --- |
| Euler (baseline) | 1000 | 0.908 | 0.847 | 74.395 |
| Euler (baseline) | 2000 | 0.907 | 0.859 | 74.490 |
| RK4 | 1000 | 0.83 | 0.75 | 77.9 |
| RK4 | 2000 | 0.886 | 0.862 | 74.3 |
| RK45 | 1000 | 1.00 | 0.96 | 73.8 |
| RK45 | 2000 | 0.973 | 0.89 | 74.2 |

Fig. 10. Comparison of density, coverage, and the FID metrics for ordinary differential equation solvers at two iteration counts.
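The effect of solver choice can be reproduced on a toy problem where the probability-flow ODE has a closed-form score. The sketch below assumes a one-dimensional VP diffusion with Gaussian data N(2, 0.25), so the marginal score is exact; the linear β-range is a common default, not a value from this work.

```python
import numpy as np
from scipy.integrate import solve_ivp

def marginal_score(x, t, m0=2.0, v0=0.25):
    """Exact score of the VP marginal when the data distribution is N(m0, v0)."""
    # sqrt(alpha_bar(t)) for a linear beta(t) = 0.1 + 19.9 t schedule
    a = np.exp(-0.5 * (0.1 * t + 0.5 * 19.9 * t ** 2))
    mean, var = a * m0, a ** 2 * v0 + (1.0 - a ** 2)
    return -(x - mean) / var

def pf_ode_rhs(t, x):
    """Probability-flow ODE dx/dt = -1/2 beta(t) (x + score(x, t))."""
    beta = 0.1 + 19.9 * t
    return -0.5 * beta * (x + marginal_score(x, t))

# Integrate backward from the prior at t = 1 to the data at t = 0
x1 = np.random.default_rng(0).standard_normal(512)
sol = solve_ivp(pf_ode_rhs, (1.0, 0.0), x1, method="RK45", rtol=1e-6, atol=1e-6)
x0 = sol.y[:, -1]  # samples approximately distributed as N(2, 0.25)
```

Swapping `method="RK45"` for a lower-order method (e.g., `"RK23"`) or a coarser tolerance changes the NFE/accuracy trade-off, mirroring the Euler/RK4/RK45 comparison reported above.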

Discussion A thorough examination of Table 4 and Table 5 indicates that increasing the Hurst index (H) and the number of OU processes (K) generally improves model performance under both noise schedules relative to the VP baseline evaluated at H=0.5. Under the linear schedule, increasing H from 0.5 to 0.8 yields consistent improvements in all three metrics for K=3: density increases from 1.028 to 1.118, coverage rises from 0.894 to 0.934, and the FID decreases from 75.934 to 74.614. Analogous trends are observed for K=2, albeit to a lesser extent. These results suggest that larger H values, which correspond to stronger long-range dependencies, enhance sample fidelity and diversity under the linear schedule. Increasing K from 2 to 3 produces notable gains, particularly in FID and coverage, indicating that a larger number of OU processes enhances the model's ability to accommodate complex distributions. Under the cosine schedule, increasing K improves the FID in most settings, but the effect of H is less straightforward: although density and coverage benefit modestly from larger H, the optimal FID (73.452) is attained at H=0.2, suggesting a preference for weaker temporal correlations under cosine noise. In both schedules, the combination of higher H and larger K generally outperforms the baseline model, but the extent and character of the improvement depend strongly on the chosen noise schedule.

We then turn to a comparative analysis of how the noise schedule affects generative performance. Under the linear noise schedule, increasing the Hurst parameter H consistently improves all metrics of interest: density and coverage rise steadily while the FID decreases, particularly for larger values of K. This indicates that the linear schedule effectively exploits the long-range temporal dependencies introduced by fractional dynamics. The cosine noise schedule exhibits a divergent pattern: the lowest FID, indicative of optimal fidelity, is observed at lower H (particularly H=0.2), and higher H does not reduce the FID further. Although density and coverage under cosine noise do improve with rising H, the improvements are less stable and fluctuate, particularly around H=0.5. These findings indicate that the impact of fractional modeling depends strongly on the underlying noise schedule. We posit that the cosine schedule imposes greater structural constraints during the initial diffusion steps but introduces higher noise levels in later stages; this discrepancy may lead to suboptimal score-matching behavior and impede the model's ability to learn denoising accurately under high-noise conditions. Moreover, certain samplers may be better suited to linear noise decay, whereas cosine-based schedules may require adaptive step sizes or careful solver configuration. The interaction between the noise schedule and the sampler remains unclear.
Other factors could also explain this outcome. A recent paper [73] challenged the prevailing assumption that noise conditioning is essential for the success of denoising diffusion models, finding that most models degrade gracefully without noise conditioning and that a subset even improves. This invites the research community to re-examine the fundamental assumptions and formulations underpinning denoising generative models.

The findings in Table 6 and Table 7 underscore the significant influence of sampling methods on generative performance. Among SDE solvers, the PC sampler outperforms Euler-based methods, achieving the highest density (1.224) and coverage (0.965) and the lowest FID (74.432) at 1,000 iterations, which suggests that corrector steps based on Langevin dynamics enhance sample fidelity and diversity. Among ODE solvers, the adaptive RK45 solver performs best, reaching the lowest FID (73.8) at 1,000 iterations and outperforming the Euler and RK4 solvers. Increasing the number of iterations to 2,000 marginally improves coverage but does not consistently enhance the FID; the PC sampler and RK45 even degrade slightly at 2,000 steps, possibly due to overcorrection effects. In practice, selecting an appropriate solver for the ScoreSDE model requires balancing computational cost (measured by NFEs) against sample quality, as reflected by density, coverage, and the FID. However, the theoretical understanding of the qualitative differences between sampling from the SDE and the ODE remains limited, highlighting the need for further analysis.

6. Conclusions and future work

Conclusions In this work, we proposed ProT-GFDM, a generative diffusion model for protein design that integrates fractional Brownian motion via a Markov approximation framework. By expanding the framework beyond classical Brownian motion and incorporating long-range dependencies through fractional stochastic dynamics, the model addresses the key limitations of conventional diffusion-based approaches in capturing the complexity of protein structures. The formulation facilitates a more expressive generative process, enhancing both fidelity and diversity of the generated samples.

We conducted extensive experiments across a range of Hurst indices, noise schedules, and numerical solvers to assess the model's performance in terms of density, coverage, and FID. The results demonstrate superior performance on all three metrics. Notably, ProT-GFDM adapts well to both SDE and ODE solvers and remains effective under alternative noise schedules, which is particularly advantageous in scenarios demanding greater control over the generative dynamics. These results underscore the potential of fractional stochastic modeling as a robust tool for structured biological data generation, laying a foundation for future advances in protein engineering and computational drug discovery.

Future work For the driving process, the choice of the number of OU processes, K, used to approximate fBm depends on the data. A trade-off exists: a higher K offers a more accurate approximation of fBm but at a higher computational cost. Regarding the Hurst index, we used the classical case H = 1/2, whereas for superdiffusion (H > 1/2) we set H = 0.8, and for subdiffusion (H < 1/2) we set H = 0.2. The selection of the Hurst index in these experiments is based on practical considerations, as the actual value is unknown. A more precise estimate of the Hurst index could be derived directly from the data [27].
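One simple data-driven estimate, sketched below under the assumption of approximately stationary increments, exploits the fBm scaling law Var[X_{t+τ} − X_t] ∝ τ^{2H}: regressing log variance on log lag yields 2H as the slope. More sophisticated estimators exist (see [27]); this is only an illustration.

```python
import numpy as np

def estimate_hurst(path, max_lag=20):
    """Estimate the Hurst index from one sample path via the increment-variance
    scaling Var[X_{t+lag} - X_t] ~ lag^(2H), using a log-log regression."""
    lags = np.arange(1, max_lag + 1)
    variances = [np.var(path[lag:] - path[:-lag]) for lag in lags]
    slope, _ = np.polyfit(np.log(lags), np.log(variances), 1)
    return slope / 2.0

# Sanity check on ordinary Brownian motion, for which H = 1/2
bm = np.cumsum(np.random.default_rng(0).standard_normal(200_000))
```

For Brownian motion the increment variance over lag k is proportional to k, so the estimator should return a value close to 0.5.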

More advanced ODE solvers can also be considered. For example, the general form of diffusion ODE is as follows:

dx_t = [f(t) x_t − ½ g²(t) s_θ(x_t, t)] dt. (21)

The model has a semi-linear structure comprising two components: a linear function of the data variable x_t and a nonlinear function of x_t parameterized by the neural network s_θ(x_t, t). The solution at time t can be formulated using the variation-of-constants formula:

x_t = e^{∫_s^t f(τ) dτ} x_s − ∫_s^t e^{∫_τ^t f(r) dr} ½ g²(τ) s_θ(x_τ, τ) dτ.

We can consider the DPM-solver [63] and the diffusion exponential integrator sampler (DEIS) [64], which aim to solve the PF-ODE of diffusion models efficiently. These solvers share a critical similarity: both exploit the semi-linear structure of the PF-ODE to improve efficiency and accuracy. Lu et al. [74] introduced an improved version, DPM-solver++, a high-order solver for the guided sampling of DPMs that accelerates high-quality sample generation. For DEIS, the authors of [75] proposed a modified version, DEIS-SN, based on a simple new score parametrization that normalizes the score estimate by its average empirical absolute value at each time step (computed from high-NFE offline generations); this approach yields consistent FID improvements over classical DEIS. Integration of ProT-GFDM as a backbone generator upstream of models like ProteinMPNN [76] will be a key part of future work, with the aim of enabling sequence-structure co-design.
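For the variance-preserving case (f(t) = −½β(t), g²(t) = β(t)), the linear part of the variation-of-constants formula can be solved exactly, and freezing the score at the current step gives a first-order exponential-integrator update in closed form. The sketch below is a generic illustration of this idea, not the DPM-solver or DEIS algorithms themselves; `score` and `beta_int` are placeholders for a trained score model and the integral of a chosen β schedule.

```python
import numpy as np

def exp_integrator_step(x, s, t, score, beta_int):
    """First-order exponential-integrator step of the VP probability-flow ODE
    from time s down to t < s, with the score frozen at (x, s).
    beta_int(lo, hi) must return the integral of beta over [lo, hi]."""
    B = np.exp(0.5 * beta_int(t, s))   # the linear part, solved exactly
    return B * x + (B - 1.0) * score(x, s)
```

Because the linear part is integrated exactly, large steps remain stable. With the analytic score of a standard-normal data distribution, s(x, t) = −x, the update reduces to the identity, matching the fact that the PF-ODE velocity vanishes when the marginal is stationary.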

7. Notational conventions

A list of definitions for the mathematical symbols used in this work.

| Symbol | Definition |
| --- | --- |
| [0, T] | Time horizon with terminal time T > 0 |
| X = (X_t)_{t ∈ [0,T]} | Stochastic forward process taking values in ℝ |
| D ∈ ℕ | Data dimension |
| X | Vector-valued stochastic forward process X = (X_t)_{t ∈ [0,T]}, X_t = (X_{t,1}, …, X_{t,D}) |
| f | Function f : ℝ^D × [0, T] → ℝ^D |
| μ, g | Functions μ, g : [0, T] → ℝ |
| p_0 | Data distribution |
| p_t | Marginal density of the (augmented) forward process at t ∈ [0, T] |
| B | Brownian motion (BM) |
| H | Hurst index, H ∈ (0, 1) |
| W^H | Type I fractional Brownian motion (fBm) |
| B^H | Type II fractional Brownian motion (fBm) |
| Y^γ = (Y_t^γ)_{t ∈ [0,T]} | Ornstein–Uhlenbeck (OU) process with speed of mean reversion γ ∈ ℝ |
| K ∈ ℕ | Number of approximating processes |
| γ_1, …, γ_K | Geometrically spaced grid |
| ω_1, …, ω_K | Approximation coefficients |
| ω | Optimal approximation coefficients ω = (ω_1, …, ω_K) |
| B̃^H | Markov-approximate fractional Brownian motion (MA-fBm) |
| k | k ∈ ℕ with 1 ≤ k ≤ K |
| Y_k | OU processes Y_k = Y^{γ_k} |
| Y_1, …, Y_K | Augmenting processes |
| F, G | Vector-valued functions F, G : [0, T] → ℝ^{D(K+1)} |
| Z | Forward process augmented by Y_1, …, Y_K |
| Y^[K] | Stacked vector of augmenting processes |
| q_t | Marginal density of Y^[K] at t ∈ [0, T] |
| θ | Weight vector of a neural network |

CRediT authorship contribution statement

Xiao Liang: Writing – review & editing, Writing – original draft, Software, Methodology, Investigation, Formal analysis, Data curation. Wentao Ma: Visualization, Software, Formal analysis. Eric Paquet: Writing – review & editing, Supervision, Methodology, Investigation. Herna Viktor: Writing – review & editing, Supervision, Investigation. Wojtek Michalowski: Writing – review & editing, Supervision, Project administration, Investigation, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could appear to have influenced the work reported in this paper.

Acknowledgements

The authors gratefully acknowledge the support of the Artificial Intelligence for Design Challenge program from the National Research Council Canada for funding this project.

References

1. Morris R., Black K.A., Stollar E.J. Uncovering protein function: from classification to complexes. Essays Biochem. 2022;66:255–285. doi: 10.1042/EBC20200108.
2. Luo Y. Sensing the shape of functional proteins with topology. Nat Comput Sci. 2023;3:124–125. doi: 10.1038/s43588-023-00404-7.
3. Huang P.S., Boyken S.E., Baker D. The coming of age of de novo protein design. Nature. 2016;537:320. doi: 10.1038/nature19946.
4. Pan X., Kortemme T. Recent advances in de novo protein design: principles, methods, and applications. J Biol Chem. 2021;296. doi: 10.1016/j.jbc.2021.100558.
5. Lai B. Leveraging deep generative model for computational protein design and optimization. arXiv preprint. 2024. arXiv:2408.17241.
6. Lin B., Luo X., Liu Y., Jin X. A comprehensive review and comparison of existing computational methods for protein function prediction. Brief Bioinform. 2024;25. doi: 10.1093/bib/bbae289.
7. Goodfellow I., Pouget-Abadie J., Mirza M., et al. Generative adversarial nets. Adv Neural Inf Process Syst. 2014;27.
8. Kingma D.P., Welling M. Auto-encoding variational Bayes. 2013. arXiv:1312.6114.
9. Rezende D., Mohamed S. Variational inference with normalizing flows. Int Conf Mach Learn, PMLR. 2015:1530–1538.
10. Anand N., Huang P. Generative modeling for protein structures. In: Bengio S., et al., editors. Advances in Neural Information Processing Systems, vol. 31. Curran Associates; 2018.
11. Lyu S., Sowlati-Hashjin S., Garton M. ProteinVAE: variational autoencoder for translational protein design. bioRxiv. 2023.
12. Elnaggar A., Heinzinger M., Dallago C., et al. ProtTrans: toward understanding the language of life through self-supervised learning. IEEE Trans Pattern Anal Mach Intell. 2021;44:7112–7127. doi: 10.1109/TPAMI.2021.3095381.
13. Sevgen E., Moller J., Lange A., et al. Prot-VAE: protein transformer variational autoencoder for functional protein design. bioRxiv. 2023.
14. Repecka D., Jauniskis V., Karpus L., et al. Expanding functional protein sequence spaces using generative adversarial networks. Nat Mach Intell. 2021;3:324–333. doi: 10.1038/s42256-021-00310-5.
15. Strokach A., Kim P.M. Deep generative modeling for protein design. Curr Opin Struct Biol. 2022;72:226–236. doi: 10.1016/j.sbi.2021.11.008.
16. Wang J., Cao H., Zhang J.Z., Qi Y. Computational protein design with deep learning neural networks. Sci Rep. 2018;8:1–9. doi: 10.1038/s41598-018-24760-x.
17. Qi Y., Zhang J.Z. DenseCPD: improving the accuracy of neural-network-based computational protein sequence design with DenseNet. J Chem Inf Model. 2020;60:1245–1252. doi: 10.1021/acs.jcim.0c00043.
18. Anishchenko I., et al. De novo protein design by deep network hallucination. Nature. 2021;600:547–552. doi: 10.1038/s41586-021-04184-w.
19. Ferruz N., Schmidt S., Höcker B. ProtGPT2 is a deep unsupervised language model for protein design. Nat Commun. 2022;13:4348. doi: 10.1038/s41467-022-32007-7.
20. Song Y., Sohl-Dickstein J., Kingma D.P., Kumar A., Ermon S., et al. Score-based generative modeling through stochastic differential equations. Int Conf Learn Represent. 2020.
21. Ho J., Jain A., Abbeel P. Denoising diffusion probabilistic models. Adv Neural Inf Process Syst. 2020.
22. Song Y., Ermon S. Generative modeling by estimating gradients of the data distribution. Adv Neural Inf Process Syst. 2019;32.
23. Yoon E.B., Park K., Kim S., Lim S. Score-based generative models with Lévy processes. Adv Neural Inf Process Syst. 2023;36:40694–40707.
24. Paquet E., Soleymani F., Viktor H.L., Michalowski W. Annealed fractional Lévy–Itō diffusion models for protein generation. Comput Struct Biotechnol J. 2024. doi: 10.1016/j.csbj.2024.04.009.
25. Nobis G., Aversa M., Springenberg M., Detzel M., Ermon S., et al. Generative fractional diffusion models. 2023. arXiv:2310.17638.
26. Harms P., Stefanovits D. Affine representations of fractional processes with applications in mathematical finance. Stoch Process Appl. 2019;129:1185–2228. doi: 10.1016/j.spa.2018.04.010.
27. Daems R., Opper M., Crevecoeur G., Birdal T. Variational inference for SDEs driven by fractional noise. 12th Int Conf Learn Represent. 2024. https://openreview.net/forum?id=rtx8B94JMS
28. Nichol A.Q., Dhariwal P. Improved denoising diffusion probabilistic models. Int Conf Mach Learn. 2021:8162–8171.
29. Anderson B.D. Reverse-time diffusion equation models. Stoch Process Appl. 1982;12:313–326.
30. Kloeden P.E., Platen E. Numerical solution of stochastic differential equations. Springer; 1992.
31. Chen T.Q., Rubanova Y., Bettencourt J., Duvenaud D. Neural ordinary differential equations. Adv Neural Inf Process Syst. 2018;31:6572–6583.
32. Altschul S.F., Gish W., Miller W., et al. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
33. Rifaioglu A.S., Doğan T., Martin M.J., et al. DeepRED: automated protein function prediction with multi-task feed-forward deep neural networks. Sci Rep. 2019;9:7344. doi: 10.1038/s41598-019-43708-3.
34. Kulmanov M., Hoehndorf R. DeepGOPlus: improved protein function prediction from sequence. Bioinformatics. 2020;36:422–429. doi: 10.1093/bioinformatics/btz595.
35. Gligorijević V., Renfrew P.D., Kosciolek T., et al. Structure-based protein function prediction using graph convolutional networks. Nat Commun. 2021;12:3168. doi: 10.1038/s41467-021-23303-9.
36. Hochreiter S., Schmidhuber J. Long short-term memory. Neural Comput. 1997;9:1735–1780. doi: 10.1162/neco.1997.9.8.1735.
37. Zhonghui G., Luo X., Chen J., et al. Hierarchical graph transformer with contrastive learning for protein function prediction. Bioinformatics. 2023;39. doi: 10.1093/bioinformatics/btad410.
38. Mostafavi S., Ray D., Warde-Farley D., et al. GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biol. 2008;9:1–15. doi: 10.1186/gb-2008-9-s1-s4.
39. Gligorijević V., Barot M., Bonneau R. DeepNF: deep network fusion for protein function prediction. Bioinformatics. 2018;34:3873–3881. doi: 10.1093/bioinformatics/bty440.
40. Zhou N., Jiang Y., Bergquist T.R., et al. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens. Genome Biol. 2019;20:1–23. doi: 10.1186/s13059-019-1835-8.
41. Lin B., Luo X., Liu Y., Jin X. A comprehensive review and comparison of existing computational methods for protein function prediction. Brief Bioinform. 2024;25. doi: 10.1093/bib/bbae289.
42. Mardikoraem M., Wang Z., Pascual N., Woldring D. Generative models for protein sequence modeling: recent advances and future directions. Brief Bioinform. 2023;24. doi: 10.1093/bib/bbad358.
43. Wang J., et al. Scaffolding protein functional sites using deep learning. Science. 2022;377:387–394. doi: 10.1126/science.abn2100.
44. Lee J.S., Kim J., Kim P.M. Score-based generative modeling for de novo protein design. Nat Comput Sci. 2023;3:382–392. doi: 10.1038/s43588-023-00440-3.
45. Bernstein F.C., Koetzle T.F., Williams G.J.B., Meyer E.F., Brice M.D., et al. The protein data bank: a computer-based archival file for macromolecular structures. Arch Biochem Biophys. 1978;185:584–591. doi: 10.1016/0003-9861(78)90204-7.
46. Hamelryck T., Kent J.T., Krogh A. Sampling realistic protein conformations using local structural bias. PLoS Comput Biol. 2006;2:e131. doi: 10.1371/journal.pcbi.0020131.
47. Boomsma W., Mardia K.V., Taylor C.C., Ferkinghoff-Borg J., Krogh A., et al. A generative, probabilistic model of local protein structure. Proc Natl Acad Sci USA. 2008;105:8932–8937. doi: 10.1073/pnas.0801715105.
48. Wu J., Zhang C., Xue T., Freeman B., Tenenbaum J. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. Adv Neural Inf Process Syst. 2016:82–90.
49. Rohl C.A., Strauss C.E., Misura K.M., Baker D. Protein structure prediction using Rosetta. Methods Enzymol. 2004;383:66–93. doi: 10.1016/S0076-6879(04)83004-0.
50. Boyd S., Parikh N., Chu E., Peleato B., Eckstein J., et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn. 2011;3:1–122.
51. Anand N., Eguchi R., Huang P.S. Fully differentiable full-atom protein backbone generation. ICLR 2019 Workshop DeepGenStruct. 2019. https://openreview.net/forum?id=SJxnVL8YOV
52. Shevchenko G. Fractional Brownian motion in a nutshell. Int J Mod Phys Conf Ser. 2015;36. doi: 10.1142/S2010194515600022.
53. Hyvärinen A., Dayan P. Estimation of non-normalized statistical models by score matching. J Mach Learn Res. 2005;6(4).
54. Martens J., Sutskever I., Swersky K. Estimating the Hessian by backpropagating curvature. Proc 29th Int Conf Mach Learn. 2012:963–970.
55. Song Y., Garg S., Shi J., Ermon S. Sliced score matching: a scalable approach to density and score estimation. Conf Uncertainty Artif Intell. 2019.
56. Vincent P. A connection between score matching and denoising autoencoders. Neural Comput. 2011;23:1661–1674. doi: 10.1162/NECO_a_00142.
57. Song Y., Ermon S. Improved techniques for training score-based generative models. Adv Neural Inf Process Syst. 2020;33.
58. Song J., Meng C., Ermon S. Denoising diffusion implicit models. Int Conf Learn Represent. 2020.
59. Kloeden P.E., Platen E. Numerical solution of stochastic differential equations, vol. 23. Springer Science & Business Media; 2013.
60. Mil'shtein G.N. Approximate integration of stochastic differential equations. Theory Probab Appl. 1975;19:557.
61. Allgower E.L., Georg K. Numerical continuation methods: an introduction, vol. 13. Springer Science & Business Media; 2012.
62. Chen S., Chewi S., Lee H., Li Y., Lu J., et al. The probability flow ODE is provably fast. 2023. arXiv:2305.11798.
63. Lu C., Zhou Y., Bao F., Chen J., Li C., et al. DPM-solver: a fast ODE solver for diffusion probabilistic model sampling in around 10 steps. 2022. arXiv:2206.00927.
64. Zhang Q., Chen Y. Fast sampling of diffusion models with exponential integrator. 2022. arXiv:2204.13902.
65. Kingma D., Salimans T., Poole B., Ho J. Variational diffusion models. Adv Neural Inf Process Syst. 2021;34:21696–21707.
66. Ronneberger O., Fischer P., Brox T. U-net: convolutional networks for biomedical image segmentation. Int Conf Med Image Comput Comput Assist Interv. 2015:234–241.
67. Sajjadi M.S., Bachem O., Lucic M., Bousquet O., Gelly S. Assessing generative models via precision and recall. Adv Neural Inf Process Syst. 2018;31.
68. Kynkäänniemi T., Karras T., Laine S., Lehtinen J., Aila T. Improved precision and recall metric for assessing generative models. Adv Neural Inf Process Syst. 2019:3929–3938.
69. Naeem M.F., Oh S.J., Uh Y., Choi Y., Yoo J. Reliable fidelity and diversity metrics for generative models. Int Conf Mach Learn. 2020:7176–7185.
70. Lu T., Liu M., Chen Y., Kim J., Huang P.-S. Assessing generative model coverage of protein structures with SHAPES. bioRxiv. 2025. doi: 10.1016/j.cels.2025.101347.
71. Mortuza S.M., Zheng W., Zhang C., et al. Improving fragment-based ab initio protein structure assembly using low-accuracy contact-map predictions. Nat Commun. 2021;12:5011. doi: 10.1038/s41467-021-25316-w.
72. Bon M., Bilsland A., Bower J., McAulay K. Fragment-based drug discovery—the importance of high-quality molecule libraries. Mol Oncol. 2022;16:3761–3777. doi: 10.1002/1878-0261.13277.
73. Sun Q., Jiang Z., Zhao H., He K. Is noise conditioning necessary for denoising generative models? 2025. arXiv:2502.13129.
74. Lu C., Zhou Y., Bao F., Chen J., Li C., et al. DPM-solver++: fast solver for guided sampling of diffusion probabilistic models. 2022. arXiv:2211.01095.
75. Xia G., Danier D., Das A., Fotiadis S., Nabiei F., et al. Score normalization for a faster diffusion exponential integrator sampler. NeurIPS Workshop on Diffusion Models. 2023.
76. Dauparas J., et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science. 2022;378:49–56. doi: 10.1126/science.add2187.
