This is a preprint.

It has not yet been peer reviewed by a journal.

[Preprint]. 2023 Feb 15:arXiv:2302.04313v2. Originally published 2023 Feb 8. [Version 2]

Geometry-Complete Diffusion for 3D Molecule Generation

Alex Morehead 1, Jianlin Cheng 1
PMCID: PMC9934735  PMID: 36798459

Abstract

Denoising diffusion probabilistic models (DDPMs) (Ho et al. (2020)) have recently taken the field of generative modeling by storm, pioneering new state-of-the-art results in disciplines such as computer vision and computational biology for diverse tasks ranging from text-guided image generation (Ramesh et al. (2022); Saharia et al. (2022); Rombach et al. (2022)) to structure-guided protein design (Ingraham et al. (2022); Watson et al. (2022)). Along this latter line of research, methods such as those of Hoogeboom et al. (2022) have been proposed for unconditionally generating 3D molecules using equivariant graph neural networks (GNNs) within a DDPM framework. Toward this end, we propose GCDM, a geometry-complete diffusion model that achieves new state-of-the-art results for 3D molecule diffusion generation by leveraging the representation learning strengths offered by GNNs that perform geometry-complete message-passing. Our results with GCDM also offer preliminary insights into how physical inductive biases impact the generative dynamics of molecular DDPMs. The source code, data, and instructions to train new models or reproduce our results are freely available at https://github.com/BioinfoMachineLearning/Bio-Diffusion.

1. Introduction

Generative modeling has recently been experiencing a renaissance driven largely by denoising diffusion probabilistic models (DDPMs). At a high level, DDPMs are trained by learning how to denoise a noisy version of an input example. For example, in the context of computer vision, Gaussian noise may be successively added to an input image, and a generative model of images is then trained to distinguish the original input image’s feature signal from the noise signal added to the image thereafter. If a model can achieve such outcomes, we can use the model to generate novel images by first sampling multivariate Gaussian noise and then iteratively removing from the current state of the image the noise predicted by our model. This classic formulation of DDPMs has achieved significant results in the space of image generation (Rombach et al. (2022)), audio synthesis (Kong et al. (2020)), and even meta-learning by learning how to conditionally generate neural network checkpoints (Peebles et al. (2022)). Furthermore, such an approach to generative modeling has expanded its reach to encompass scientific disciplines such as computational biology (Anand & Achim (2022)), computational chemistry (Xu et al. (2022)), and even computational physics (Mudur & Finkbeiner (2022)).

Concurrently, the field of geometric deep learning (GDL) (Bronstein et al. (2021)) has seen a sizeable increase in research interest lately, driven largely by theoretical advances within the discipline (Joshi et al. (2023)) as well as by novel applications of such methodology (Stärk et al. (2022)). Notably, such applications even include what is considered by many researchers to be a solution to the problem of predicting 3D protein structures from their corresponding amino acid sequences (Jumper et al. (2021)). Such an outcome arose, in part, from recent advances in sequence-based language modeling efforts (Vaswani et al. (2017)) as well as from innovations in equivariant neural network modeling (Thomas et al. (2018)).

With such diverse, successful use cases of DDPMs and GDL in mind, in this work, we explore the intersection of geometric graph representation learning and DDPMs to answer the following questions.

  • What is the impact of geometric representation learning on DDPMs designed to generate molecular data?

  • What are the limitations of current equivariant graph neural networks empowering contemporary molecular DDPMs?

  • What role do physical inductive biases play within the generative denoising of molecular DDPMs?

2. Related Work

Generative Modeling.

The field of deep generative modeling (Ruthotto & Haber (2021)) has pioneered a variety of techniques by which to train deep neural networks to create new content similar to that of an existing data repository (e.g., a text dataset of English sentences). Language models such as GPT-3 and ChatGPT (Brown et al. (2020); Schulman et al. (2022)) have become known as hallmark examples of successful generative modeling of text data. In the domains of computer vision and computational biology, techniques such as latent diffusion (Rombach et al. (2022)) and equivariant graph diffusion (Luo et al. (2022)) have established some of the latest state-of-the-art results in generative modeling of images and biomolecules such as proteins, respectively.

Geometric Deep Learning.

Data residing in a geometric or physical space (e.g., $\mathbb{R}^3$) can be processed by machine learning algorithms in a plethora of ways. However, in recent years, the field of geometric deep learning has become known for its proficiency in introducing powerful new deep learning methods designed specifically to process geometric data efficiently (Cao et al. (2020)). Examples of popular GDL algorithms include convolutional neural networks designed for working with image data (LeCun et al. (1995)), recurrent neural networks for processing sequence-based data (Medsker & Jain (1999)), and graph neural networks for handling graph-structured model inputs (Zhou et al. (2020)).

Equivariant Neural Networks.

To process geometric data efficiently, however, recent GDL research (Cohen & Welling (2016); Bronstein et al. (2021); Bulusu et al. (2021)) has specifically shown that designing one’s machine learning algorithm to be equivariant to the symmetry groups the input data points naturally respect (e.g., 3D rotation symmetries) often helps such an algorithm generalize to datasets beyond those used for its cross-validation (e.g., training and testing datasets). As a particularly relevant example of a neural network that is equivariant to several important and common symmetry groups of geometric data, equivariant graph neural networks (Fuchs et al. (2020); Satorras et al. (2021b); Kofinas et al. (2021); Morehead & Cheng (2022)) that are translation and rotation equivariant to inputs residing in $\mathbb{R}^3$ have become known as hallmark examples of geometric deep learning algorithms that generalize remarkably well to new inputs and require notably fewer training iterations to converge.

Representation Learning of Scientific Data.

Scientific data, in particular, requires careful consideration in the context of representation learning. As much scientific data contains within it a notion of geometry or latent structure, equivariance has become a key algorithmic component for processing such inputs as well (Han et al. (2022)). Moreover, equivariant graph representation learning algorithms have recently become a de facto methodology for processing scientific data of many shapes and origins (Musaelian et al. (2022); Batzner et al. (2022)).

Contributions.

In this work, we connect ideas at the forefront of GDL and generative modeling to advance the state-of-the-art (SOTA) for 3D molecule generation. In detail, we provide the following contributions.

  • We introduce the Geometry-Complete Diffusion Model (GCDM) which establishes new SOTA results for unconditional 3D molecule generation.

  • We investigate the impact of geometric message-passing on the behavior and performance of DDPMs trained to generate 3D molecular data.

  • Our experiments demonstrate the importance of incorporating physical inductive biases within DDPM denoising neural networks when training them on data from physical domains.

3. Preliminaries

3.1. Diffusion Models

Key to understanding our contributions in this work are denoising diffusion probabilistic models. As alluded to previously, once trained, DDPMs can generate new data of arbitrary shapes, sizes, formats, and geometries by learning to reverse a noising process acting on each model input. More precisely, for a given data point $x$, a diffusion process adds noise to $x$ for time step $t = 0, 1, \ldots, T$ to yield $z_t$, a noisy representation of the input $x$ at time step $t$. Such a process is defined by a multivariate Gaussian distribution:

$$q(z_t \mid x) = \mathcal{N}(z_t \mid \alpha_t x, \sigma_t^2 I), \tag{1}$$

where $\alpha_t \in \mathbb{R}^+$ regulates how much feature signal is retained and $\sigma_t^2$ modulates how much feature noise is added to input $x$. Note that we typically model $\alpha$ as a function defined with smooth transitions from $\alpha_0 = 1$ to $\alpha_T = 0$, where a special case of such a noising process, the variance preserving process (Sohl-Dickstein et al. (2015); Ho et al. (2020)), is defined by $\alpha_t = \sqrt{1 - \sigma_t^2}$. To simplify notation, in this work, we define the feature signal-to-noise ratio as $\mathrm{SNR}(t) = \alpha_t^2 / \sigma_t^2$. Also interesting to note is that this diffusion process is Markovian in nature, indicating that we have transition distributions as follows:

$$q(z_t \mid z_s) = \mathcal{N}(z_t \mid \alpha_{t|s} z_s, \sigma_{t|s}^2 I), \tag{2}$$

for all $t > s$ with $\alpha_{t|s} = \alpha_t / \alpha_s$ and $\sigma_{t|s}^2 = \sigma_t^2 - \alpha_{t|s}^2 \sigma_s^2$. In total, then, we can write the noising process as:

$$q(z_0, z_1, \ldots, z_T \mid x) = q(z_0 \mid x) \prod_{t=1}^{T} q(z_t \mid z_{t-1}). \tag{3}$$
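
As a minimal numerical sketch of Eqs. 1-3, the snippet below evaluates a variance-preserving schedule and checks that composing the marginal at step $s$ with the transition from $s$ to $t$ reproduces the marginal at step $t$. The cosine form of `alpha`, the value of `T`, and the chosen steps are illustrative assumptions and are not taken from this work.

```python
import math

T = 1000  # number of diffusion steps (illustrative value)

def alpha(t):
    """Hypothetical cosine signal schedule with alpha(0) = 1 and alpha(T) ~ 0."""
    return math.cos(0.5 * math.pi * t / T)

def sigma(t):
    """Variance-preserving noise level: alpha_t^2 + sigma_t^2 = 1."""
    return math.sqrt(1.0 - alpha(t) ** 2)

def snr(t):
    """Signal-to-noise ratio SNR(t) = alpha_t^2 / sigma_t^2 (defined for t > 0)."""
    return alpha(t) ** 2 / sigma(t) ** 2

def transition(s, t):
    """Return alpha_{t|s} and sigma_{t|s}^2 for the transition q(z_t | z_s), t > s."""
    a_ts = alpha(t) / alpha(s)
    var_ts = sigma(t) ** 2 - a_ts ** 2 * sigma(s) ** 2
    return a_ts, var_ts

s, t = 300, 700
a_ts, var_ts = transition(s, t)

# Composing q(z_s | x) with q(z_t | z_s) should give back q(z_t | x).
marginal_mean_coeff = a_ts * alpha(s)
marginal_var = a_ts ** 2 * sigma(s) ** 2 + var_ts
```

Under this schedule, `marginal_mean_coeff` matches `alpha(t)` and `marginal_var` matches `sigma(t) ** 2`, which is the consistency that the Markov property requires.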

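If we then define $\mu_{t \to s}(x, z_t)$ and $\sigma_{t \to s}$ as

$$\mu_{t \to s}(x, z_t) = \frac{\alpha_{t|s} \sigma_s^2}{\sigma_t^2} z_t + \frac{\alpha_s \sigma_{t|s}^2}{\sigma_t^2} x \quad \text{and} \quad \sigma_{t \to s} = \frac{\sigma_{t|s} \sigma_s}{\sigma_t},$$

we have that the inverse of the noising process, the true denoising process, is given by the posterior of the transitions conditioned on $x$, a process that is also Gaussian:

$$q(z_s \mid x, z_t) = \mathcal{N}(z_s \mid \mu_{t \to s}(x, z_t), \sigma_{t \to s}^2 I). \tag{4}$$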
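
The posterior parameters of Eq. 4 can be sketched for a single scalar feature as follows. The schedule values and the data point are hypothetical and chosen only so that the arithmetic is easy to follow. As a sanity check, when $z_t$ carries no noise (i.e., $z_t = \alpha_t x$), the posterior mean should equal $\alpha_s x$.

```python
import math

def posterior_params(x, z_t, alpha_s, sigma_s, alpha_t, sigma_t):
    """Mean and standard deviation of q(z_s | x, z_t) for scalar inputs."""
    a_ts = alpha_t / alpha_s
    var_ts = sigma_t ** 2 - a_ts ** 2 * sigma_s ** 2
    mean = (a_ts * sigma_s ** 2 / sigma_t ** 2) * z_t \
        + (alpha_s * var_ts / sigma_t ** 2) * x
    std = math.sqrt(var_ts) * sigma_s / sigma_t
    return mean, std

# Hypothetical variance-preserving schedule values at steps s < t.
alpha_s, alpha_t = 0.9, 0.5
sigma_s = math.sqrt(1.0 - alpha_s ** 2)
sigma_t = math.sqrt(1.0 - alpha_t ** 2)

x = 2.0
mean, std = posterior_params(x, alpha_t * x, alpha_s, sigma_s, alpha_t, sigma_t)
```

Here `std` is also smaller than `sigma_s`, reflecting that conditioning on $z_t$ in addition to $x$ can only narrow the distribution of $z_s$.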

The Generative Denoising Process.

In diffusion models, we define the generative process according to the true denoising process. However, for such a denoising process, we do not know the value of $x$ a priori, so we typically approximate it as $\hat{x} = \phi(z_t, t)$ using a neural network $\phi$. Doing so then lets us express the generative transition distribution $p(z_s \mid z_t)$ as $q(z_s \mid \hat{x}(z_t, t), z_t)$. As a practical alternative to Eq. 4, we can represent this expression using our approximation for $\hat{x}$:

$$p(z_s \mid z_t) = \mathcal{N}(z_s \mid \mu_{t \to s}(\hat{x}, z_t), \sigma_{t \to s}^2 I). \tag{5}$$

If we choose to define $s$ as $s = t - 1$, then we can derive the variational lower bound on the log-likelihood of $x$ given our generative model as:

$$\log p(x) \geq \mathcal{L}_0 + \mathcal{L}_{base} + \sum_{t=1}^{T} \mathcal{L}_t, \tag{6}$$

where we note that $\mathcal{L}_0 = \log p(x \mid z_0)$ models the likelihood of the data given its noisy representation $z_0$, $\mathcal{L}_{base} = -\mathrm{KL}(q(z_T \mid x) \,\|\, p(z_T))$ models the difference between a standard normal distribution and the final latent variable $q(z_T \mid x)$, and

$$\mathcal{L}_t = -\mathrm{KL}(q(z_s \mid x, z_t) \,\|\, p(z_s \mid z_t)) \quad \text{for } t = 1, 2, \ldots, T.$$

Note that, in this formulation of diffusion models, our neural network $\phi$ directly predicts $\hat{x}$. However, Ho et al. (2020) and others have found optimization of $\phi$ to be made much easier when instead predicting the Gaussian noise added to $x$ to create $z_t$. An intuition for how this changes the neural network’s learning dynamics is that, when predicting back the noise added to the model’s input, the network is being trained to more directly differentiate which part of $z_t$ corresponds to the input’s feature signal (i.e., the underlying data point $x$) and which part corresponds to added feature noise. In doing so, if we let $z_t = \alpha_t x + \sigma_t \epsilon$, our neural network can then predict $\hat{\epsilon} = \phi(z_t, t)$ such that:

$$\hat{x} = (1/\alpha_t)\, z_t - (\sigma_t/\alpha_t)\, \hat{\epsilon}. \tag{7}$$
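
To make the relationship in Eq. 7 concrete, the following sketch noises a small feature vector with $z_t = \alpha_t x + \sigma_t \epsilon$ and then inverts the step. It assumes an idealized predictor that returns the true noise exactly, which a trained network would only approximate, and all numeric values are illustrative.

```python
def noise_input(x, eps, alpha_t, sigma_t):
    """Forward step z_t = alpha_t * x + sigma_t * eps, applied elementwise."""
    return [alpha_t * xi + sigma_t * ei for xi, ei in zip(x, eps)]

def recover_x(z_t, eps_hat, alpha_t, sigma_t):
    """Eq. 7: x_hat = z_t / alpha_t - (sigma_t / alpha_t) * eps_hat."""
    return [zi / alpha_t - (sigma_t / alpha_t) * ei for zi, ei in zip(z_t, eps_hat)]

x = [0.5, -1.0, 2.0]        # toy data point
eps = [0.3, -0.7, 1.1]      # noise that was actually added
alpha_t, sigma_t = 0.8, 0.6  # variance-preserving pair: 0.8^2 + 0.6^2 = 1

z_t = noise_input(x, eps, alpha_t, sigma_t)
x_hat = recover_x(z_t, eps, alpha_t, sigma_t)  # perfect noise prediction
```

With a perfect noise prediction, `x_hat` coincides with `x` up to floating-point error, so any error in $\hat{x}$ comes entirely from the error in $\hat{\epsilon}$, scaled by $\sigma_t / \alpha_t$.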

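Kingma et al. (2021) and others have since shown that, when parametrizing our denoising neural network in this way, the loss term $\mathcal{L}_t$ reduces to:

$$\mathcal{L}_t = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)} \left[ \tfrac{1}{2} \left( 1 - \mathrm{SNR}(t-1)/\mathrm{SNR}(t) \right) \| \epsilon - \hat{\epsilon} \|^2 \right]. \tag{8}$$

Note that, in practice, the loss term $\mathcal{L}_{base}$ should be close to zero when using a noising schedule defined such that $\alpha_T \approx 0$. Moreover, if and when $\alpha_0 \approx 1$ and $x$ is a discrete value, we will find $\mathcal{L}_0$ to be close to zero as well.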

3.2. SE(3) Equivariance

In this work, we will consider designing denoising neural networks, here denoted as $f$, that are equivariant to the action of the special Euclidean group (i.e., SE(3)). We say that a function $f$ is equivariant to the action of a group $G$ if, for all $g \in G$, we have $T_g(f(x)) = f(S_g(x))$, where $T_g$ and $S_g$ are linear representations associated with the group element $g$ (Serre et al. (1977)). Given that we are considering the SE(3) group, which is generated by 3D translations and rotations, we can represent $T_g$ and $S_g$ by a translation $t$ and an orthogonal matrix $R$ that rotates coordinates. Then we consider $f$ to be equivariant to a rotation $R$ if transforming its input $x$ yields an equivalent transformation of its output, that is, we have $Rf(x) = f(Rx)$.
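
The equivariance condition above can be checked numerically. The sketch below uses a toy map `f` (an assumption made for illustration, not GCPNet) that moves every point halfway toward the centroid of the point cloud, and it only exercises rotations about the z-axis together with a translation, which is a subset of SE(3).

```python
import math

def rotate_z(p, theta):
    """Rotate a 3D point about the z-axis by angle theta."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)

def transform(points, theta, shift):
    """Apply the rigid motion p -> R p + t to every point."""
    return [tuple(r + d for r, d in zip(rotate_z(p, theta), shift)) for p in points]

def f(points):
    """Toy equivariant map: move each point halfway toward the centroid."""
    n = len(points)
    centroid = tuple(sum(p[k] for p in points) / n for k in range(3))
    return [tuple(0.5 * (p[k] + centroid[k]) for k in range(3)) for p in points]

points = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
theta, shift = 0.7, (0.5, -1.0, 2.0)

lhs = transform(f(points), theta, shift)  # transform the output
rhs = f(transform(points, theta, shift))  # transform the input
max_gap = max(abs(a - b) for p, q in zip(lhs, rhs) for a, b in zip(p, q))
```

For this map, `max_gap` stays at the level of floating-point error, since the centroid itself rotates and translates with the input.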

Diffusion Models and Equivariant Distributions.

In this work, we desire for the marginal distribution $p(x)$ of our denoising neural network to be an invariant distribution. We begin by observing that a conditional distribution $p(y \mid x)$ is equivariant to the action of 3D rotations by meeting the criterion:

$$p(y \mid x) = p(Ry \mid Rx) \quad \text{for all orthogonal } R. \tag{9}$$

Moreover, a distribution is invariant to rotation transformations $R$ when

$$p(y) = p(Ry) \quad \text{for all orthogonal } R. \tag{10}$$

As Köhler et al. (2020) and Xu et al. (2022) have collectively demonstrated, we know that if $p(z_T)$ is invariant and the neural network we use to parametrize $p(z_{t-1} \mid z_t)$ is equivariant, we have, as desired, that the marginal distribution $p(x)$ of our denoising model is an invariant distribution.

SE(3)-Equivariant Points and Features.

In the context of our work, we represent a point cloud as a fully-connected 3D graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with $X = (x_1, x_2, \ldots, x_N) \in \mathbb{R}^{N \times 3}$ as the respective Cartesian coordinates for each node, where $N = |\mathcal{V}|$ and $E = |\mathcal{E}|$. Each node is described by scalar features $H \in \mathbb{R}^{N \times h}$ and $m$ vector-valued features $\chi \in \mathbb{R}^{N \times (m \times 3)}$. Likewise, each edge is described by scalar features $E \in \mathbb{R}^{E \times e}$ and $x$ vector-valued features $\xi \in \mathbb{R}^{E \times (x \times 3)}$. Important to note is that the features $H$ and $E$ are invariant to rotations, reflections, and translations, whereas the features $\chi$ and $\xi$ are equivariant to rotations and reflections. In particular, we describe a function $\Phi$ as SE(3)-equivariant if it satisfies the following constraint:

Definition 3.1.

(SE(3) Equivariance).

Given $(H', E', X', \chi', \xi') = \Phi(H, E, X, \chi, \xi)$,

we have $(H', E', QX'^{T} + g, Q\chi'^{T}, Q\xi'^{T}) = \Phi(H, E, QX^{T} + g, Q\chi^{T}, Q\xi^{T})$, $\forall Q \in SO(3)$, $\forall g \in \mathbb{R}^{3 \times 1}$.

Geometry-Complete Perceptron Networks.

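GCPNets are a type of geometric Graph Neural Network that satisfies our SE(3) equivariance constraint (Def. 3.1). In this setting, with $(h_i \in H, \chi_i \in \chi, e_{ij} \in E, \xi_{ij} \in \xi)$, GCPNet consists of a composition of Geometry-Complete Graph Convolution (GCPConv) layers $(h_i^l, \chi_i^l), x_i^l = \mathrm{GCPConv}[(h_i^{l-1}, \chi_i^{l-1}), (e_{ij}^0, \xi_{ij}^0), x_i^{l-1}, \mathcal{F}_{ij}]$ which are defined as:

$$n_i^l = \phi^l \left( n_i^{l-1}, \mathcal{A}_{j \in \mathcal{N}(i)} \Omega_\omega^l \left( n_i^{l-1}, n_j^{l-1}, e_{ij}, \mathcal{F}_{ij} \right) \right), \tag{11}$$

where $n_i^l = (h_i^l, \chi_i^l)$; $e_{ij} = (e_{ij}^0, \xi_{ij}^0)$; $\phi$ is a trainable function; $l$ signifies the representation depth of the network; $\mathcal{A}$ is a permutation-invariant aggregation function; $\Omega_\omega$ represents a message-passing function corresponding to the $\omega$-th GCP message-passing layer; and $\mathcal{F}_{ij}^t = (a_{ij}^t, b_{ij}^t, c_{ij}^t)$, with $a_{ij}^t = \frac{x_i^t - x_j^t}{\| x_i^t - x_j^t \|}$, $b_{ij}^t = \frac{x_i^t \times x_j^t}{\| x_i^t \times x_j^t \|}$, and $c_{ij}^t = a_{ij}^t \times b_{ij}^t$, respectively.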
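
The local frame $\mathcal{F}_{ij}$ can be sketched directly from its definition. The two node positions below are hypothetical and are assumed to be already centered, non-coincident, and non-collinear with the origin so that both normalizations are well defined. The final lines apply a point reflection ($x \mapsto -x$) to illustrate how the frame responds to a change of handedness.

```python
import math

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit(u):
    norm = math.sqrt(dot(u, u))
    return tuple(a / norm for a in u)

def local_frame(x_i, x_j):
    """Frame (a_ij, b_ij, c_ij) built from two node positions."""
    a = unit(sub(x_i, x_j))
    b = unit(cross(x_i, x_j))
    c = cross(a, b)
    return a, b, c

def negate(u):
    return tuple(-v for v in u)

x_i, x_j = (1.0, 2.0, 0.5), (-0.5, 1.0, 2.0)  # hypothetical positions
a, b, c = local_frame(x_i, x_j)

# Frame of the point-reflected pair (-x_i, -x_j).
a_ref, b_ref, c_ref = local_frame(negate(x_i), negate(x_j))
```

The three vectors are mutually orthogonal and of unit length. Under the point reflection, `a_ref` and `c_ref` flip sign while `b_ref` equals `b`, so the frame does not transform like a set of ordinary vectors under reflections.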

Lastly, if one desires to update the positions of each node in $\mathcal{G}$, as we do in the context of 3D molecule generation, GCPConv provides a simple, SE(3)-equivariant method to do so using a dedicated GCP module as follows:

$$(h_{p_i}^l, \chi_{p_i}^l) = \mathrm{GCP}_p^l(n_i^l, \mathcal{F}_{ij}) \tag{12}$$
$$x_i^l = x_i^{l-1} + \chi_{p_i}^l, \quad \text{where } \chi_{p_i}^l \in \mathbb{R}^{1 \times 3}, \tag{13}$$

where $\mathrm{GCP}_{\cdot}^l(\cdot, \mathcal{F}_{ij})$ is defined as in Morehead & Cheng (2022) to provide rotation and translation invariant updates to $h_i$ and rotation equivariant updates to $\chi_i$ following centralization of the input point cloud’s coordinates $X$ (Du et al. (2022)). The effect of using feature updates to $\chi_i$ to update $x_i$ is, after decentralizing $X$ following the final GCPConv layer, that updates to $x_i$ then become SE(3)-equivariant. As such, all the transformations described above collectively satisfy the required equivariance constraint in Def. 3.1. Important to note is that (1) GCPNet performs message passing directly using vector-valued features corresponding to nodes and edges instead of performing insufficient approximations of such geometric quantities using only scalar features, and (2) GCPNet incorporates a biophysical inductive bias concerning reflection symmetries (e.g., molecular chirality) into the network architecture by encoding into the network’s updates to scalar and vector-valued features geometric frames that are not reflection equivariant. For a more detailed description of the subcomponents within GCPNet, we refer interested readers to Morehead & Cheng (2022).

4. GCDM: A Geometry-Complete Diffusion Model

In this section, we describe GCDM, a new Geometry-Complete SE(3)-Equivariant Diffusion Model, which is illustrated in Figure 1. In particular, we describe how GCDM defines a noising process jointly on equivariant node positions X and invariant node features H and learns a generative denoising process using GCPNet.

Figure 1:

A framework overview for our proposed Geometry-Complete Diffusion Model (GCDM). Our framework consists of (i.) a graph (topology) definition process, (ii.) a GCPNet-based graph neural network for 3D graph representation learning, (iii.) denoising of 3D input graphs using GCPNet, and (iv.) application of a trained GCPNet denoising network for 3D molecule generation. Zoom in for the best viewing experience.

4.1. The Diffusion Process

In this work, we define an equivariant diffusion process for equivariant node coordinates $x_i$ and invariant node features $h_i$, one that adds random noise to such input data. Recall that the graph inputs to our model, $\mathcal{G}$, associate with each node a coordinate representation $x_i \in \mathbb{R}^3$ and a feature vector $h_i \in \mathbb{R}^h$. Subsequently, let $[\cdot, \cdot]$ denote the concatenation of two variables. We then define our equivariant noising process on latent variables $z_t = [z_t^{(x)}, z_t^{(h)}]$ as:

$$q(z_t \mid x, h) = \mathcal{N}_{xh}(z_t \mid \alpha_t [x, h], \sigma_t^2 I) \quad \text{for } t = 1, 2, \ldots, T, \tag{14}$$

where we use $\mathcal{N}_{xh}$ as concise notation to denote the product of two distributions. In this context, the former distribution represents the noised node coordinates $\mathcal{N}_x$, and the latter distribution represents the noised node features $\mathcal{N}_h$. As such, $\mathcal{N}_{xh}$ is given by:

$$\mathcal{N}_x(z_t^{(x)} \mid \alpha_t x, \sigma_t^2 I) \cdot \mathcal{N}_h(z_t^{(h)} \mid \alpha_t h, \sigma_t^2 I). \tag{15}$$

Note that, with the context of a standard diffusion model in mind, these two equations, Eqs. 14 and 15, correspond to Eq. 1. To address the translation invariance issue raised by Satorras et al. (2021a) in the context of handling a distribution over 3D coordinates, we adopt the zero center of gravity trick proposed by Xu et al. (2022) to define $\mathcal{N}_x$ as a normal distribution on the subspace defined by $\sum_i x_i = 0$. However, to handle node features $h_i$ that are rotation, reflection, and translation-invariant, we can instead use a conventional normal distribution $\mathcal{N}$.
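
One common way to realize a normal distribution on the subspace $\sum_i x_i = 0$ is to draw ordinary Gaussian noise and then subtract its per-axis mean. The sketch below follows that approach as an assumption about how the trick may be implemented, and the function name and atom count are hypothetical.

```python
import random

def sample_zero_cog_noise(num_atoms, rng):
    """Draw N x 3 Gaussian noise and project it onto the zero center-of-gravity subspace."""
    eps = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(num_atoms)]
    # Mean over atoms for each of the three coordinate axes.
    mean = [sum(row[k] for row in eps) / num_atoms for k in range(3)]
    return [[row[k] - mean[k] for k in range(3)] for row in eps]

rng = random.Random(0)  # fixed seed for reproducibility
eps_x = sample_zero_cog_noise(5, rng)

# Each coordinate axis should now sum to (numerically) zero over atoms.
column_sums = [sum(row[k] for row in eps_x) for k in range(3)]
```

Because the projection removes three degrees of freedom, the resulting noise has $(N - 1) \cdot 3$ effective dimensions rather than $N \cdot 3$.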

GCDM Generative Denoising Process.

Recall that we need to address the noise posteriors to define a generative process for GCDM. In a similar manner as in Eq. 4, we can directly use the noise posteriors $q(z_s \mid x, h, z_t)$ of Eq. 14. To do so, we must replace the input variables $x$ and $h$ with the approximations $\hat{x}$ and $\hat{h}$ predicted by our denoising neural network:

$$p(z_s \mid z_t) = \mathcal{N}_{xh}(z_s \mid \mu_{t \to s}([\hat{x}, \hat{h}], z_t), \sigma_{t \to s}^2 I), \tag{16}$$

where our values for $\hat{x}$ and $\hat{h}$ depend on $z_t$, $t$, and our denoising neural network $\phi$. As mentioned previously, it is often easier to optimize a diffusion model using a noise parametrization to predict the noise $\hat{\epsilon}$. In this work, we use such a parametrization to predict $\hat{\epsilon} = [\hat{\epsilon}^{(x)}, \hat{\epsilon}^{(h)}]$, which represents the noise individually added to $\hat{x}$ and $\hat{h}$. We can then use the predicted $\hat{\epsilon}$ to derive:

$$[\hat{x}, \hat{h}] = z_t / \alpha_t - \hat{\epsilon}_t \cdot \sigma_t / \alpha_t. \tag{17}$$

Observe that rotating $z_t$ with $R$ yields $R\hat{\epsilon} = \phi(Rz_t, t)$. Moreover, since the mean of the denoising distribution (16), one that uses isotropic noise, rotates as $R\hat{x} = Rz_t^{(x)} / \alpha_t - R\hat{\epsilon}_t^{(x)} \sigma_t / \alpha_t$, the distribution is equivariant.

Sampling from the model involves sampling $z_T \sim \mathcal{N}(0, I)$ and then iteratively sampling $z_{t-1} \sim p(z_{t-1} \mid z_t)$ for $t = T, T - 1, \ldots, 1$. Lastly, we can sample $x, h \sim p(x, h \mid z_0)$. For a high-level overview of the sampling algorithm for diffusion models such as GCDM, we refer interested readers to Hoogeboom et al. (2022).

GCDM Optimization Objective.

When we use the noise parametrization referred to in Eq. 8, the likelihood term of our model, $\mathcal{L}_t = -\mathrm{KL}(q(z_s \mid x, z_t) \,\|\, p(z_s \mid z_t))$, notably simplifies. Similar to Hoogeboom et al. (2022), with this parametrization, we observe that $\mathcal{L}_t$ reduces to

$$\mathcal{L}_t = \mathbb{E}_{\epsilon_t \sim \mathcal{N}_{xh}(0, I)} \left[ \tfrac{1}{2} w(t) \| \epsilon_t - \hat{\epsilon}_t \|^2 \right], \tag{18}$$

where $\hat{\epsilon}_t = \phi(z_t, t)$ and $w(t) = (1 - \mathrm{SNR}(t-1)/\mathrm{SNR}(t))$. Following Ho et al. (2020), Hoogeboom et al. (2022), and others, practically speaking, we set $w(t) = 1$ to stabilize training and improve sample quality. For a high-level overview of the optimization algorithm for diffusion models such as GCDM, we refer interested readers once again to Hoogeboom et al. (2022).

To summarize, we have defined a diffusion process, a denoising model, and an optimization objective function between them. We now need to define the neural network model $\phi$ that will reside within the denoising model.

4.2. Equivariant Dynamics

We use our previous definition of GCPNet in Section 3.2 to learn an SE(3)-equivariant dynamics function $[\hat{\epsilon}^{(x)}, \hat{\epsilon}^{(h)}] = \phi(z_t^{(x)}, z_t^{(h)}, t)$ as:

$$\hat{\epsilon}_t^{(x)}, \hat{\epsilon}_t^{(h)} = \mathrm{GCPNet}(z_t^{(x)}, [z_t^{(h)}, t/T]) - [z_t^{(x)}, 0], \tag{19}$$

where we inform our denoising model of the current time step by concatenating $t/T$ as an additional node feature and where we subtract the coordinate representation inputs of GCPNet from its coordinate representation outputs after subtracting from the coordinate representation outputs their collective center of gravity. With the parametrization in Eq. 17, we have subsequently achieved rotation equivariance on $\hat{x}_i$.

4.3. Zeroth Likelihood Terms

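For the zeroth likelihood terms corresponding to each type of input feature, we adopt the respective terms previously derived by Hoogeboom et al. (2022). In particular, for integer node features, we adopt the zeroth likelihood term:

$$p(h \mid z_0^{(h)}) = \int_{h - \frac{1}{2}}^{h + \frac{1}{2}} \mathcal{N}(u \mid z_0^{(h)}, \sigma_0) \, du, \tag{20}$$

where we use the CDF of a standard normal distribution, $\Phi$, to compute Eq. 20 as $\Phi((h + \frac{1}{2} - z_0^{(h)}) / \sigma_0) - \Phi((h - \frac{1}{2} - z_0^{(h)}) / \sigma_0) \approx 1$ for reasonable noise parameters $\alpha_0$ and $\sigma_0$. For categorical node features, we instead use the zeroth likelihood term:

$$p(h \mid z_0^{(h)}) = \mathcal{C}(h \mid p), \quad p \propto \int_{1 - \frac{1}{2}}^{1 + \frac{1}{2}} \mathcal{N}(u \mid z_0^{(h)}, \sigma_0) \, du, \tag{21}$$

where we normalize $p$ to sum to one and where $\mathcal{C}$ is a categorical distribution. Lastly, for continuous node positions, we adopt the zeroth likelihood term:

$$p(x \mid z_0^{(x)}) = \mathcal{N}(x \mid z_0^{(x)} / \alpha_0 - \sigma_0 / \alpha_0 \, \hat{\epsilon}_0, \; \sigma_0^2 / \alpha_0^2 \, I), \tag{22}$$

which gives rise to the log-likelihood component $\mathcal{L}_0^{(x)}$ as:

$$\mathcal{L}_0^{(x)} = \mathbb{E}_{\epsilon^{(x)} \sim \mathcal{N}_x(0, I)} \left[ \log Z^{-1} - \tfrac{1}{2} \| \epsilon^{(x)} - \phi^{(x)}(z_0, 0) \|^2 \right], \tag{23}$$

where $d = 3$ and the normalization constant $Z = (\sqrt{2\pi} \cdot \sigma_0 / \alpha_0)^{(N-1) \cdot d}$ - in particular, its $(N-1) \cdot d$ term - arises from the zero center of gravity trick mentioned in Section 4.1.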
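
The integer and categorical terms of Eqs. 20 and 21 can be evaluated in closed form with the standard normal CDF. The sketch below is illustrative only: it treats $\sigma_0$ as a standard deviation, assumes one-hot categorical features, ignores any feature scaling, and uses hypothetical values for $z_0^{(h)}$ and $\sigma_0$.

```python
import math

def normal_cdf(v):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def integer_likelihood(h, z0, sigma0):
    """Eq. 20: Gaussian mass on the unit interval centered at integer h."""
    upper = normal_cdf((h + 0.5 - z0) / sigma0)
    lower = normal_cdf((h - 0.5 - z0) / sigma0)
    return upper - lower

def categorical_probs(z0, sigma0):
    """Eq. 21: per-class Gaussian mass around 1, normalized to sum to one."""
    masses = [normal_cdf((1.5 - z) / sigma0) - normal_cdf((0.5 - z) / sigma0)
              for z in z0]
    total = sum(masses)
    return [m / total for m in masses]

# An integer feature h = 1 whose latent z_0 is only slightly perturbed.
p_int = integer_likelihood(h=1, z0=1.02, sigma0=0.05)

# A noised one-hot vector whose second entry is closest to 1.
p_cat = categorical_probs([0.1, 0.9, -0.05], sigma0=0.2)
```

For small $\sigma_0$, `p_int` is close to one, in line with the approximation noted after Eq. 20, and `p_cat` concentrates on the class whose latent entry lies nearest to 1.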

Scaling Node Features.

In line with Hoogeboom et al. (2022), to improve the log-likelihood of our model’s generated samples, we find it useful to train and perform sampling with our model using scaled node feature inputs as $[x, \frac{1}{4} h^{(categorical)}, \frac{1}{10} h^{(integer)}]$.

Deriving The Number of Atoms.

Finally, to determine the number of atoms with which our model will generate a 3D molecule, we first sample N ~ p(N), where p(N) denotes the categorical distribution of molecule sizes over the model’s training dataset. Then, we conclude by sampling x, h ~ p(x, h|N).
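
Sampling $N \sim p(N)$ amounts to drawing from the empirical histogram of molecule sizes. A minimal sketch is given below, where the list of training sizes is hypothetical toy data and the function names are introduced only for illustration.

```python
import random
from collections import Counter

def fit_size_distribution(training_sizes):
    """Return the observed sizes and their empirical probabilities p(N)."""
    counts = Counter(training_sizes)
    total = sum(counts.values())
    sizes = sorted(counts)
    return sizes, [counts[n] / total for n in sizes]

def sample_num_atoms(sizes, probs, rng, k=1):
    """Draw k molecule sizes N ~ p(N)."""
    return rng.choices(sizes, weights=probs, k=k)

training_sizes = [3, 5, 5, 9, 9, 9, 12]  # hypothetical atom counts
sizes, probs = fit_size_distribution(training_sizes)
draws = sample_num_atoms(sizes, probs, random.Random(0), k=100)
```

Each drawn value of $N$ then fixes the number of nodes in the fully-connected graph on which $x, h \sim p(x, h \mid N)$ is sampled.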

5. Experiments

5.1. Unconditional 3D Molecule Generation - QM9

The QM9 dataset (Ramakrishnan et al. (2014)) contains molecular property descriptions and 3D atom coordinates for 130k small molecules. Each molecule in QM9 can contain up to 9 heavy atoms, that is, 29 atoms when including hydrogens. For the task of 3D molecule generation, we train GCDM to unconditionally generate molecules by producing atom types (H, C, N, O, and F), integer-valued atom charges, and 3D coordinates for each of the molecules’ atoms. Following Anderson et al. (2019), we split QM9 into training, validation, and test partitions consisting of 100k, 18k, and 13k molecule examples, respectively.

Metrics.

We adopt the scoring conventions of Satorras et al. (2021a) by using the distance between atom pairs and their respective atom types to predict bond types (single, double, triple, or none). Subsequently, we measure the proportion of generated atoms that have the right valency (atom stability) and the proportion of generated molecules for which all atoms are stable (molecule stability).
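
The two stability metrics can be illustrated with a small sketch. It assumes that bond orders have already been inferred from interatomic distances, so that each atom comes with its total valency, and it uses a simplified valency table for neutral atoms that ignores formal charges. Both the table and the example molecules are assumptions made for illustration.

```python
# Simplifying assumption: a single allowed valency per neutral element.
ALLOWED_VALENCY = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1}

def stability(molecules):
    """Return (atom stability, molecule stability) as fractions in [0, 1].

    Each molecule is a list of (element, total bond order) pairs.
    """
    total_atoms = stable_atoms = stable_molecules = 0
    for atoms in molecules:
        flags = [ALLOWED_VALENCY[element] == valency for element, valency in atoms]
        total_atoms += len(flags)
        stable_atoms += sum(flags)
        stable_molecules += all(flags)  # stable only if every atom is stable
    return stable_atoms / total_atoms, stable_molecules / len(molecules)

water = [("O", 2), ("H", 1), ("H", 1)]
broken_methane = [("C", 3), ("H", 1), ("H", 1), ("H", 1)]  # carbon lacks one bond
atom_stability, molecule_stability = stability([water, broken_methane])
```

In this toy example, six of the seven atoms are stable but only one of the two molecules is, which shows why molecule stability is the stricter of the two metrics.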

Baselines.

We compare GCDM to three existing E(3)-equivariant models: G-Schnet (Gebauer et al. (2019)), Equivariant Normalizing Flows (E-NF) (Satorras et al. (2021a)), and Equivariant Diffusion Models (EDM) (Hoogeboom et al. (2022)). For each of these three models, we report their results as reported in Hoogeboom et al. (2022). For comparison with models for this task that are not equivariant, we also report results from Hoogeboom et al. (2022) for Graph Diffusion Models (GDM) trained with random data rotations (GDM-aug) and without them (GDM). To the best of our knowledge, the force-guided molecule generation methods of Wu et al. (2022) are the most recent and performant state-of-the-art methods for 3D molecule generation, so we include their results for this experiment as well.

We further include two GCDM ablation models to more closely analyze the impact of certain key model components within GCDM. These two ablation models include GCDM without local geometric frames $\mathcal{F}_{ij}$ (i.e., GCDM w/o Frames) and GCDM without scalar message attention (SMA) applied to each edge message (i.e., GCDM w/o SMA) as $m_{ij} = e_{ij} m_{ij}$, where $m_{ij}$ represents the scalar messages learned by GCPNet during geometric graph message passing and $e_{ij}$ represents a 1 if an edge exists between nodes $i$ and $j$ (and 0 otherwise) via $e_{ij} \approx \phi_{inf}(m_{ij})$. Here, $\phi_{inf}: \mathbb{R}^e \to [0, 1]^1$ resembles a linear layer followed by a sigmoid function (Satorras et al. (2021b)). All GCDM models use 9 GCPConv layers; SiLU activations (Elfwing et al. (2018)); 256 and 64 scalar node and edge hidden features, respectively; and 32 and 16 vector-valued node and edge features, respectively. All GCDM models are also trained using the AdamW optimizer (Loshchilov & Hutter (2017)) with a batch size of 64, a learning rate of $10^{-4}$, and a weight decay rate of $10^{-12}$.

Results.

In Table 1, we see that GCDM outperforms previous methods (E-NF, G-Schnet, EDM, Bridge, and Bridge + Force) as well as their non-equivariant counterparts (GDM and GDM-aug) for all metrics, except concerning Atom stable (%) compared to Bridge + Force. We note that the variance of this metric for Bridge + Force is notably larger than that of GCDM. Importantly, when evaluating this metric in conjunction with Mol stable (%), we see that GCDM generates a significantly larger proportion of realistic and stable molecules (negative log-likelihood & Mol stable (%)) compared to all other methods. It is specifically interesting to note how much lower the negative log-likelihood (NLL) of GCDM is compared to that of EDM, the previous NLL-based SOTA method for 3D molecule generation, indicating the generative distribution that GCDM learns from QM9 likely contains much sharper peaks compared to EDM.

Table 1:

Comparison of GCPNet with baseline methods for 3D molecule generation. The results are reported in terms of each method’s negative log-likelihood (NLL) −log p(x, h, N), atom stability, and molecule stability with standard deviations across three runs on QM9, each drawing 10,000 samples from the model. The top-1 (best) results for this task are in bold, and the second-best results are underlined.

| Type | Method | NLL ↓ | Atoms Stable (%) ↑ | Mol Stable (%) ↑ |
| --- | --- | --- | --- | --- |
| Normalizing Flow | E-NF | −59.7 | 85.0 | 4.9 |
| Graph Autoregression | G-Schnet | n/a | 95.7 | 68.1 |
| DDPM | GDM | −94.7 | 97.0 | 63.2 |
| DDPM | GDM-aug | −92.5 | 97.6 | 71.6 |
| DDPM | EDM | −110.7 ± 1.5 | 98.7 ± 0.1 | 82.0 ± 0.4 |
| DDPM | Bridge | | 98.7 ± 0.1 | 81.8 ± 0.2 |
| DDPM | Bridge + Force | | 98.8 ± 0.1 | 84.6 ± 0.3 |
| DDPM - Ours | GCDM w/o Frames | −162.3 ± 0.3 | 98.4 ± 0.0 | 81.7 ± 0.5 |
| DDPM - Ours | GCDM w/o SMA | −131.3 ± 0.8 | 95.7 ± 0.1 | 51.7 ± 1.4 |
| DDPM - Ours | GCDM | −171.0 ± 0.2 | 98.7 ± 0.0 | 85.7 ± 0.4 |
| Data | | | 99.0 | 95.2 |

To offer additional insights into the behavior of each method for 3D molecule generation, we report as additional metrics the validity of a generated molecule as determined by RDKit (Landrum et al. (2013)) and the uniqueness of the generated molecules overall. Note that for G-Schnet, EDM, Bridge, and Bridge + Force, we directly derive the bonds from the distance between atom pairs. We see in Table 2 that GCDM generates the highest percentage of valid and unique molecules compared to all other methods, improving upon previous SOTA results in such measures by a notable margin. A possible explanation for why GCDM can achieve such results over equivariant methods such as EDM is that GCDM performs geometric (and geometry-complete) message-passing over each 3D input graph to remove the noise present therein, whereas other methods operate solely on scalar features. In future work, it would be interesting to investigate the impact of performing different kinds of geometric message-passing (e.g., type-2 tensor message-passing) on the performance of diffusion models for tasks such as 3D molecule generation.

Table 2:

Comparison of GCPNet with baseline methods for 3D molecule generation. The results are reported in terms of the validity and uniqueness of 10,000 samples generated by each method, with standard deviations across three runs on QM9. The best results for this task are in bold, and the second-best results are underlined.

| Type | Method | Valid (%) ↑ | Valid and Unique (%) ↑ |
| --- | --- | --- | --- |
| Normalizing Flow | E-NF | 40.2 | 39.4 |
| Graph Autoregression | G-Schnet | 85.5 | 80.3 |
| DDPM | GDM-aug | 90.4 | 89.5 |
| DDPM | EDM | 91.9 ± 0.5 | 90.7 ± 0.6 |
| DDPM | Bridge | | 90.2 |
| DDPM | Bridge + Force | | 90.7 |
| DDPM - Ours | GCDM w/o Frames | 93.9 ± 0.1 | 92.7 ± 0.1 |
| DDPM - Ours | GCDM w/o SMA | 83.1 ± 1.7 | 82.8 ± 1.7 |
| DDPM - Ours | GCDM | 94.8 ± 0.2 | 93.3 ± 0.0 |
| Data | | 97.7 | 97.7 |

5.2. Conditional 3D Molecule Generation - QM9

Baselines.

Towards conditional generation of 3D molecules, we compare GCDM to an existing E(3)-equivariant model, EDM (Hoogeboom et al. (2022)), as well as to two naive baselines: “Naive (Upper-bound)” where a property classifier $\phi_c$ predicts molecular properties given a method’s generated 3D molecules and shuffled (i.e., random) property labels; and “# Atoms” where one uses the numbers of atoms in a method’s generated 3D molecules to predict their molecular properties. For each baseline method, we report its mean absolute error in terms of molecular property prediction by an EGNN classifier $\phi_c$ (Satorras et al. (2021b)) as reported in Hoogeboom et al. (2022). For GCDM, we train each conditional model by conditioning it on one of five distinct molecular properties - α, gap, homo, lumo, and μ - using the QM9 validation split of Hoogeboom et al. (2022) as the model’s training dataset and the QM9 training split of Hoogeboom et al. (2022) as the corresponding EGNN classifier’s training dataset. Consequently, one can expect the gap between a method’s performance and that of “QM9 (Lower-bound)” to decrease as the method generates molecules that more accurately model a given molecular property.

Results.

We see in Table 3 that GCDM outperforms all other methods in conditioning on a given molecular property. In particular, GCDM improves upon the mean absolute error of the SOTA EDM method for the properties α, gap, homo, lumo, and μ by 28%, 8%, 3%, 18%, and 24%, respectively, demonstrating that, using geometry-complete message passing, GCDM can more accurately model important molecular properties for 3D molecule generation.
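
The relative improvements quoted above can be recomputed from the EDM and GCDM mean absolute errors listed in Table 3. The short script below does so, with the dictionary keys being ASCII stand-ins for the property symbols.

```python
# Mean absolute errors copied from Table 3 (EDM and GCDM rows).
edm = {"alpha": 2.76, "gap": 655, "homo": 356, "lumo": 584, "mu": 1.111}
gcdm = {"alpha": 1.97, "gap": 602, "homo": 344, "lumo": 479, "mu": 0.844}

# Percentages as stated in the text.
reported = {"alpha": 28, "gap": 8, "homo": 3, "lumo": 18, "mu": 24}

# Relative reduction in MAE with respect to EDM, in percent.
improvement = {k: 100.0 * (edm[k] - gcdm[k]) / edm[k] for k in edm}

for name, value in improvement.items():
    print(f"{name}: {value:.1f}% (reported: {reported[name]}%)")
```

The recomputed values agree with the stated percentages to within one percentage point, with the entry for α coming out slightly above the stated 28%.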

Table 3:

Comparison of GCPNet with baseline methods for property-conditional 3D molecule generation. The results are reported in terms of the mean absolute error for molecular property prediction by an EGNN classifier $\phi_c$ on a QM9 subset, GCDM-generated samples, and two different baselines “Naive (Upper-bound)” and “# Atoms”. The top-1 (best) results for this task are in bold, and the second-best results are underlined.

| Task | α | Δϵ | ϵ_HOMO | ϵ_LUMO | μ |
| --- | --- | --- | --- | --- | --- |
| Units | Bohr³ | meV | meV | meV | D |
| Naive (Upper-bound) | 9.01 | 1470 | 645 | 1457 | 1.616 |
| # Atoms | 3.86 | 866 | 426 | 813 | 1.053 |
| EDM | 2.76 | 655 | 356 | 584 | 1.111 |
| GCDM | 1.97 | 602 | 344 | 479 | 0.844 |
| QM9 (Lower-bound) | 0.10 | 64 | 39 | 36 | 0.043 |

6. Conclusion

In this work, we introduced GCDM, an SE(3)-equivariant geometry-complete denoising diffusion probabilistic model for 3D molecule generation. While previous equivariant methods for this task have had difficulty establishing sizeable performance gains over non-equivariant methods, GCDM establishes a clear performance advantage over all other methods, generating more realistic, stable, valid, unique, and property-specific 3D molecules than existing approaches.

Acknowledgments

This work is partially supported by two NSF grants (DBI1759934 and IIS1763246), two NIH grants (R01GM093123 and R01GM146340), three DOE grants (DE-AR0001213, DE-SC0020400, and DE-SC0021303), and the computing allocation on the Summit compute cluster provided by the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725, granted in part by the Advanced Scientific Computing Research (ASCR) Leadership Computing Challenge (ALCC) program.

References

  1. Anand Namrata and Achim Tudor. Protein structure and sequence generation with equivariant denoising diffusion probabilistic models. arXiv preprint arXiv:2205.15019, 2022.
  2. Anderson Brandon, Hy Truong Son, and Kondor Risi. Cormorant: Covariant molecular neural networks. Advances in neural information processing systems, 32, 2019.
  3. Batzner Simon, Musaelian Albert, Sun Lixin, Geiger Mario, Mailoa Jonathan P, Kornbluth Mordechai, Molinari Nicola, Smidt Tess E, and Kozinsky Boris. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nature communications, 13(1):2453, 2022.
  4. Bronstein Michael M, Bruna Joan, Cohen Taco, and Veličković Petar. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478, 2021.
  5. Brown Tom, Mann Benjamin, Ryder Nick, Subbiah Melanie, Kaplan Jared D, Dhariwal Prafulla, Neelakantan Arvind, Shyam Pranav, Sastry Girish, Askell Amanda, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
  6. Bulusu Srinath, Favoni Matteo, Ipp Andreas, Müller David I, and Schuh Daniel. Generalization capabilities of translationally equivariant neural networks. Physical Review D, 104(7):074504, 2021.
  7. Cao Wenming, Yan Zhiyue, He Zhiquan, and He Zhihai. A comprehensive survey on geometric deep learning. IEEE Access, 8:35929–35949, 2020.
  8. Cohen Taco and Welling Max. Group equivariant convolutional networks. In International conference on machine learning, pp. 2990–2999. PMLR, 2016.
  9. Du Weitao, Zhang He, Du Yuanqi, Meng Qi, Chen Wei, Zheng Nanning, Shao Bin, and Liu Tie-Yan. SE(3) equivariant graph neural networks with complete local frames. In International Conference on Machine Learning, pp. 5583–5608. PMLR, 2022.
  10. Elfwing Stefan, Uchibe Eiji, and Doya Kenji. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks, 107:3–11, 2018.
  11. Fuchs Fabian, Worrall Daniel, Fischer Volker, and Welling Max. SE(3)-transformers: 3d roto-translation equivariant attention networks. Advances in Neural Information Processing Systems, 33:1970–1981, 2020.
  12. Gebauer Niklas, Gastegger Michael, and Schütt Kristof. Symmetry-adapted generation of 3d point sets for the targeted discovery of molecules. Advances in neural information processing systems, 32, 2019.
  13. Han Jiaqi, Rong Yu, Xu Tingyang, and Huang Wenbing. Geometrically equivariant graph neural networks: A survey. arXiv preprint arXiv:2202.07230, 2022.
  14. Ho Jonathan, Jain Ajay, and Abbeel Pieter. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
  15. Hoogeboom Emiel, Satorras Víctor Garcia, Vignac Clément, and Welling Max. Equivariant diffusion for molecule generation in 3d. In International Conference on Machine Learning, pp. 8867–8887. PMLR, 2022.
  16. Ingraham John, Baranov Max, Costello Zak, Frappier Vincent, Ismail Ahmed, Tie Shan, Wang Wujie, Xue Vincent, Obermeyer Fritz, Beam Andrew, et al. Illuminating protein space with a programmable generative model. bioRxiv, pp. 2022–12, 2022.
  17. Joshi Chaitanya K, Bodnar Cristian, Mathis Simon V, Cohen Taco, and Liò Pietro. On the expressive power of geometric graph neural networks. arXiv preprint arXiv:2301.09308, 2023.
  18. Jumper John, Evans Richard, Pritzel Alexander, Green Tim, Figurnov Michael, Ronneberger Olaf, Tunyasuvunakool Kathryn, Bates Russ, Žídek Augustin, Potapenko Anna, et al. Highly accurate protein structure prediction with alphafold. Nature, 596(7873):583–589, 2021.
  19. Kingma Diederik, Salimans Tim, Poole Ben, and Ho Jonathan. Variational diffusion models. Advances in neural information processing systems, 34:21696–21707, 2021.
  20. Kofinas Miltiadis, Nagaraja Naveen, and Gavves Efstratios. Roto-translated local coordinate frames for interacting dynamical systems. Advances in Neural Information Processing Systems, 34:6417–6429, 2021.
  21. Köhler Jonas, Klein Leon, and Noé Frank. Equivariant flows: exact likelihood generative learning for symmetric densities. In International conference on machine learning, pp. 5361–5370. PMLR, 2020.
  22. Kong Zhifeng, Ping Wei, Huang Jiaji, Zhao Kexin, and Catanzaro Bryan. Diffwave: A versatile diffusion model for audio synthesis. arXiv preprint arXiv:2009.09761, 2020.
  23. Landrum Greg et al. Rdkit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum, 8, 2013.
  24. LeCun Yann, Bengio Yoshua, et al. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, 3361(10):1995, 1995.
  25. Loshchilov Ilya and Hutter Frank. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
  26. Luo Shitong, Su Yufeng, Peng Xingang, Wang Sheng, Peng Jian, and Ma Jianzhu. Antigen-specific antibody design and optimization with diffusion-based generative models. bioRxiv, pp. 2022–07, 2022.
  27. Medsker Larry and Jain Lakhmi C. Recurrent neural networks: design and applications. CRC press, 1999.
  28. Morehead Alex and Cheng Jianlin. Geometry-complete perceptron networks for 3d molecular graphs. arXiv preprint arXiv:2211.02504, 2022.
  29. Mudur Nayantara and Finkbeiner Douglas P. Can denoising diffusion probabilistic models generate realistic astrophysical fields? arXiv preprint arXiv:2211.12444, 2022.
  30. Musaelian Albert, Batzner Simon, Johansson Anders, Sun Lixin, Owen Cameron J, Kornbluth Mordechai, and Kozinsky Boris. Learning local equivariant representations for large-scale atomistic dynamics. arXiv preprint arXiv:2204.05249, 2022.
  31. Peebles William, Radosavovic Ilija, Brooks Tim, Efros Alexei A, and Malik Jitendra. Learning to learn with generative models of neural network checkpoints. arXiv preprint arXiv:2209.12892, 2022.
  32. Ramakrishnan Raghunathan, Dral Pavlo O, Rupp Matthias, and Von Lilienfeld O Anatole. Quantum chemistry structures and properties of 134 kilo molecules. Scientific data, 1(1):1–7, 2014.
  33. Ramesh Aditya, Dhariwal Prafulla, Nichol Alex, Chu Casey, and Chen Mark. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 2022.
  34. Rombach Robin, Blattmann Andreas, Lorenz Dominik, Esser Patrick, and Ommer Björn. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684–10695, 2022.
  35. Ruthotto Lars and Haber Eldad. An introduction to deep generative modeling. GAMM-Mitteilungen, 44(2):e202100008, 2021.
  36. Saharia Chitwan, Chan William, Saxena Saurabh, Li Lala, Whang Jay, Denton Emily, Ghasemipour Seyed Kamyar Seyed, Ayan Burcu Karagol, Mahdavi S Sara, Lopes Rapha Gontijo, et al. Photorealistic text-to-image diffusion models with deep language understanding. arXiv preprint arXiv:2205.11487, 2022.
  37. Satorras Victor Garcia, Hoogeboom Emiel, Fuchs Fabian B, Posner Ingmar, and Welling Max. E(n) equivariant normalizing flows. arXiv preprint arXiv:2105.09016, 2021a.
  38. Satorras Víctor Garcia, Hoogeboom Emiel, and Welling Max. E(n) equivariant graph neural networks. In International conference on machine learning, pp. 9323–9332. PMLR, 2021b.
  39. Schulman J, Zoph B, Kim C, Hilton J, Menick J, Weng J, Uribe JFC, Fedus L, Metz L, Pokorny M, et al. Chatgpt: Optimizing language models for dialogue, 2022.
  40. Serre Jean-Pierre et al. Linear representations of finite groups, volume 42. Springer, 1977.
  41. Sohl-Dickstein Jascha, Weiss Eric, Maheswaranathan Niru, and Ganguli Surya. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pp. 2256–2265. PMLR, 2015.
  42. Stärk Hannes, Ganea Octavian, Pattanaik Lagnajit, Barzilay Regina, and Jaakkola Tommi. Equibind: Geometric deep learning for drug binding structure prediction. In International Conference on Machine Learning, pp. 20503–20521. PMLR, 2022.
  43. Thomas Nathaniel, Smidt Tess, Kearnes Steven, Yang Lusann, Li Li, Kohlhoff Kai, and Riley Patrick. Tensor field networks: Rotation- and translation-equivariant neural networks for 3d point clouds. arXiv preprint arXiv:1802.08219, 2018.
  44. Vaswani Ashish, Shazeer Noam, Parmar Niki, Uszkoreit Jakob, Jones Llion, Gomez Aidan N, Kaiser Łukasz, and Polosukhin Illia. Attention is all you need. Advances in neural information processing systems, 30, 2017.
  45. Watson Joseph L, Juergens David, Bennett Nathaniel R, Trippe Brian L, Yim Jason, Eisenach Helen E, Ahern Woody, Borst Andrew J, Ragotte Robert J, Milles Lukas F, et al. Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models. bioRxiv, pp. 2022–12, 2022.
  46. Wu Lemeng, Gong Chengyue, Liu Xingchao, Ye Mao, and Liu Qiang. Diffusion-based molecule generation with informative prior bridges. arXiv preprint arXiv:2209.00865, 2022.
  47. Xu Minkai, Yu Lantao, Song Yang, Shi Chence, Ermon Stefano, and Tang Jian. Geodiff: A geometric diffusion model for molecular conformation generation. arXiv preprint arXiv:2203.02923, 2022.
  48. Zhou Jie, Cui Ganqu, Hu Shengding, Zhang Zhengyan, Yang Cheng, Liu Zhiyuan, Wang Lifeng, Li Changcheng, and Sun Maosong. Graph neural networks: A review of methods and applications. AI open, 1:57–81, 2020.
