Author manuscript; available in PMC: 2024 Apr 4.
Published in final edited form as: Nat Rev Bioeng. 2023 Oct 27;2(2):136–154. doi: 10.1038/s44222-023-00114-9

Table 3 | Applications of diffusion models in bioinformatics

| Applications | Tool name | Key function and strength | Input | Output | Diffusion targetᵃ |
| --- | --- | --- | --- | --- | --- |
| Protein design and generation | ProteinSGM⁹¹ | Inpaints plausible backbones/domains; generates native-like structures; allows precise and modular design | Inter-residue 6D feature maps | Full-atomistic structure | Inter-residue 6D feature maps |
| | FoldingDiff¹⁰¹ | Mirrors native folding process; alleviates the need for equivariant networks; unconditionally generates realistic protein structures | 6 consecutive backbone angles | Protein backbone structure | 6 consecutive backbone angles |
| | DiffSDS¹⁰² | Reduces computational complexity and cost; efficiently imposes geometric constraints; outperforms previous strong baselines | 6 consecutive backbone angles | Masked protein backbone structure | 6 consecutive backbone angles |
| | ProSSDG⁸⁴ | Operates at large scales; generates realistic protein structures with sequences; allows interactive structure generation | Secondary structure; coarse constraints | All-atomistic protein structure | Coarse constraints |
| | Genie¹⁰³ | Dual representation for protein residues; designability and diversity | Oriented reference frames | Protein backbone structure | Oriented reference frames |
| | SMCDiff¹⁰⁴ | Efficiently samples scaffolds; samples conditioned on given motif; theoretically guarantees conditional samples | Molecular graph structure | Scaffold structure given input motif | Molecular graph structure |
| | RFdiffusion¹⁰⁵ | Generates diverse outputs; can be guided toward specific design objectives; explicitly models 3D structure | Sequence, predicted structure | Diverse, complex, functional protein | RF frames from a predicted structure |
| | FrameDiff¹⁰⁶ | Generates designable monomers and diverse protein backbones; does not require pretrained structure predictor | Molecular graph structure | Designable monomer backbone structure | Molecular graph structure |
| | Chroma¹⁰⁷ | Jointly models structures and sequences; sub-quadratic computational scaling; arbitrary conditional sampling | Protein graph structure | Proteins with desired functions | Protein graph structure |
| Small-molecule generation and drug design | CDGS¹⁰⁸ | Incorporates discrete graph structures; specialized graph noise prediction model; similarity-constrained molecule optimization pipeline | Graph structures and inherent features | Molecular graphs | Discrete graph structure |
| | EDM⁵⁴ | Equivariant to Euclidean transformations; operates on continuous and categorical features; admits likelihood computation | Atom coordinates, atom types | 3D molecular graphs | Coordinates and categorical features |
| | DiffBridge¹⁰⁹ | Incorporates prior information; generates realistic molecules; uses physically informed diffusion bridges | Masked points in 3D Euclidean space | Realistic molecules or point cloud | Molecular graph structure |
| | DGSM¹¹⁰ | Models local and long-range interactions; dynamically constructs graph structures; estimates gradient fields of logarithm density | 2D molecular graphs | Stable 3D conformations | Molecular graph structure |
| | SDEGen¹¹¹ | Captures multimodal conformation distribution; quickly searches low-energy conformations with higher efficiency | Small molecules | Representative conformations | Encoding of graph structure |
| | DiffMD¹¹² | No intermediate variables; directly estimates gradient of log density; incorporates directions and velocities of atomic motions | 3D coordinates, velocities, and invariant features | Molecule simulation trajectories | Atomic position coordinates |
| | DiffLinker¹¹³ | Links arbitrary number of fragments; generates diverse and synthetically accessible molecules; conditions on protein pockets | Molecule structure | A molecule incorporating all the input fragments | Molecular graph structure |
| Protein–ligand interaction modelling | DiffBP¹¹⁴ | Non-autoregressive generation; generates molecules with high affinity to target proteins and desirable drug properties | Protein–ligand structure | Protein–ligand structure given input protein | Protein graph structure |
| | DiffSBDD¹¹⁵ | Generates diverse drug-like ligands; efficient in silico experiments; uses experimentally determined binding data | Protein–ligand structure | High-affinity ligands given protein pockets | Protein graph structure |
| | DiffDock¹¹⁶ | Maps manifold to product space; provides confidence estimates; maintains precision on computationally folded structures | Protein–ligand structure | Ranked ligand poses, confidence scores | Ligand poses |
| | NeuralPLexer¹¹⁷ | Repacks failed AlphaFold2 sites; enables end-to-end design; generalizes to ligand-unbound or predicted protein structure inputs | Protein backbone template and ligand molecular graphs | Full-atom protein–ligand structure | Contact maps, geometry prior |
| | NERE¹¹⁸ | Benchmarks on protein–ligand and antibody–antigen dataset; outperforms other unsupervised methods | Protein–ligand complex | Binding affinity of the protein–ligand structure | Ligand atom coordinates |
| Cryo-EM data analysis | CryoDRGN⁹⁸ | Accurate data distribution sampling; fast latent space traversal; unlocks generative modelling tools | Single-particle cryo-EM imaging | High-quality 3D structure | Latent space of image embeds |
| Single-cell image and gene-expression analysis | DISPR⁹⁹ | Realistic 3D reconstructions; data augmentation tool in single-cell classification task; inverse biomedical problems | 2D microscopy images as a prior | 3D cell shape predictions | 3D microscopy point cloud |
| | DEWAKSS¹⁰⁰ | Maintains cellular identity; preserves data variance; maintains cluster homogeneity | Gene-expression data | Single-cell genomics data | Gene-expression matrix |
ᵃ The term ‘diffusion’ refers to the application of the forward and reverse diffusion processes on a specific data representation, that is, the target. Cryo-EM, cryogenic electron microscopy; RF, RoseTTAFold.
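To make the footnote's notion of a diffusion target concrete, the following is a minimal sketch of a forward (noising) step and its algebraic inversion on a toy target of six backbone angles. It assumes a standard Gaussian (DDPM-style) formulation with a linear noise schedule; it is not the implementation of any tool in the table. The function names, the schedule parameters, and the angle values are all hypothetical, and the sketch ignores the periodicity of angles.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t from N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + noise_scale * e for x, e in zip(x0, noise)]
    return xt, noise


def estimate_x0(xt, t, alpha_bars, predicted_noise):
    """Invert the forward step given an estimate of the added noise."""
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [(x - noise_scale * e) / signal for x, e in zip(xt, predicted_noise)]


# Toy diffusion target: six backbone angles in radians (made-up values).
rng = random.Random(0)
angles = [-1.05, -0.79, 3.14, -1.10, 2.36, 3.10]
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))

noisy, true_noise = forward_diffuse(angles, 500, alpha_bars, rng)
# A trained network would supply the noise estimate; the true noise is
# reused here only to check that the algebra round-trips.
recovered = estimate_x0(noisy, 500, alpha_bars, true_noise)
```

In a generative model, the reverse process replaces `true_noise` with a learned prediction and is applied step by step from pure noise. Changing the diffusion target (for example, to coordinates, frames, or graph features) changes only what `x0` represents and how noise is defined on it.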