Table 3 | Applications of diffusion models in bioinformatics
| Applications | Tool name | Key function and strength | Input | Output | Diffusion target^a |
|---|---|---|---|---|---|
| Protein design and generation | ProteinSGM^91 | Inpaints plausible backbones and domains; generates native-like structures; allows precise and modular design | Inter-residue 6D feature maps | Full-atom structure | Inter-residue 6D feature maps |
| | FoldingDiff^101 | Mirrors the native folding process; removes the need for equivariant networks; unconditionally generates realistic protein structures | 6 consecutive backbone angles | Protein backbone structure | 6 consecutive backbone angles |
| | DiffSDS^102 | Reduces computational complexity and cost; efficiently imposes geometric constraints; outperforms previous strong baselines | 6 consecutive backbone angles | Masked protein backbone structure | 6 consecutive backbone angles |
| | ProSSDG^84 | Operates at large scales; generates realistic protein structures together with sequences; allows interactive structure generation | Secondary structure; coarse constraints | All-atom protein structure | Coarse constraints |
| | Genie^103 | Uses a dual representation of protein residues; achieves high designability and diversity | Oriented reference frames | Protein backbone structure | Oriented reference frames |
| | SMCDiff^104 | Efficiently samples scaffolds; samples conditioned on a given motif; provides theoretical guarantees on conditional samples | Molecular graph structure | Scaffold structure given input motif | Molecular graph structure |
| | RFdiffusion^105 | Generates diverse outputs; can be guided toward specific design objectives; explicitly models 3D structure | Sequence, predicted structure | Diverse, complex, functional proteins | RF frames from a predicted structure |
| | FrameDiff^106 | Generates designable monomers and diverse protein backbones; does not require a pretrained structure predictor | Molecular graph structure | Designable monomer backbone structure | Molecular graph structure |
| | Chroma^107 | Jointly models structures and sequences; sub-quadratic computational scaling; arbitrary conditional sampling | Protein graph structure | Proteins with desired functions | Protein graph structure |
| Small-molecule generation and drug design | CDGS^108 | Incorporates discrete graph structures; specialized graph noise prediction model; similarity-constrained molecule optimization pipeline | Graph structures and inherent features | Molecular graphs | Discrete graph structure |
| | EDM^54 | Equivariant to Euclidean transformations; operates on continuous and categorical features; admits likelihood computation | Atom coordinates, atom types | 3D molecular graphs | Coordinates and categorical features |
| | DiffBridge^109 | Incorporates prior information; generates realistic molecules; uses physically informed diffusion bridges | Masked points in 3D Euclidean space | Realistic molecules or point clouds | Molecular graph structure |
| | DGSM^110 | Models local and long-range interactions; dynamically constructs graph structures; estimates gradient fields of the log density | 2D molecular graphs | Stable 3D conformations | Molecular graph structure |
| | SDEGen^111 | Captures the multimodal conformation distribution; searches low-energy conformations quickly and efficiently | Small molecules | Representative conformations | Encoding of graph structure |
| | DiffMD^112 | Requires no intermediate variables; directly estimates the gradient of the log density; incorporates directions and velocities of atomic motions | 3D coordinates, velocities and invariant features | Molecular simulation trajectories | Atomic position coordinates |
| | DiffLinker^113 | Links an arbitrary number of fragments; generates diverse and synthetically accessible molecules; can condition on protein pockets | Molecule structure | A molecule incorporating all the input fragments | Molecular graph structure |
| Protein–ligand interaction modelling | DiffBP^114 | Non-autoregressive generation; generates molecules with high affinity to target proteins and desirable drug properties | Protein–ligand structure | Protein–ligand structure given input protein | Protein graph structure |
| | DiffSBDD^115 | Generates diverse drug-like ligands; enables efficient in silico experiments; uses experimentally determined binding data | Protein–ligand structure | High-affinity ligands given protein pockets | Protein graph structure |
| | DiffDock^116 | Maps the ligand-pose manifold to a product space; provides confidence estimates; maintains precision on computationally folded structures | Protein–ligand structure | Ranked ligand poses, confidence scores | Ligand poses |
| | NeuralPLexer^117 | Repacks failed AlphaFold2 sites; enables end-to-end design; generalizes to ligand-unbound or predicted protein structure inputs | Protein backbone template and ligand molecular graphs | Full-atom protein–ligand structure | Contact maps, geometry prior |
| | NERE^118 | Benchmarked on protein–ligand and antibody–antigen datasets; outperforms other unsupervised methods | Protein–ligand complex | Binding affinity of the protein–ligand structure | Ligand atom coordinates |
| Cryo-EM data analysis | CryoDRGN^98 | Samples the data distribution accurately; allows fast latent-space traversal; unlocks generative modelling tools | Single-particle cryo-EM images | High-quality 3D structure | Latent space of image embeddings |
| Single-cell image and gene-expression analysis | DISPR^99 | Produces realistic 3D reconstructions; serves as a data augmentation tool in single-cell classification; applicable to inverse biomedical problems | 2D microscopy images as a prior | 3D cell shape predictions | 3D microscopy point cloud |
| | DEWAKSS^100 | Maintains cellular identity; preserves data variance; maintains cluster homogeneity | Gene-expression data | Single-cell genomics data | Gene-expression matrix |
^a The term ‘diffusion’ refers to applying the forward and reverse diffusion processes to a specific data representation, that is, the target. Cryo-EM, cryogenic electron microscopy; RF, RoseTTAFold.
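The footnote's idea of a diffusion target can be illustrated with a generic denoising diffusion probabilistic model (DDPM) sketch. This is not the implementation of any tool in the table: the linear noise schedule, the function names and the use of the true noise as an "oracle" in place of a trained denoising network are all assumptions made for illustration. The target is treated as a flat list of real numbers (standing in for, say, atom coordinates); targets such as backbone angles or discrete graphs would need tool-specific changes, for example wrapped or categorical noise.

```python
import math
import random

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule; requires T >= 2)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def forward_noise(x0, t, abar, rng):
    """Forward process: sample x_t ~ N(sqrt(abar_t) * x_0, (1 - abar_t) * I) in closed form.

    Returns (x_t, eps) so the added noise can serve as the training target.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = abar[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

def predict_x0(xt, eps_hat, t, abar):
    """Invert the forward formula given a noise estimate eps_hat (normally a network output)."""
    a = abar[t]
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(xt, eps_hat)]

def reverse_step(xt, eps_hat, t, betas, abar, rng):
    """Reverse process: one ancestral DDPM step x_t -> x_{t-1}, with variance sigma_t^2 = beta_t."""
    beta, a = betas[t], abar[t]
    mean = [(x - beta / math.sqrt(1.0 - a) * e) / math.sqrt(1.0 - beta)
            for x, e in zip(xt, eps_hat)]
    if t == 0:
        return mean  # no noise is added at the final step
    return [m + math.sqrt(beta) * rng.gauss(0.0, 1.0) for m in mean]

# Usage with a hypothetical three-number target and oracle noise.
rng = random.Random(0)
betas = linear_beta_schedule(1000)
abar = alpha_bars(betas)
x0 = [0.5, -1.2, 2.0]
xt, eps = forward_noise(x0, 500, abar, rng)
x0_rec = predict_x0(xt, eps, 500, abar)  # equals x0 up to floating-point error
```

In a real model, `eps` in the last lines would be replaced by the output of a network trained to predict the added noise from `xt` and `t`, and sampling would apply `reverse_step` repeatedly from pure Gaussian noise at `t = T - 1` down to `t = 0`. What distinguishes the tools in the table is largely which representation plays the role of `x0` and how the noise is adapted to its geometry.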