. 2026 Feb 18;652(8108):240–250. doi: 10.1038/s41586-026-10141-2

Accurate predictions of disordered protein ensembles with STARLING

Borna Novak 1,2,3,#, Jeffrey M Lotthammer 2,3,#, Ryan J Emenecker 2,3, Alex S Holehouse 2,3,
PMCID: PMC13043300  PMID: 41708867

Abstract

Intrinsically disordered proteins and regions (collectively IDRs) are found across all kingdoms of life and have critical roles in virtually every eukaryotic cellular process1. IDRs exist in a broad ensemble of structurally distinct conformations. This structural plasticity facilitates diverse molecular recognition and function2–4. Here we combine advances in physics-based force fields with the power of multi-modal generative deep learning to develop STARLING, a framework for rapid generation of accurate IDR ensembles and ensemble-aware representations from sequence. STARLING supports environmental conditioning across ionic strengths and demonstrates proof of concept for the interpolative ability of generative models beyond their training domain. Moreover, we enable ensemble refinement under experimental constraints using a Bayesian maximum-entropy reweighting scheme. Beyond ensemble characterization, STARLING sequence representations can be used in multiple ways. We showcase two examples: first, STARLING lets us perform ensemble-based search for ‘biophysical look-alikes’. Second, we demonstrate how these latent representations can be used to accelerate ensemble-first sequence design from weeks or hours per candidate to seconds, enabling library-scale designs. Together, STARLING dramatically lowers the barrier to the computational interrogation of IDR function through the lens of emergent biophysical properties, complementing bioinformatic protein sequence analysis. We evaluate the accuracy of STARLING against extant experimental data and offer a series of vignettes illustrating how STARLING can enable rapid hypothesis generation for IDR function and aid the interpretation of experimental data.

Subject terms: Intrinsically disordered proteins, Computational biophysics


The deep learning model STARLING can generate accurate ensembles of intrinsically disordered regions of proteins using only protein sequence as input.

Main

IDRs are structurally heterogeneous protein regions that are estimated to constitute approximately 30% of eukaryotic proteomes1. Despite lacking a fixed structure, IDRs have key roles in essential cellular processes such as transcription, translation and cell signalling1. Owing to their broad structural heterogeneity, IDRs must be described by a conformational ensemble: a large collection of structurally distinct and interchanging conformations1. However, although IDRs cannot be represented as a single 3D structure, they do still possess sequence-encoded conformational biases, and ensembles can have essential roles in IDR function and may be perturbed in disease2–4. Just as structural biology has been instrumental in understanding the molecular basis for folded domain function, there is a growing appreciation that the characterization of IDR ensembles may be important for understanding IDR function5,6.

Various experimental techniques have been applied to interrogate sequence–ensemble relationships7–9. Although these report on specific aspects of IDR ensembles, they fall short of providing a holistic description of the distribution of conformers (that is, the 3D coordinates of all residues in the protein across many different conformations, referred to here as a ‘full structural ensemble’). To achieve this, the integration of computational modelling with experimental data has proven an effective route10–13.

Conceptually, computational models and experiments can be combined in two different ways. One approach involves using physics-based models and reweighting or biasing towards experimental observables14,15. Another involves using experimental data to parameterize transferable force fields, which, in principle, do not require additional reweighting16. Although both approaches have been effective in providing insight into IDR behaviour, they require deep technical expertise to ensure reliable conclusions are drawn, and can also be computationally expensive.

Although recent advances in coarse-grained simulations17–19 offer a faster alternative (and both modalities described above can be applied here), even coarse-grained simulations can still take hours to obtain sufficient sampling and still require a relatively high level of technical expertise to set up, run and analyse. Recent deep learning predictors trained on coarse-grained simulations have enabled proteome-scale predictions for ensemble average values, but are limited to specific observables for which a predictor was trained (for example, radius of gyration (Rg) or end-to-end distance (Re))20,21.

Deep learning approaches have transformed protein structure prediction, significantly reducing the barrier to large-scale exploration of sequence–structure relationships22 (Fig. 1a). However, these methods are poorly suited to investigate IDRs23 owing to a reduction in alignment-based conservation in disordered proteins, a paucity of appropriate experimental training data and optimization for an inappropriate objective (predicting a single best structure from sequence, whereas IDRs should be described by a large, conformationally heterogeneous ensemble; Fig. 1b). In short, although we now possess easy-to-use tools for accurately predicting the 3D structure of folded proteins, equivalent tools for rapid and accurate IDR ensemble prediction are lacking.

Fig. 1. STARLING approach and model architecture.

a, Deep learning has revolutionized protein structure prediction, with major advances being facilitated by large-scale evolutionary information. b, Structure prediction methods for folded domains are poorly suited for predicting the behaviour of IDRs. These limitations stem from the absence of a native-state structure and because evolutionary information is often poorly captured in multiple sequence alignments (MSAs) of IDRs. c, Generative text-to-image models enable the creation of many unique and independent images consistent with a single input prompt. d, IDR ensemble generation shares many similarities with text-to-image generation; we required many distinct, uncorrelated conformers, all of which are consistent with an input prompt (an amino acid sequence). e, STARLING was trained on approximately 50,000 amino acid sequences at 150 mM ionic strength and approximately 14,000 sequences at 20 mM and 300 mM ionic strength. The sequences simulated at 20 mM and 300 mM ionic strength are a subset of those simulated at 150 mM ionic strength. For each sequence, hundreds of distinct conformers were generated using coarse-grained molecular dynamics simulations, and each conformer was converted into a distance map (an image). Sequences were split into training, testing and validation sets. The MARV cartoon was reproduced from ref. 61, Martin Steinegger. f, STARLING makes use of a VAE to compress distance maps to a latent space, allowing a denoising diffusion model to work in this latent space (‘latent diffusion’). g, The overall architecture of the STARLING model combines a latent-space probabilistic denoising diffusion model with a vision transformer architecture using both convolutions and transformer blocks. Latent-space maps were decoded to real space via the VAE decoder. Finally, distance maps can be reconstructed into 3D coordinates via a parallelized multidimensional scaling approach.

Here we addressed these challenges by developing a fast and accurate approach for predicting full coarse-grained disordered protein structural ensembles directly from the amino acid sequence. Our approach leveraged advances in generative modelling, a deep learning technique capable of creating new and original data. However, developing a generative model poses a key challenge: the need for large training datasets. To address this, we performed large-scale coarse-grained simulations to generate full structural ensembles across tens of thousands of natural and synthetic IDRs. The resulting method — STARLING — allowed us to generate structural ensembles directly from sequence in seconds. A major goal in developing STARLING was to avoid hardware barriers. Although STARLING is fast (approximately 35 conformers per second) on GPUs, it can still generate ensembles in minutes on Intel/AMD CPUs and seconds on Apple CPUs.

STARLING-generated ensembles show good agreement with experimental data, enabling de novo exploration of uncharacterized IDRs or aiding in the biophysical interpretation of experimental data. Moreover, here we show proof of concept that STARLING can be used to (1) investigate sequence–ensemble relationships for disordered proteins; (2) explore bound-state conformational ensembles for binary disordered protein complexes; and (3) provide conformation-aware latent representations for IDR characterization, search and design. Together, we propose that the ease of use and speed of STARLING make it a powerful tool for democratizing large-scale exploration of sequence–ensemble relationships in IDRs.

Generative artificial intelligence (AI) has been transformative for text-to-image generation24,25. In text-to-image generative AI, an image is generated by passing a prompt (a short phrase describing the desired image) to a pre-trained deep learning model. The model then generates an image consistent with the prompt, a process referred to as inference. Deep learning models capable of inference must first be trained. For models used in modern text-to-image generative AI, training is not simply memorization but also learning the relationship between prompts and the features in the associated images. As a result, once a model is trained, if the same prompt is reused many times, it will generate many independent images entirely distinct from any of the individual instances observed in training. Of note, despite being different from one another, each generated image should be consistent with the prompt (Fig. 1c).

The mapping of a single text prompt to a collection of different images — each of which is consistent with the input prompt — is precisely the problem that we wished to solve for IDR ensemble generation. In IDR ensemble generation, we wanted to take a text prompt (amino acid sequence) and generate a collection of many distinct and uncorrelated IDR conformations consistent with that prompt (Fig. 1d). Moreover, we wanted this generation process to be fast (seconds) and possible on commodity hardware (laptops and desktops). To achieve this goal, we combined a variational autoencoder (VAE) with a discrete-time denoising-diffusion probabilistic model (DDPM) to create a latent diffusion model25. The resulting method (STARLING) enabled the accurate and rapid prediction of coarse-grained conformational ensembles of IDRs.

STARLING was trained on approximately 50,000 unique rationally designed and naturally occurring IDR sequences (10–384 residues) simulated at 150 mM ionic strength (Supplementary Table 1). Among these, approximately 14,000 were additionally simulated at 20 mM and 300 mM ionic strength (Fig. 1e, Extended Data Fig. 1a,b and Supplementary Table 2). Rationally designed sequences were designed using GOOSE26 (see Methods; Extended Data Fig. 1c–f). Although naturally occurring IDRs provide training data focused on the most relevant manifold of sequences, the inclusion of a large number of rationally designed sequences allowed us to construct training data that systematically titrates across sequence space, yielding a well-rounded training set that captures the extremes of sequence composition and patterning.

Extended Data Fig. 1. Sequences used in STARLING training, test, and validation.

a, Our dataset of 70,563 sequences consists of both natural sequences (N = 20,349) and synthetic sequences designed by varying sequence features known to affect IDR ensembles (N = 50,214). b, The sequences were clustered and split into train (N = 49,423), validation (N = 10,703), and test (N = 10,437) sets while keeping the distribution of sequence lengths consistent across the splits (right). c, Residues used for STARLING training, test, and validation, represented in terms of aliphatic vs. polar content. d, Residues used for STARLING training, test, and validation, represented in terms of net charge vs. hydrophobicity. e, Residues used for STARLING training, test, and validation, represented in terms of kappa (charge patterning) vs. fraction of charged residues. f, Residues used for STARLING training, test, and validation, represented in terms of aromatic vs. positively charged residues. g, Overall sequence complexity as a function of amino acid content. We note that the data in panels c–g mirror our prior quantification of sequences20, albeit with an expanded dataset. h, Histogram of the length distribution of naturally occurring IDRs across 12 model organisms. i, A cumulative density function of the length distribution presented in h. 94% of the IDRs across common model organisms fall between 0 and 384 residues (the length cutoff of STARLING).

A necessary decision in deep learning models is defining the limits of your training data. We focused on sequences up to 384 amino acids in length for several reasons. First, almost 95% of naturally occurring IDRs across common model organisms are shorter than 384 amino acids (Extended Data Fig. 1h,i). Second, this length enables well-sampled coarse-grained simulations to be completed on a reasonable timescale. Third, the underlying architecture of the model requires a fixed upper limit (see Methods). Finally, experimental characterization of disordered protein sequences longer than approximately 350 residues is lacking, making it difficult to assess the validity of model predictions relative to experiments for very long disordered proteins.

Molecular dynamics simulations were performed using the Mpipi-GG force field. Mpipi-GG20 — a variant of the original Mpipi force field17 — is a one-bead-per-residue coarse-grained model developed for disordered regions. Once run, conformations from simulations of a sequence of length n were converted into distance maps, n × n matrices, where each element describes the distance between the i-th and j-th residue for a specific conformation. This converts each IDR conformation into an ‘image’, allowing us to directly leverage innovations developed for conditional image generation. Overall, our training dataset comprised nearly 12 million distance maps.
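The conversion from a conformation to a distance map can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the STARLING implementation; the function name `distance_map`, the coordinate layout (one `(x, y, z)` tuple per residue bead) and the toy coordinates are assumptions made for the example.

```python
import math

def distance_map(coords):
    """Return the n x n matrix of pairwise distances for one conformation.

    coords: list of (x, y, z) bead positions, one per residue (assumed layout).
    """
    n = len(coords)
    dmap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            dmap[i][j] = d  # the map is symmetric: d_ij equals d_ji
            dmap[j][i] = d
    return dmap

# Toy three-residue conformation (coordinates in Å, made up for illustration).
toy_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
toy_map = distance_map(toy_coords)
```

Each simulated conformer yields one such matrix, so an ensemble becomes a stack of single-channel images that all share the same sequence "prompt".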

Two core limitations of DDPM models are their substantial memory requirements and the slow generative process during inference. These limitations arise from performing the reverse denoising process in a complex, high-dimensional space (for example, images of size 384 × 384 pixels). To mitigate this, we developed a VAE to compress each distance map into a lower-resolution latent space (24 × 24 pixels)27 (Fig. 1f). The denoising diffusion process can then occur in this latent space (latent diffusion), significantly reducing memory requirements and inference time. Given this, STARLING was trained in two independent stages.
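As a rough sense of scale for this compression, and assuming a single channel per map at both resolutions (the latent map is described as single-channel in Extended Data Fig. 2d; a single-channel input is an assumption here), the number of values handled per conformer during denoising falls by a factor of 256, or 16-fold along each axis:

```latex
\frac{384 \times 384}{24 \times 24} = \frac{147{,}456}{576} = 256
```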

We first trained a highly accurate VAE that enables the compression of full-resolution distance maps into latent space (Extended Data Fig. 2a). Our VAE uses a ResNet18 architecture and learns parameters for an encoder (full resolution to latent space) and a decoder (latent space to full resolution)28. We assessed the accuracy of the VAE by encoding and decoding distance maps derived from sequences substantially different from those used in the training and validation sets. We evaluated our model on a held-out test set comprising approximately 16,000 simulations, totalling nearly 2.5 million distance maps. The model achieved a root-mean-square reconstruction error (RMSE) of 1.16 Å (Extended Data Fig. 2b). Furthermore, our model accurately reconstructed bond lengths, with a root-mean-square reconstruction error of 0.18 Å (Extended Data Fig. 2c), crucial for modelling protein conformations.
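The two reconstruction metrics quoted above can be illustrated as follows. This is a sketch under stated assumptions rather than the evaluation code used for STARLING: the function names are hypothetical, the per-map RMSE is taken over all entries above the diagonal, and "bond lengths" are taken to be the distances between sequence-adjacent residues (i, i+1). The toy matrices are made up.

```python
import math

def upper_triangle_rmse(original, reconstructed, offset=1):
    """RMSE over entries with j - i >= offset (offset=1 skips the zero diagonal)."""
    n = len(original)
    sq_errors = [
        (original[i][j] - reconstructed[i][j]) ** 2
        for i in range(n)
        for j in range(i + offset, n)
    ]
    return math.sqrt(sum(sq_errors) / len(sq_errors))

def bond_rmse(original, reconstructed):
    """RMSE restricted to adjacent residues (i, i+1), assumed to be the bonds."""
    n = len(original)
    sq_errors = [
        (original[i][i + 1] - reconstructed[i][i + 1]) ** 2 for i in range(n - 1)
    ]
    return math.sqrt(sum(sq_errors) / len(sq_errors))

# Toy 3-residue distance maps in Å (illustrative values only).
toy_original = [[0.0, 3.8, 6.0], [3.8, 0.0, 3.8], [6.0, 3.8, 0.0]]
toy_decoded = [[0.0, 3.9, 7.0], [3.9, 0.0, 3.7], [7.0, 3.7, 0.0]]
map_error = upper_triangle_rmse(toy_original, toy_decoded)
bond_error = bond_rmse(toy_original, toy_decoded)
```

Reporting the bond error separately is useful because a small error averaged over a whole map can still hide unphysical distances between neighbouring beads.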

Extended Data Fig. 2. VAE overview, performance, and overall detailed model architecture.

a, Schematic representation of the variational autoencoder (VAE). The input to our variational autoencoder is a collection of 2D inter-residue distance maps that undergo feature extraction and spatial compression through a ResNet18 encoder, which learns the μ and σ parameters of a normal distribution that parameterizes the latent features. The VAE is trained using an evidence lower bound (ELBO) loss that comprises a reconstruction term and a Kullback–Leibler (KL) divergence term. b, The variational autoencoder (VAE) is evaluated on a held-out test set of distance maps (N = 2,422,839). The VAE encoder spatially compresses these distance maps, which the VAE decoder then reconstructs from their compressed representations. The root-mean-square error (RMSE) is computed between the input and output distance maps. The trained VAE shows a low mean RMSE across the upper triangle of each distance map (1.16 Å). c, Accurate bond reconstruction is essential for generating physically realistic conformations, and our model achieves a very low mean RMSE (0.18 Å). d, Extended schematic of the STARLING discrete-time denoising diffusion probabilistic model (DDPM). The model begins with 2D Gaussian noise (1 channel, 24×24 latent space) and is projected to 64 channels via a convolutional upsampling layer. Spatial features are then patchified into 3×3 regions (8 total) and passed through a 12-layer vision transformer (ViT). The ViT output is stitched back (depatchified) and reduced back to 1 channel by a convolutional downsampling layer. To generate sequence-specific distance maps, the ViT is conditioned on sequence features extracted by a transformer-based encoder. The input sequence is ordinally encoded, embedded into high-dimensional vectors, followed by the addition of sinusoidal positional embeddings, and processed by a 12-layer transformer. 
Ionic strength information is additionally incorporated through cross-attention, yielding environment-specific sequence features that are fed into the denoising module’s cross-attention layers. e, Model overview as set up during training. f, Model overview as used for inference.

In the second stage of training, we developed a DDPM (Fig. 1d and Extended Data Fig. 2d–f). In brief, this model was trained to learn parameters that map random noise to individual latent-space conformer distance maps, conditioned on the associated amino acid sequence and ionic strength (see Methods). The training data came from nearly 12 million distance maps across almost 78,000 simulations. Each map was first compressed into a latent space, and then the fixed forward diffusion process was used to add noise to each latent-space distance map. The underlying vision transformer model was then trained to learn parameters that reversed this forward diffusion process, conditioned on the input amino acid sequence and ionic strength (Fig. 1g). Ultimately, ensemble generation is enabled by running many inference rounds in parallel, creating many independent distance maps.
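The fixed forward diffusion step mentioned above has a standard closed form in discrete-time DDPMs: a noised sample at step t is x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, with ε drawn from a standard normal and ᾱ_t the running product of (1 − β_s). The sketch below illustrates only that generic formula. The linear β schedule, the 1,000 steps, the β endpoints and all function names are assumptions for illustration, not details of STARLING given in this section.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule; the schedule STARLING uses is not given here."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + k * step for k in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(latent, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [[signal * v + noise * rng.gauss(0.0, 1.0) for v in row] for row in latent]

rng = random.Random(0)  # seeded so the example is reproducible
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
latent = [[1.0] * 24 for _ in range(24)]  # placeholder 24 x 24 latent map
noisy = add_noise(latent, t=999, alpha_bars=alpha_bars, rng=rng)
```

Because ᾱ_t shrinks towards zero at the final step, the noised latent map is close to pure Gaussian noise there, which is why inference can start from random noise and denoise towards a latent distance map conditioned on sequence and ionic strength.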

Although training requires the VAE encoder to generate latent-space distance maps, only the VAE decoder is needed for inference once the model is trained. Ultimately, the fully trained model (STARLING) combines two different models (the VAE decoder and the DDPM) that work together to enable rapid ensemble prediction. Using default settings, STARLING can generate 400 independent IDR conformations in approximately 12 s on a GPU (Nvidia A4000), 20 s on a MacBook Pro M3 CPU and approximately 6 min on an Intel CPU (Intel(R) Xeon(R) Silver 4210 R CPU @ 2.40 GHz; Extended Data Fig. 3a,b). The prediction runtime and memory are roughly independent of sequence length; 400 conformations for a 50-amino-acid IDR took approximately 10 s, whereas 400 conformations for a 350-residue IDR took approximately 14 s, and prediction times were not hampered by low-memory GPUs (Extended Data Fig. 3c–e).

Extended Data Fig. 3. STARLING performance comparisons.

a,b, STARLING benchmarking on different types of hardware in linear (a) and log (b) space. Runtime depends linearly on the number of conformers (although sequence length has almost no impact on runtime). Tested here using a 140-residue sequence. On GPUs and Apple Silicon (facilitated by CUDA and MPS, respectively), STARLING is highly performant, offering ensembles sufficient for detailed biophysical investigation in 10–15 s (GPU) or 30–60 s (MPS). The specific hardware tested here includes the Nvidia A4000 (GPU), MacBook Pro M3 Max CPU (MPS), and Intel(R) Xeon(R) Silver 4210 R CPU @ 2.40 GHz (CPU). We note that even on a five-year-old bottom-of-the-line GPU (GTX-1660 from 2019), ensemble generation takes ~50 s for 400 conformers. c, Number of conformers generated per second on different hardware types. The specific hardware tested here includes the Nvidia A4000 (GPU), MacBook Pro M3 Max CPU (MPS), and Intel(R) Xeon(R) Silver 4210 R CPU @ 2.40 GHz (CPU). d, STARLING ensemble predictions exhibit almost no runtime or memory scaling with respect to sequence length. On a GPU, generating 400 conformers takes ~10 s for a 50-residue IDR and ~14 s for a 350-residue IDR. e, STARLING enables users to select a batch size for concurrent ensemble prediction. A larger batch size requires more GPU-accessible memory. However, we have implemented STARLING such that increasing the batch size does not materially affect the overall per-conformer prediction time. This means even GPUs with modest memory resources can work with STARLING-based predictions by selecting a reduced batch size.

We first checked that the STARLING-derived ensemble-averaged global dimensions agree with those obtained from the Mpipi-GG simulation of unseen sequences. In all cases, STARLING ensembles comprised 800 conformers. Using a held-out test set of approximately 10,000 sequences, we performed Mpipi-GG simulations at 150 mM ionic strength and calculated Rg and Re. Our STARLING-derived Rg (RMSE = 0.85 Å, R2 = 0.996) and Re (RMSE = 3.48 Å, R2 = 0.989) values are in excellent agreement with the simulations (Fig. 2a,b).
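Both global dimensions compared here can be read directly off a distance map, without reconstructing 3D coordinates. The sketch below uses the standard identity for equal-mass beads, Rg² = (1/(2n²)) Σ_i Σ_j d_ij², and takes Re as the distance between the first and last residue. Whether STARLING computes these quantities this way, and whether the ensemble average is a plain mean over per-conformer values, are assumptions made for illustration; the function names are hypothetical.

```python
import math

def radius_of_gyration(dmap):
    """Rg from pairwise distances, assuming equal-mass beads:
    Rg^2 = (1 / (2 n^2)) * sum over all i, j of d_ij^2."""
    n = len(dmap)
    total = sum(d * d for row in dmap for d in row)
    return math.sqrt(total / (2.0 * n * n))

def end_to_end_distance(dmap):
    """Re: distance between the first and last residue."""
    return dmap[0][-1]

def ensemble_average(distance_maps, observable):
    """Plain mean of a per-conformer observable over an ensemble (assumed averaging)."""
    values = [observable(dmap) for dmap in distance_maps]
    return sum(values) / len(values)

# Toy ensemble: two conformers of three collinear beads (spacings made up, in Å).
conformer_a = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
conformer_b = [[0.0, 2.0, 4.0], [2.0, 0.0, 2.0], [4.0, 2.0, 0.0]]
mean_rg = ensemble_average([conformer_a, conformer_b], radius_of_gyration)
mean_re = ensemble_average([conformer_a, conformer_b], end_to_end_distance)
```

Per-sequence averages computed this way for predicted and simulated ensembles are the kind of paired values from which an RMSE and R2 can then be calculated.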

Fig. 2. Evaluating STARLING on held-out simulation data.

a, Comparison of STARLING-derived average Rg values with those from Mpipi-GG (simulations) from a set of approximately 10,000 unseen sequences. b, Comparison of STARLING-derived average Re values with those from Mpipi-GG (simulations) from approximately 10,000 unseen sequences. c, Comparison of STARLING-derived Rg values with those from Mpipi-GG (simulations) from a set of approximately 3,000 unseen sequences at an ionic strength of 20 mM. d, Comparison of STARLING-derived Rg values with those from Mpipi-GG (simulations) from a set of approximately 3,000 unseen sequences at an ionic strength of 300 mM. e, STARLING ensembles enable the calculation of distributions of observables. Distribution similarity can be quantified by the Hellinger distance (H), where H = 0 is perfect overlap and H = 1 is no overlap. f, Overlap of Rg distributions obtained from STARLING (blue bars) and Mpipi-GG (orange line), with the experimental value shown for completeness. Overlap between the STARLING and Mpipi-GG distributions was quantified by the H value and the Kolmogorov–Smirnov (KS) statistic (D) to assess the difference between the distributions and the effect size of the difference. A KS statistic of 0.02 means that the CDFs of the two distributions differ by at most 2%. Vertical dashed lines indicate the average STARLING-derived value. g, 3D configurations (left) can be represented as distance maps (right), where each pixel defines the distance between a pair of residues (rij). h, Ensembles of conformations (left) can be represented as ensembles of distance maps (right). i, Inter-residue distance distributions in STARLING ensembles can be compared with equivalent distributions from Mpipi-GG simulations. j, All possible inter-residue distances can be compared between STARLING and Mpipi-GG simulations using H to quantify overlap in distributions (top). Histograms of inter-residue H values are also shown (bottom). 
k, Pairwise inter-residue distance comparison across different exemplar ensembles reveals excellent overlap.

As STARLING was also trained on Mpipi-GG simulations at both 20 mM and 300 mM ionic strength, we verified that STARLING-derived ensemble-averaged global dimensions match the simulations. Using a held-out test set of approximately 3,000 unseen sequences, we performed Mpipi-GG simulations at ionic strengths of 20 mM and 300 mM. Our STARLING-derived Rg (Fig. 2c,d; RMSE = 0.98 Å and 1.12 Å, R2 = 0.995 and 0.992 for 20 mM and 300 mM, respectively) and Re (Extended Data Fig. 4a,b; RMSE = 4.17 Å and 4.53 Å, R2 = 0.986 and 0.978 for 20 mM and 300 mM, respectively) are again in excellent agreement with simulations.

Extended Data Fig. 4. Accuracy and convergence of STARLING compared to Mpipi-GG simulations.

a, Comparison of STARLING-derived average end-to-end distance (Re values) with those from Mpipi-GG (simulations) from a set of ~3,000 unseen sequences at an ionic strength of 20 mM. b, Comparison of STARLING-derived Re values with those from Mpipi-GG (simulations) from a set of ~3,000 unseen sequences at an ionic strength of 300 mM. c–f, Convergence of the radius of gyration (Rg) as a function of ensemble size for IDRs of varying lengths (50, 150, 250, and 350 residues). Dashed lines represent Mpipi-GG convergence, solid lines correspond to STARLING convergence, and the shaded area indicates the ensemble-averaged Rg from long Mpipi-GG simulations. STARLING-derived Rg values converge within the first 800 conformers, exhibiting minimal error compared to Mpipi-GG ensembles. Based on this analysis, we selected 800 STARLING conformers for Rg prediction comparisons in all benchmarking used here. g, Convergence plot depicting the relationship between Hellinger distance (H), the number of conformations sampled from STARLING, and the number of denoising steps for an IDR of length 350. The Hellinger distance quantifies the similarity between two distributions, where a value of zero indicates identical distributions and a value of one signifies no overlap. The Hellinger distance decreases as the number of conformations and denoising steps increase, with optimal STARLING performance observed at 1000 conformers and 30 denoising steps. h, Convergence of short-range and long-range distance distributions as a function of ensemble size. A 10 µs Mpipi-GG simulation of a 250-residue disordered protein was split into two 5 µs segments, with one serving as the reference and the other for comparison. A 2,500-conformer STARLING ensemble was also compared to the reference segment. 
Panels show convergence of STARLING and Mpipi-GG distance distributions relative to the reference for short-range residue pair 1–10, medium-range residue pair 1–50, long-range residue pair 1–100, and the end-to-end distance (residue pair 1–250). Error bars represent bootstrap-estimated standard deviations. Insets display the full distance distributions, along with Kolmogorov–Smirnov statistics for STARLING and Mpipi-GG (legend: blue, STARLING; orange, Mpipi-GG; black dashed line, reference). i, Convergence of radius-of-gyration from short to long sequence lengths. A 10 µs Mpipi-GG simulation of a disordered protein was split into two 5 µs segments, with one serving as the reference and the other for comparison. A 2,500-conformer STARLING ensemble was also compared to the reference segment. Panels show convergence of STARLING and Mpipi-GG radii-of-gyration distributions relative to the reference for a 50-residue-long disordered protein, a 150-residue-long disordered protein, a 250-residue-long disordered protein, and a 350-residue-long disordered protein. Error bars represent bootstrap-estimated standard deviations. Insets display the complete radii-of-gyration distributions, along with Kolmogorov–Smirnov statistics for STARLING and Mpipi-GG (legend: blue, STARLING; orange, Mpipi-GG; black dashed line, reference). j, Comparison of STARLING-derived average radii of gyration (Rg values) for length-matched sequences with a broad range of chemistries with those from Mpipi-GG (simulations) from a set of 137 unseen sequences. This is done to verify that STARLING has learned real sequence chemistry, not just that longer sequences are bigger. k, Comparison of STARLING-derived average end-to-end distances (Re values) for length-matched sequences with a broad range of chemistries with those from Mpipi-GG (simulations) from a set of 137 unseen sequences. This is done to verify that STARLING has learned real sequence chemistry, not just that longer sequences are bigger.

Model accuracy depends on the balance between ensemble size and denoising steps. By comparing STARLING results to long-timescale Mpipi-GG simulations, we found that error in the Rg stabilizes beyond 30 denoising steps and 400 conformations (Extended Data Fig. 4c–i). We adopted these as default parameters, although larger ensembles may be required for other order parameters.

As Re and Rg are highly correlated with sequence length, we next assessed the accuracy of our model on a set of length-matched sequences29. This approach allowed us to determine whether the model had effectively captured the influence of sequence chemistry on the global dimensions of IDRs, or whether it had simply learned to relate sequence length to these dimensions. Using all 100-residue, 200-residue and 300-residue sequences with distinct sequence chemistries from the held-out test set, we found excellent agreement across a range of Rg (Extended Data Fig. 4j; RMSE = 0.78 Å, 1.18 Å and 1.20 Å, R2 = 0.984, 0.990 and 0.995 for 100-residue, 200-residue and 300-residue lengths, respectively) and Re (Extended Data Fig. 4k; RMSE = 3.38 Å, 4.22 Å and 5.65 Å, R2 = 0.965, 0.984 and 0.986 for 100-residue, 200-residue and 300-residue lengths, respectively). These results gave us confidence that STARLING had learned bona fide sequence-to-ensemble rules instead of simply learning polymer scaling theory.

In addition to ensemble-averaged observables, full structural ensembles enabled us to calculate distributions of any observable of interest (Fig. 2e). We set out to assess how well STARLING-derived global dimension distributions match distributions from simulations. We used the Hellinger distance (H) as a similarity measure to compare distributions. H ranges from 0 to 1, where zero indicates that the distributions are identical, and one signifies that they are entirely disjoint (Fig. 2e). The STARLING-derived Rg distributions show excellent agreement with distributions from simulations (Fig. 2f). This overlap was quantified by a low H, confirming the high degree of similarity between the two distributions.
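For two discrete distributions p and q defined on the same bins, the Hellinger distance is H = sqrt(½ Σ_k (√p_k − √q_k)²), which is 0 for identical distributions and 1 when they share no bins. The sketch below estimates H from two sets of samples (for example, per-conformer Rg values) by histogramming them on a shared grid. The equal-width binning, the default of 20 bins and the function names are assumptions for illustration; the binning used for the paper's comparisons is not stated in this section.

```python
import math

def histogram(samples, lo, hi, num_bins):
    """Normalized histogram (probabilities sum to 1) on shared, equal-width bins."""
    counts = [0] * num_bins
    width = (hi - lo) / num_bins
    for x in samples:
        k = min(int((x - lo) / width), num_bins - 1)  # put x == hi in the last bin
        counts[k] += 1
    return [c / len(samples) for c in counts]

def hellinger(p, q):
    """H = sqrt(0.5 * sum_k (sqrt(p_k) - sqrt(q_k))^2)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def hellinger_from_samples(a, b, num_bins=20):
    """Estimate H between two sample sets using a common binning range."""
    lo, hi = min(min(a), min(b)), max(max(a), max(b))
    if hi == lo:  # every sample has the same value, so the distributions coincide
        return 0.0
    return hellinger(histogram(a, lo, hi, num_bins), histogram(b, lo, hi, num_bins))

# Toy samples (made-up values): identical sets, and two sets with no shared bins.
h_same = hellinger_from_samples([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
h_disjoint = hellinger_from_samples([1.0, 1.1, 1.2], [9.0, 9.1, 10.0])
```

Note that a histogram-based estimate depends on the bin width and on the number of samples, so values are best compared across analyses that use the same settings.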

Finally, to quantify ensemble fidelity, we assessed all inter-residue distance distributions between ensembles generated using STARLING or Mpipi-GG simulations. Each 3D conformation can be represented as a single distance map (Fig. 2g), and the ensemble of conformations can be represented as an ensemble of distance maps (Fig. 2h). Therefore, every inter-residue distance distribution from a pair of residues in a STARLING ensemble can be compared against the corresponding distribution from an Mpipi-GG simulation (Fig. 2i). For a 100-residue IDR, this yields 4,950 (100 × 99/2) unique distribution comparisons. We could quantify the overlap between the distributions using H and visualize each pair using a heat map (Fig. 2j, top). Furthermore, we could use histograms for those H to provide an overall assessment of the ensemble agreement (Fig. 2j, bottom). Overall, we found excellent agreement across all distance distributions investigated (Fig. 2k). In summary, all evidence supports that STARLING can directly predict conformational ensembles of disordered proteins from sequence.
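The per-pair comparison described above starts by regrouping an ensemble of distance maps so that each unique residue pair (i < j) carries its own list of distances, one per conformer. A minimal sketch of that regrouping is shown below; the function names and the list-of-matrices layout are assumptions for illustration. For a chain of n residues there are n(n − 1)/2 such pairs.

```python
from itertools import combinations

def pairwise_distance_samples(distance_maps):
    """Collect, for every residue pair i < j, its distance in each conformer.

    distance_maps: list of n x n matrices, one per conformer (assumed layout).
    Returns {(i, j): [d_ij in conformer 0, d_ij in conformer 1, ...]}.
    """
    n = len(distance_maps[0])
    return {
        (i, j): [dmap[i][j] for dmap in distance_maps]
        for i, j in combinations(range(n), 2)
    }

def num_unique_pairs(n):
    """Number of unique residue pairs in a chain of n residues."""
    return n * (n - 1) // 2

# Toy ensemble of two 3-residue conformers (distances in Å, made up).
toy_ensemble = [
    [[0.0, 3.8, 6.0], [3.8, 0.0, 3.8], [6.0, 3.8, 0.0]],
    [[0.0, 3.8, 7.0], [3.8, 0.0, 3.8], [7.0, 3.8, 0.0]],
]
toy_samples = pairwise_distance_samples(toy_ensemble)
```

Applying the same regrouping to a predicted and a simulated ensemble gives two sample sets per residue pair, which can then be compared with any distribution-similarity measure and arranged as an n × n heat map.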

Having established that STARLING can recapitulate simulated ensembles, we next investigated agreement with experimental data (Fig. 3a). We used a previously curated set of 133 sequences for which high-quality small-angle X-ray scattering (SAXS) data have been assembled and computed average Rg values. We found excellent agreement between average Rg values from STARLING and SAXS-derived Rg, yielding values comparable with state-of-the-art coarse-grained simulations (RMSE = 4.53, R2 = 0.90; Fig. 3b, Extended Data Fig. 5a–h and Supplementary Tables 3 and 4).

Fig. 3. Experimental validation of STARLING.

a, STARLING enables rapid generation of ensembles from sequence. b, STARLING shows state-of-the-art accuracy in terms of predicting the average Rg from sequence. IDR sequences (n = 133) comparing SAXS-derived and STARLING-derived average Rg values (see also Extended Data Fig. 4c–f) are shown. c, STARLING accurately captures the effect of small numbers of mutations, here illustrating how altering the number of aromatic residues changes the global dimensions of a low-complexity domain (LCD) taken from the RNA-binding protein hnRNPA1. Y/F, tyrosine/phenylalanine. d, Full scattering curves can be back-calculated from STARLING ensembles using FOXS, illustrating how changes in conformational behaviour alter both global dimensions (small angles and low q values) and the shape of the ensemble (intermediate and high q values). e, Comparison of 12 different SAXS profiles with scattering profiles calculated from STARLING ensembles. The insets show STARLING Rg distributions, with average Rg values from scattering data displayed as a solid black line. See also Extended Data Figs. 4j,k and 5a–h. f, STARLING-derived Re distance distributions with experimental values are shown as black lines for 16 different length-matched disordered protein sequences with different sequence chemistries. See also Extended Data Fig. 5i–k. g, SAXS scattering curves for unlabelled proteins for a subset of those in panel f.

Extended Data Fig. 5. Benchmarking STARLING vs. other methods, and assessment of DisProt ensembles.

Comparison of experimental Rg values obtained from 133 different experimental datasets with various state-of-the-art tools for ensemble prediction (a) STARLING (this work), (b) Mpipi-GG, coarse-grained molecular dynamics simulations, (c) CALVADOS2, coarse-grained molecular dynamics simulations, (d) ALBATROSS, a deep-learning-based predictor that can only predict average ensemble values, (e) The Analytical Flory Random Coil (AFRC), a limiting null model that considers IDPs as Gaussian-like chains with no interactions between residues. We include this because it represents the true “lower bound” for what a model should achieve in terms of accuracy and RMSE. f, idpGAN, a Generative Adversarial Network for predicting ensembles from the sequence. g, We also compared Rg predictions on GPU vs. MPS implementations for completeness. This revealed 1:1 agreement that matched the RMSE and correlation obtained when the same dataset was analysed twice on the same hardware (h). i, Analysis of sequences from DisProt – a database of experimentally verified disordered regions – based on STARLING ensembles (sequences <380 residues). All ensembles contain 400 conformers, and generating ensembles for the full DisProt set took ~15 h on a desktop computer with an NVIDIA A4000 GPU (i.e., 3-4 proteins per minute). Fitting ensemble dimensions vs. length reveals an apparent scaling exponent of 0.56, indicating these IDRs are relatively expanded. However, we note an unavoidable acquisition bias in experimentally characterized IDRs, meaning that highly soluble (and hence more expanded, with fewer intra- and intermolecular interactions) proteins are preferred for biophysical characterization because they do not aggregate/crash out of solution, and many techniques used to characterize IDPs (SAXS, NMR, etc.) require high protein concentrations.
We also note that we underestimate the contributions of aliphatic hydrophobes in driving intramolecular interactions, as well as the effects of local helicity and the long-range consequences of intramolecular helix interactions. Finally, given that IDRs are heteropolymers, average scaling behavior, by definition, masks both attractive and repulsive intramolecular interactions. As such, we strongly caution against interpreting a value of 0.56 as indicating that all IDRs are highly expanded. j, Analysis of sequences in the most compact 10% of IDRs identified aromatic residues as being over-represented here, in line with prior work. k, An analogous analysis of sequences in the top 10% of expanded IDRs also reveals that negatively charged residues and proline are enriched in sequences that drive expansion, again in agreement with prior work.

To further validate STARLING, we predicted ensembles for 3,417 IDRs from DisProt30, successfully replicating established sequence-conformation trends (for example, aromatic residues drive compaction, whereas proline and glutamic acid promote expansion3,20,31,32; Extended Data Fig. 5i–k and Supplementary Table 5). STARLING-generated ensembles are also sensitive to minor sequence changes; when tested on four hnRNPA1 low complexity domain (LCD) variants, the predicted dimensions of the model quantitatively matched experimental SAXS data regarding aromatic content3 (Fig. 3c).

To go beyond simple Rg comparisons (Fig. 3b,c), we leveraged STARLING-generated ensembles to back-calculate synthetic SAXS curves using FOXS33. This allows for a direct comparison between our model and raw experimental SAXS data (down to 5–10 Å resolution). We demonstrated that STARLING accurately captures the subtle conformational shifts in A1-LCD variants, as reflected in their specific scattering profiles3. Next, we selected 12 unrelated IDRs for which high-quality scattering data were available and compared scattering profiles derived from STARLING ensembles with experimental data, finding excellent agreement across a wide range of sequence chemistries and sequence lengths (Fig. 3e and Supplementary Table 3). We further reanalysed 40 additional sets of scattering data, finding that the majority are in excellent agreement with STARLING-derived scattering profiles (Extended Data Fig. 6a and Supplementary Figs. 1 and 2). Together, our data suggest that our STARLING ensembles faithfully reproduce experimentally measured IDR ensemble behaviour across a range of length scales and sequence chemistries.

Extended Data Fig. 6. Comparison of STARLING-derived ensembles with experimental data.

a, Comparison of STARLING-derived Rg values vs. SAXS for high-quality consistently reanalysed SAXS data. Comparison of SAXS-derived radii of gyration vs. STARLING-derived radii of gyration for the same 53 sequences as described in Supplementary Figs. 1 and 2. Sequences in which experimental data are represented as a red dot are those where scattering profiles were in sub-optimal agreement with STARLING-derived scattering data. Overall, focusing on this high-quality dataset (each of which we re-analysed to obtain Rg values using the Molecular Form Factor (MFF) of ref. 63), we find even stronger agreement with STARLING-derived predictions (inset). b, Amino acid sequences for those IDRs investigated by smFRET in ref. 34. c, Correlation between STARLING-derived end-to-end distances and smFRET-derived end-to-end distances. The agreement between STARLING and the experiment is assessed in various ways, including root-mean-square error (RMSE), Spearman correlation coefficient (ρ), squared Pearson correlation coefficient (r2), and concordance correlation coefficient (ρc). We note that STARLING appears to underestimate end-to-end distances for more expanded chains here. However, we do not take dyes into explicit consideration, and it is possible that for the expanded negatively charged sequence, a small amount of (negatively-charged) dye:protein repulsions could account for the slight underestimation here. d, Rg distribution for cMyc1-361 obtained from STARLING ensembles. We predict an average Rg of 58 Å. We note this value may be an overestimate given cMyc1-361 is predicted to contain several regions of transient helicity, which may reduce the Rg, such that we suggest a value closer to 53-54 Å may be a more accurate reflection of the expected dimensions for monomeric cMyc1-361. e, The same plot as in Fig. 4d, with an additional polymer scaling line (black dashed), derived from fully denatured proteins measured by Kohn et al.64, superimposed.
Fitting the radius of gyration vs. number of amino acids to a polymer scaling law of the form $R_\mathrm{g} = B_0 N^{\nu}$ (where Rg is the radius of gyration, N is the number of amino acids, B0 is a prefactor term that captures a combination of chain stiffness and monomer excluded volume, and ν is the apparent Flory scaling exponent) revealed well-fit parameters of B0 = 3.02 Å and ν = 0.51. Black circles are STARLING average values, with grey box-plots showing the distribution of values from the ensemble. Yellow circles are results from all-atom Monte Carlo simulations. Blue squares are experimental (SAXS) data. f, Equivalent analysis to that shown in e, but with the end-to-end distance (Re) fit instead of the radius of gyration. Grey box-plots show the distribution of values from the STARLING ensemble. Yellow circles are results from all-atom Monte Carlo simulations. g, Equivalent analysis to that shown in e, but the hydrodynamic radius (Rh) is fit instead of the radius of gyration. Grey box-plots show the distribution of values from the STARLING ensemble. Yellow circles are results from all-atom Monte Carlo simulations. h, Rg values for CTD variants in which all proline residues have been converted to glycine (P2G). Grey box-plots show the distribution of values from the STARLING ensemble. i, Comparison of scaling behavior for wild-type (black), P2G (green), Y2S (yellow), and P2G, Y2S (red) variants of the CTD. The large prefactor (3.02 Å) observed in the wild-type sequence arises from a combination of the bulky tyrosine residue and the stiff proline residue; the double P2G, Y2S variant exhibits a prefactor comparable to unfolded-state scaling behaviour (see e).
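
For readers who wish to reproduce this type of fit, the sketch below estimates B0 and ν by linear least squares in log-log space, using ln(Rg) = ln(B0) + ν ln(N). Fitting in log space is an assumption made here for simplicity; the fitting procedure actually used is described in the Methods. The function name `fit_scaling_law` and the synthetic, noise-free input are hypothetical.

```python
import math


def fit_scaling_law(lengths, rg_values):
    """Least-squares fit of ln(Rg) = ln(B0) + nu * ln(N); returns (B0, nu)."""
    xs = [math.log(n) for n in lengths]
    ys = [math.log(rg) for rg in rg_values]
    count = len(xs)
    x_mean = sum(xs) / count
    y_mean = sum(ys) / count
    # Slope of the regression line is the apparent scaling exponent nu.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    nu = sxy / sxx
    # Intercept gives ln(B0).
    b0 = math.exp(y_mean - nu * x_mean)
    return b0, nu


# Synthetic check: generate Rg values from the reported parameters and
# confirm that the fit recovers them (no real data are used here).
lengths = [50, 100, 150, 200, 300]
rg_values = [3.02 * n ** 0.51 for n in lengths]
b0, nu = fit_scaling_law(lengths, rg_values)
```

With noisy data, a log-space fit weights relative rather than absolute errors, so a nonlinear fit in linear space may give slightly different parameters.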

Finally, we sought to compare against an alternative experimental modality in single-molecule Förster resonance energy transfer (smFRET). Recent work has illustrated how a series of 16 length-matched sequences shows a range of end-to-end distances depending on the underlying sequence chemistry34. We compared predicted end-to-end distances (and, where available, SAXS-derived scattering data) for those 16 sequences, again finding good agreement between STARLING ensembles and experimental data (Fig. 3f, Extended Data Fig. 6b,c and Supplementary Table 3). Of note, our average error here across the 16 sequences (RMSE of 6.7 Å) is comparable with the differences obtained between different experimental groups in a recent smFRET benchmarking study35. Overall, our work suggests that STARLING can generate full coarse-grained ensembles of IDRs that accurately capture experimental data.

Having established that STARLING can reproduce extant experimental data, we applied it to investigate a set of distinct IDRs across various contexts. These vignettes illustrate the types of rapid exploration and hypothesis generation that STARLING enables.

The master transcriptional regulator Myc (439 residues) underlies cell growth, metabolism and proliferation, and is frequently dysregulated in many cancers36. Myc consists of a large N-terminal IDR (Myc1–361), a C-terminal HLH DNA-binding domain (Myc368–406) and a leucine zipper (Myc413–439). The large size of its IDR, coupled with its chemical composition, has challenged biophysical characterization (Fig. 4a). Using STARLING, we generated conformational ensembles of Myc1–361 to investigate sequence-encoded conformational biases in the ensemble (Fig. 4b and Extended Data Fig. 6d). In particular, we wondered whether conformational behaviour would offer insight into regions within Myc known to bind to other partners (so-called Myc boxes). Curiously, the STARLING-derived ensemble suggests that Myc1–361 can be divided into two halves (IDR1 and IDR2). IDR1 (Myc1–200) is more compact and engages in intramolecular interactions and some long-range interactions. IDR2 (Myc201–361) is more expanded with locally compact regions. Finally, conformationally distinct subregions align with established Myc-box boundaries, indicating local and long-range conformational properties can align with established functional annotations. Overall, these results predict that Myc1–361 contains strong sequence-encoded biases that may be correlated with (or even underlie) its function.

Fig. 4. Biophysical characterization and design applications with STARLING.

a, The transcription factor Myc is largely disordered. b, STARLING-derived Myc1–361 ensembles identify distinct subregions that drive attractive and repulsive interactions. The darker colours are closer together, and the lighter colours are further apart. c, The RNA polymerase II CTD consists of a series of heptad repeats. d, STARLING ensembles show excellent agreement with SAXS and all-atom simulations (sims.), enabling polymer theory to describe CTD dimensions with a simple analytical expression. e, Comparison of predicted hydrodynamic radii (Rh) values based on STARLING-derived polymer scaling behaviour (Extended Data Fig. 6g) and Rh values from freely diffusing CTD-only constructs measured in nuclei. f, Histone H1.0 and ProTα form a high-affinity complex together. The H1.0 CTD and ProTα were tethered together by an inert (GS)80 linker. g, The resulting ensemble shows robust H1.0–ProTα interaction. h, The H1.0 CTD and ProTα contract in the complex versus in isolation. i, Comparison of per-residue ProTα–H1.0 CTD contacts and NMR-derived chemical shift perturbations (CSPs) shows excellent agreement. Blue highlighting identifies core binding subregion. j, Nupr1 and ProTα form a complex; we used a shorter (GS)5 linker to enable comparison to SAXS data. k, STARLING-derived ensembles show excellent agreement with SAXS data for Nupr1, ProTα and the Nupr1–ProTα complex. l, The TRPV4 channel has a large N-terminal IDR that contains a basic phosphatidylinositol 4,5-bisphosphate-binding site (PBS). NTD, N-terminal domain. m, STARLING-derived ensembles show good agreement with SAXS data. n, Difference distance map showing inter-residue distances in the mutant sequence minus those in the wild-type sequence. Orange colours indicate residues that are further apart in the mutant, whereas blue indicates residues that are closer together in the mutant than in the wild type.
The PBS engages in long-range interactions with a distal acidic region, and mutation of positively charged residues reduces these interactions, in good agreement with extant experimental work. o, STARLING enables systematic prediction of disordered microprotein ensembles. p, Using an optimized latent-space design approach, STARLING enables rapid sequence design to match a desired conformational ensemble. See also Extended Data Fig. 7d–m.

The C-terminal domain (CTD) of eukaryotic RNA polymerase II (RPB1) is a large IDR composed of tandem YSPTSPS heptad repeats, whose repeat number varies across species (52 in humans)37 (Fig. 4c). The CTD is believed to function as a multivalent interaction platform for transcriptional regulation. Extensive previous work has shown that low-complexity domains enriched in tyrosine (such as the CTD) are prone to intermolecular and intramolecular interactions, meaning that these domains would be expected to be relatively compact (driven by transient but numerous intramolecular contacts)3. Unexpectedly, previous in vitro work characterized an 81-residue fragment of the CTD with a seemingly highly extended radius of gyration, equivalent to that expected under strongly denaturing conditions (for example, 8 M urea)38 (Extended Data Fig. 6e). This raises an apparent paradox: how can the CTD be simultaneously enriched for tyrosine, poised for intermolecular and intramolecular interactions, yet also appear to behave like a non-interacting, fully denatured polypeptide chain?

To answer this question, we generated STARLING ensembles across 26 different CTD lengths, complemented by all-atom Monte Carlo simulations across five lengths. All-atom and STARLING-derived ensembles are in excellent agreement with one another, and with extant SAXS data for CTD variants of length 81 and 312 residues (Fig. 4d). This analysis (see Methods; Extended Data Fig. 6f,g and Supplementary Fig. 3) revealed that the CTD in fact behaves with polymer statistics of an effective Gaussian chain, meaning it is poised to engage in intramolecular and intermolecular interactions39. The seemingly extended nature of the ensemble is driven by the stiffness imparted by the size of the tyrosines and the high proline content (Extended Data Fig. 6h,i). Furthermore, comparisons of CTD dimensions calculated from in-cell single-molecule diffusion data by Ling et al. support a model in which the STARLING-derived CTD dimensions persist in live cells40 (Fig. 4e). These results reconcile previous observations and suggest that in cells, the CTD is primed for intermolecular interactions.

We next wondered whether we could tether two different IDRs together with a glycine-serine (GS) linker to investigate pseudo-intermolecular interactions (Fig. 4f). Tethering the C-terminal IDR from the histone H1.0 (H1.0 residues 97–198) to the histone chaperone prothymosin-α (ProTα) enabled the formation of an electrostatically driven complex between the two proteins (Fig. 4g), as previously reported41. Both proteins contract in the complex (Fig. 4h), consistent with experimental data and theory. Moreover, the specific residues in ProTα identified by nuclear magnetic resonance (NMR)-derived chemical shift perturbations match those residues from the STARLING complex ensemble that drive intermolecular interactions (Fig. 4i). We also investigated the complex between ProTα and the fully disordered 82-residue protein Nupr1 (ref. 42) (Fig. 4j). Here again, we saw agreement with SAXS data for the individual proteins and, using a shorter GS linker, for the disordered bimolecular complex (Fig. 4k). These results indicate that STARLING-derived binary complex ensembles can capture elements of IDR–IDR interaction ensembles.

Although our analyses thus far have primarily focused on ensemble differences between very different sequences, we wondered whether STARLING could interpret relatively small amino acid changes. The N-terminal IDR from the transient receptor potential vanilloid 4 (TRPV4) ion channel has a long-range regulatory interaction between a basic phosphatidylinositol 4,5-bisphosphate-binding site and an acidic cluster43 (Fig. 4l). STARLING-derived ensembles of this region show good agreement with SAXS data (Fig. 4m). Moreover, comparing the differences in intramolecular distances between the wild type and a KRWRR-to-AAWAA mutant that reduces long-range interactions in vitro yields a commensurate reduction in long-range interactions in STARLING ensembles (Fig. 4n). These results illustrate how STARLING can be used to understand biophysical mechanisms of functional regulation in proteins, even with relatively subtle changes in sequence.

We were also interested in investigating the application of STARLING to larger-scale ensemble prediction. Recent work has characterized a large set of microproteins (non-canonical open reading frames typically under 100 amino acids in size) using a combination of mass spectrometry and Ribo-seq-based approaches44. Although these microproteins are highly disordered, they possess distinct amino acid sequence biases from ‘canonical’ open reading frame-derived IDRs (Extended Data Fig. 7a–c and Supplementary Table 6). From 1,785 microproteins in these data, we identified 1,672 IDRs and predicted ensembles for all of them. Although most short IDRs are relatively expanded, many had extensive intramolecular contacts driven primarily between aromatic and arginine residues (Fig. 4o). These results suggest that even short disordered proteins can possess sequence-encoded conformational biases.

Extended Data Fig. 7. Microprotein analysis and assessment of latent-space design approach.

a, Microproteins are predicted to be highly disordered. The comparison here is the fraction of amino acids in the canonical human proteome that are found in disordered regions vs. the fraction of amino acids in the Deutsch et al.44 set of microproteins (the ‘microproteome’) that are found in disordered regions. The disorder was predicted using metapredict V3. b, Statistical enrichment for each of the 20 amino acids, comparing frequency observed across all canonical proteins vs. frequency observed in microproteins. We excluded initiator methionine from both datasets, given its disproportionate influence on shorter compared to longer sequences. As observed previously for metazoan microproteins, we find enrichment for tryptophan and arginine and depletion of acidic residues. c, To establish whether the enrichment observed here reflects something specific to microproteins or the enhanced fraction of disorder, we also compared enrichment in microprotein-derived IDRs vs. canonical-proteome-derived IDRs. This revealed an even stronger signature: microprotein IDRs are highly enriched for tryptophan, cysteine, arginine, and phenylalanine. In contrast, they are highly depleted for lysine, asparagine, glutamic acid, and aspartic acid. d, Assessing designed variants of CTCF590-629 (a 40-residue IDR from CTCF, UniProt ID P49711) as a toy sequence for rational design. Hamming distance of 100 CTCF590-629 designs compared to CTCF590-629. A Hamming distance of 40 means every residue is different from the wild-type sequence. e, Average iteration time of the sequence optimizer based on sequence length using 20 replicates per length. f, Generation time based on sequence length using 20 replicates per length. Design targets are subregions of eIF4B (eIF4B158-611, Uniprot ID P23588), excised in 20-amino-acid increments, ranging from 40 to 380 amino acids. g, Ensemble accuracy versus sequence length using the same cosine similarity cutoff value of 0.97. 
Ensemble accuracy was assessed as the mean absolute error (MAE) between the distance maps of the generated and target sequences, expressed in Angstroms. The sequences were the same as those used for panel f. We note that while accuracy worsens as sequence length increases, this can be addressed by tightening the similarity cutoff. h–k, Examples of IDRs designed to match low-complexity sequences, in all cases done to optimize cosine similarity to a value of 0.97. Target IDRs are as follows: h, FUS1-285 (UniProtID P35637); i, hnRNPA184-372 (UniProtID P09651); j, EWSR11-359 (UniProtID Q01844); k, TAF151-236 (UniProtID Q92804). l, Per-sequence generation times for the four sequences. m, Per-sequence mean absolute error (MAE) comparing the distance map from the designed sequence vs. the target sequence, as measured in Angstroms.

Beyond accurate, rapid ensemble prediction, STARLING addresses the inverse-design problem: generating sequences whose conformational ensembles match a chosen target. To do this, we exploited the ensemble-aware sequence representations of the model by implementing a latent-alignment objective that maximizes cosine similarity between target and candidate learned sequence representations using the sequence optimization framework in GOOSE (see Methods)26. This allowed us to quickly design in this ensemble-aware latent space, rather than generating ensembles in 3D, enabling the rapid optimization of bespoke disordered sequences with prescribed ensemble properties.

As a proof of concept, we used STARLING in conjunction with our IDR design framework, GOOSE, to design 100 sequences targeting a short IDR from the protein CTCF (CTCF590–629; UniProt ID: P49711)26. Starting from length-matched GS sequences, generation averaged 0.7 s per sequence with high structural accuracy (mean absolute error of 0.94 Å; average Hamming distance of 35; Extended Data Fig. 7d). Furthermore, we found that ensemble-matching designs can be generated within tens of seconds, even for long sequences (Extended Data Fig. 7e–m and Supplementary Fig. 4). These results, alongside performance tests on longer sequences, demonstrate that STARLING enables rapid, systematic design of IDRs with specific conformational properties.
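
The general idea of latent-space design (mutate a candidate until its embedding is sufficiently similar to the target embedding) can be illustrated with the toy sketch below. Everything here is an assumption made for illustration: `toy_embed` is a simple amino-acid composition vector standing in for the learned STARLING sequence representation, the greedy single-mutation search is not the GOOSE optimizer, and the 40-residue target is an arbitrary made-up sequence rather than CTCF590–629.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def hamming_distance(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def toy_embed(seq):
    """Stand-in embedding: amino-acid composition (NOT the STARLING encoder)."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def gs_start(length):
    """Length-matched GS repeat used as the starting sequence."""
    return ("GS" * length)[:length]


def design(target_seq, embed=toy_embed, cutoff=0.97, max_steps=5000, seed=0):
    """Greedy hill climb: keep a point mutation only if similarity improves."""
    rng = random.Random(seed)
    target_vec = embed(target_seq)
    candidate = list(gs_start(len(target_seq)))
    best = cosine_similarity(embed("".join(candidate)), target_vec)
    for _ in range(max_steps):
        if best >= cutoff:
            break
        pos = rng.randrange(len(candidate))
        previous = candidate[pos]
        candidate[pos] = rng.choice(AMINO_ACIDS)
        score = cosine_similarity(embed("".join(candidate)), target_vec)
        if score > best:
            best = score
        else:
            candidate[pos] = previous  # revert a non-improving mutation
    return "".join(candidate), best


# Arbitrary 40-residue target (hypothetical sequence, for illustration only).
target = "MKRESPEEDYQKLLRDSGSPEEKKRRAEYFPDDSNQTGKE"
start_score = cosine_similarity(toy_embed(gs_start(len(target))), toy_embed(target))
designed, final_score = design(target)
divergence = hamming_distance(designed, target)
```

Because each step only evaluates an embedding rather than generating a 3D ensemble, the cost per iteration is set by the encoder alone, which is the property that makes this style of design fast.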

Predictions up until this point have focused on ensemble generation at 150 mM ionic strength. However, the STARLING sequence encoder was explicitly conditioned on ionic strength (see Methods). Consequently, STARLING can also generate ensembles at ionic strengths between 20 mM and 300 mM (Fig. 5a).

Fig. 5. STARLING enables solution-dependent ensemble generation, Bayesian reweighting and ensemble-aware embedding for search.

a, Schematic representation of sequence-dependent changes in conformational behaviour in response to changes in ionic strength for a blocky polyampholytic sequence. b, STARLING-derived ionic-strength-dependent Re distributions for a highly charged, ionic-strength-responsive sequence. Mpipi-GG simulations shown as a kernel density estimate plot. In each subpanel, we show H and the Kolmogorov–Smirnov (KS) statistic to quantify the difference between distributions and the effect size of the difference. A KS statistic of 0.03 means the CDFs of the two distributions differ by at most 3%. c, Re correlation between the STARLING and Mpipi-GG simulations performed at various ionic strengths for the highly charged Das–Pappu sequences. Despite being trained only at three different ionic strengths, STARLING can interpolate across ionic strengths unseen during training. d, AFRC-normalized average inter-residue distance maps for STARLING (upper) compared with Mpipi-GG simulations (lower) across five different ionic strengths. e, Schematic representation of incorporating experimental data into STARLING for integrative biophysical modelling. f, Reweighting based on single-molecule Förster resonance energy transfer (smFRET) data for Re improves agreement with SAXS data (Rg), highlighting how ensemble reweighting through one modality can improve agreement with others. Exp., experiment. g, Schematic representation of the ensemble-aware latent embedding of STARLING as used for sequence search. h, Heat map of the ratio (candidate:query) of average inter-residue distance maps for the fully disordered query protein SERF from Saccharomyces cerevisiae (UniProt ID: Q3E7B7). The upper triangle shows the top-ranked candidate; the lower triangle shows the lowest-ranked candidate, where candidates were ranked by cosine similarity of their embedding maps to the query SERF after scanning a database of IDRs from UniRef50.
The colours indicate deviation from SERF: white ≈ 1.0 (similar), purple < 1.0 (more compact) and orange > 1.0 (more expanded). Serendipitously, the most and least similar candidates are equidistant from SERF in terms of sequence changes.

To assess how well STARLING captures ionic-strength dependence — as defined by the underlying Mpipi-GG model — we analysed 30 strong polyampholytic sequences consisting solely of lysine and glutamate with distinct charge patterning (‘Das–Pappu sequences’)45. Figure 5b shows the end-to-end distance distributions from STARLING and Mpipi-GG at the three ionic strengths used during training, showing excellent agreement as measured by both the H and the KS statistic (note that these are not fitted to one another). This trend holds across the Das–Pappu sequence variants (Extended Data Fig. 8a,b).
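
For reference, the two-sample KS statistic is the largest absolute difference between the empirical cumulative distribution functions of two samples. The sketch below is a minimal pure-Python illustration with toy inputs; the function name `ks_statistic` is an assumption and this is not the analysis code used for Fig. 5b.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all x."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d_max = 0.0
    while i < len(a) and j < len(b):
        # Advance both ECDFs past the next smallest observed value,
        # consuming ties in either sample together.
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d_max = max(d_max, abs(i / len(a) - j / len(b)))
    return d_max


# Toy samples (hypothetical values, not data from the paper).
same = ks_statistic([1, 2, 3], [1, 2, 3])
separate = ks_statistic([1, 2, 3], [4, 5, 6])
shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

Unlike H, which depends on how the histograms are binned, the KS statistic is computed directly from the sorted samples, so the two measures provide complementary views of the same comparison.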

Extended Data Fig. 8. Assessment of STARLING ionic strength effects.

a, Overlap between Mpipi-GG and STARLING Rg distributions for a subset of the Das-Pappu sequences. Rg histograms for four of the Das-Pappu sequences with distinct sequence patterning as a function of ionic strength. Blue bars represent STARLING results; the orange line represents Mpipi-GG results. Histogram overlap is quantified by the Hellinger distance (H) or the KS statistic (D). b, Overlap between Mpipi-GG and STARLING Re distributions for a subset of the Das-Pappu sequences. Re histograms for four of the Das-Pappu sequences with distinct sequence patterning as a function of ionic strength. Bars represent STARLING results; the orange line represents Mpipi-GG results. Histogram overlap is quantified by the Hellinger distance (H) or the KS statistic (D). c, Normalized distance maps for Das-Pappu sequences. Normalized distance maps for four of the Das-Pappu sequences with distinct sequence patterning as a function of ionic strength. The top-right triangle is STARLING, and the bottom-left triangle is Mpipi-GG. d, Normalized distance maps for ProTα-(GS)80-H1CTD constructs as a function of ionic strength. Distances here are normalized by a length-matched GS linker sequence (e.g., (GS)186) to provide a self-consistent “null model” through which we expected the central (GS)80 linker to reach a normalized distance value of 1, implying the linker is behaving as if it were not tethered. While not a prerequisite for these types of pseudo-intermolecular interaction ensembles, this provides us with confidence that we are observing behavior asymptotically approaching the limit in which the central linker has no meaningful impact on the resulting biomolecular complex. e, Pseudo-inter-molecular contacts between ProTα and H1CTD quantified as a function of ionic strength. Prior work (Borgia et al. 2018) has established that the ProTα:H1 interaction is highly salt dependent, consistent with our conclusions here. 
f, Quantification of intramolecular distances in the complex divided by the same distances for the IDR in isolation. This analysis reveals specific subregions that are more locally compact than others in the complex, and that H1CTD undergoes a greater degree of compaction than ProTα.

We next asked whether our ionic-strength model, conditioned with three anchor points (ionic strength of 20 mM, 150 mM and 300 mM), would generalize to new ionic strengths not seen in training. We evaluated 16 ionic strengths spanning 20–300 mM (13 never seen in training) and analysed the mean end-to-end distance of each ensemble. STARLING interpolates smoothly between the three training ionic strengths and reproduces Mpipi-GG means with high concordance (Pearson r = 0.98; Spearman ρ = 0.97; Fig. 5c). Beyond just global dimensions, normalized distance maps compared between ensembles generated via STARLING and Mpipi-GG simulations revealed that STARLING accurately learns the ionic-strength-dependent changes in the underlying conformational distributions of inter-residue distances (Fig. 5d and Extended Data Fig. 8c). Furthermore, STARLING can qualitatively reproduce ionic-strength dependence of the ProTα–H1.0CTD interaction41 (Extended Data Fig. 8d–f). Together, these results suggest STARLING can be used to investigate solution-dependent IDR interactions using a limited number of anchor points and interpolating between them.

Although STARLING aligns with Mpipi-GG and experimental data, it is not a substitute for rigorous biophysical measurements. Instead, experiments can complement STARLING by providing restraints to refine and integrate into IDR ensembles via two routes. The first route uses guidance during ensemble generation to push conformers towards target observables (see Methods; Extended Data Fig. 9a–c and Supplementary Fig. 5), an approach well-suited for generating ensembles far from the unbiased distribution (for example, cyclic peptides), although guided ensembles likely deviate from the canonical ensemble. The second route uses Bayesian maximum entropy (BME) reweighting — reimplemented natively within STARLING — to reweight ensembles to match experimental observables15. BME treats the STARLING ensemble as the prior and incorporates data (for example, SAXS and smFRET) through a likelihood-based restraint with uncertainties, yielding ensembles that better match experiment (Fig. 5e). As a proof of concept, we used BME to reweight four of the ensembles for sequences measured by Holla et al.34, treating the experimental end-to-end distance as a restraint in the BME objective. The resulting ensembles not only improve agreement with end-to-end distance but also agreement with the SAXS-derived Rg (Fig. 5f). These results suggest that STARLING provides a powerful and easy-to-use platform for integrative modelling of experimental data of IDRs.
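
To make the reweighting idea concrete, the sketch below applies maximum-entropy reweighting to a single ensemble-averaged observable. It assumes the commonly used dual formulation, in which weights take the form w_j ∝ w0_j exp(−λF_j) and λ minimizes ln Z(λ) + λF_exp + (θ/2)λ²σ²; the native STARLING implementation may differ in its details. The function names, the Newton solver, the value of θ and the toy end-to-end distances are all assumptions made for illustration.

```python
import math


def _weights(values, prior, lam):
    """Normalized weights w_j proportional to prior_j * exp(-lam * F_j)."""
    exponents = [-lam * v for v in values]
    shift = max(exponents)  # subtract the maximum for numerical stability
    raw = [p * math.exp(e - shift) for p, e in zip(prior, exponents)]
    total = sum(raw)
    return [r / total for r in raw]


def bme_reweight(values, target, sigma, theta=1.0, prior=None,
                 max_iter=100, tol=1e-12):
    """Reweight one observable towards `target` with uncertainty `sigma`.

    Minimizes the convex dual objective by Newton iteration on the single
    Lagrange multiplier `lam`. Returns (weights, lam).
    """
    n = len(values)
    prior = [1.0 / n] * n if prior is None else list(prior)
    lam = 0.0
    for _ in range(max_iter):
        w = _weights(values, prior, lam)
        mean = sum(wi * v for wi, v in zip(w, values))
        var = sum(wi * (v - mean) ** 2 for wi, v in zip(w, values))
        grad = target - mean + theta * sigma ** 2 * lam  # first derivative
        hess = var + theta * sigma ** 2                  # always positive
        step = grad / hess
        lam -= step
        if abs(step) < tol:
            break
    return _weights(values, prior, lam), lam


# Toy per-conformer end-to-end distances in Angstroms (hypothetical values).
re_values = [20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0]
weights, lam = bme_reweight(re_values, target=40.0, sigma=1.0)
prior_mean = sum(re_values) / len(re_values)
post_mean = sum(w * v for w, v in zip(weights, re_values))
```

In this formulation θ balances trust in the prior ensemble against trust in the data: larger θ keeps the weights closer to the prior, whereas smaller θ pulls the reweighted average closer to the experimental value. Once weights are available, any other observable (for example, Rg) can be recomputed as a weighted average over the same conformers.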

Extended Data Fig. 9. Ensemble guidance and comparison with IDPFold and IDPForge.


A, Rg-biased ensemble generation. Ensembles of an 84-residue RNA POLII CTD construct biased towards the specified Rg values with different harmonic constraints. Rg values as a function of target (Rg) and guidance strength. The x-axis reports the value of the harmonic constraint, while the y-axis reports the target Rg value. Elements in the matrix report the ensemble-average Rg value under the combination of target and constraint. B, Equivalent analysis for Re values as a function of target (Rg) and guidance strength. Equivalent biasing using end-to-end distance as the target is shown in Supplementary Fig. 5. C, 2D histograms of Rg (x-axis) and Re (y-axis) for ensembles generated under the specified guidance. For each subplot, the x-axis reports the ensemble Rg while the y-axis reports the ensemble Re. For the matrix of subplots, the x-axis reports the harmonic constraint applied and the y-axis reports the target Rg value. An equivalent matrix of plots for biasing using end-to-end distance is shown in Supplementary Fig. 5. D, Comparison of idpSAM-derived ensembles vs. STARLING compared to experimental data. Focussing on sequences less than 60 amino acids in length (except Ash1, discussed below), we compared Rg values obtained from idpSAM ensembles vs. Rg values from STARLING ensembles. For idpSAM, 1000 conformers were used (idpSAM defaults), whereas for STARLING, 400 conformers were used (STARLING defaults). We note that the Analytical Flory Random Coil (AFRC) – a null model in which IDR sequence chemistry does not influence the overall dimensions and instead IDR dimensions depend solely on chain length and (weakly) the intrinsic dimensions of each amino acid – obtains an r² of 0.9 and an RMSE of 3.58 Å. We included Ash1 (83 amino acids) because, although it is larger than the ~60 amino acid cutoff used for training data in idpSAM, prior work has shown that SAXS data for Ash1 are well reproduced by CAMPARI/ABSINTH simulations. However, despite being trained on CAMPARI/ABSINTH simulations, idpSAM was unable to recapitulate this behaviour. E, Comparison of IDPFold-derived ensembles vs. STARLING compared to single-molecule FRET data reported by Holla et al.34. Correlation between experimentally derived end-to-end distance (x-axis) and STARLING (black) or IDPFold (blue) derived end-to-end distances. F, Comparison of SAXS scattering profile for the sNrich ensemble derived from IDPFold ensembles (red) vs. experiment (black). Inferred radii of gyration from IDPFold, STARLING, and SAXS are provided. G, Comparison of SAXS scattering profile for the dCh minus ensemble derived from IDPFold ensembles (red) vs. experiment (black). Inferred radii of gyration from IDPFold, STARLING, and SAXS are provided. H, Comparison of SAXS scattering profile for the dTRBP ensemble derived from IDPFold ensembles (red) vs. experiment (black). Inferred radii of gyration from IDPFold, STARLING, and SAXS are provided. I, Comparison of SAXS scattering profile for the sNh plus ensemble derived from IDPFold ensembles (red) vs. experiment (black). Inferred radii of gyration from IDPFold, STARLING, and SAXS are provided. J-Q, Comparison of IDPFold and IDPForge vs. other approaches compared to radii of gyration derived from experimental SAXS data. Agreement between SAXS experimental data and: J, STARLING; K, IDPFold; L, AFRC - a null model where only sequence length contributes to chain dimensions; and M, CALVADOS2 for the set of sequences that were predictable using IDPFold. IDPFold performs relatively poorly at reproducing experimental SAXS data. The sequences used here are the 104 for which IDPFold ensembles could be generated. Agreement between SAXS experimental data and: N, STARLING; O, IDPForge; P, AFRC; and Q, CALVADOS2 for the set of sequences that were predictable using IDPForge. IDPForge again does a relatively poor job of reproducing experimental SAXS data. Sequences used here are the 110 sequences for which IDPForge ensembles could be generated. R, Comparison of time per conformation for STARLING, IDPFold, and IDPForge as a function of sequence length. The variability in IDPForge times comes from a stochastic relaxation step.

Finally, beyond the strict generation of ensembles, a novel application enabled by the unique architecture of STARLING is the ability to perform fast, large-scale disordered-protein biophysical ensemble-similarity search directly from sequence (Fig. 5g). By encoding sequences into a conformationally aware latent space using the sequence encoder (Extended Data Fig. 2d), search can be performed within this embedded space, enabling the identification of conformationally similar IDRs.

As a proof of principle, we created a database of all predicted IDRs in UniRef50 with fewer than 380 residues (approximately 35 million sequences). We then created STARLING ensemble-aware embeddings for each of these sequences, implementing a methodology for rapid, large-scale approximate nearest-neighbour search (see Methods). We tested this approach by querying the protein SERF (UniProt ID: O75920) against this database46 (Fig. 5g). Next, we extracted the candidate with the most similar and least similar vector embeddings and built ensemble-averaged distance maps for each construct. Normalizing each of these candidates by the query protein SERF reveals that the most similar length-matched candidate has a highly similar averaged pairwise distance map (Fig. 5h). By contrast, the least similar length-matched candidate has stark differences with subregions that are both overly compact and overly expanded. Together, the STARLING sequence encoder enables IDR sequences to be represented in a high-dimensional conformationally aware space, with a broad range of potential applications, from design (Fig. 4p) to search (Fig. 5g,h) to the development of new protein representations that combine IDR-specific embeddings with general protein language embeddings.
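The search step can be pictured as ranking database embeddings by their similarity to a query embedding. The sketch below uses exhaustive cosine-similarity ranking over toy three-dimensional vectors. Both the similarity metric and the names are assumptions for illustration, and a database of tens of millions of sequences would need an approximate nearest-neighbour index (as described in Methods) rather than this brute-force loop.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, database):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy embeddings; real ensemble-aware embeddings would be high-dimensional.
toy_database = {
    "idr_a": [0.9, 0.1, 0.0],
    "idr_b": [0.0, 1.0, 0.2],
    "idr_c": [-1.0, 0.0, 0.1],
}
query_embedding = [1.0, 0.2, 0.0]

ranking = rank_by_similarity(query_embedding, toy_database)
most_similar, least_similar = ranking[0], ranking[-1]
print(most_similar, least_similar)
```

Taking the first and last entries of the ranking mirrors the comparison described above, in which the most similar and least similar candidates are extracted and their distance maps compared with that of the query.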

Overall, these vignettes demonstrate the versatile range of applications that STARLING enables. Of note, the speed and ease with which STARLING generates ensembles mean that these investigations can be accomplished in hours or even minutes.

Discussion and conclusion

STARLING is a generative model that can produce coarse-grained IDR ensembles directly from protein sequences in seconds. Despite several caveats and limitations, one of the major features of STARLING is that it drastically reduces the barrier to obtaining conformational predictions of sequence-dependent IDR behaviour. Although IDRs are often ignored or visualized as AlphaFold ‘orange spaghetti’, STARLING makes it straightforward for anyone to obtain realistic coarse-grained ensembles. Rather than ‘the end’ of an investigation into IDR sequence-ensemble behaviour, we view STARLING as a beginning, making it straightforward to develop hypotheses as to how the sequence of an IDR may determine its conformational ensemble and/or how it may influence interactions with other IDRs.

A key feature of STARLING is its ability to generate conformational ensembles rather than just predict ensemble-average values. Although ensemble-average properties (for example, average Rg or average Re) can be informative20,21,47, they do not enable a ‘structural’ investigation into the determinants of a given behaviour. With a conformational ensemble, the specific sets of residues responsible for an observed behaviour can be directly identified. Moreover, substantial effort has been placed into developing methods to back-calculate experimental observables from simulation trajectories to interpret experimental data15. STARLING ensembles are amenable to these analyses, allowing comparison with a range of experimental results. However, we note that all comparisons between STARLING predictions and experimental data rely on computational tools to predict observables from protein structures, which can introduce their own approximations. As such, agreement with experiment reflects the accuracy of both STARLING and these tools, including their assumptions.

Although STARLING generates ensembles in seconds, it has key limitations (see also Methods). Because STARLING was trained with simulations from a one-bead-per-residue coarse-grained force field that does not include secondary structure, it cannot predict secondary structure elements. In addition, it cannot incorporate flanking folded domains. However, we view the inability to incorporate folded domains as a potential ‘feature’; discrepancies between predictions of isolated IDRs and experimental data (where flanking folded domains are present) may reveal biophysically meaningful interactions.

STARLING is also poorly poised to capture many environmental effects or post-translational modifications. IDRs can act as environmental sensors, with their conformational behaviour and interactions being tuned by their local chemical environment (salt, temperature and osmolytes, and so on)48. Furthermore, charge regulation can ‘rewrite’ the charge state of a protein, up-shifting or down-shifting intrinsic pKa values49. Because STARLING was trained using simulations performed under ‘standard conditions’ (infinite dilution, 298 K), it does not enable explicit incorporation of many of these physicochemical modulators. Moreover, post-translational modifications and non-canonical amino acids are currently not accounted for. Looking ahead, generalization to environmental sensitivity or non-canonical residues remains challenging, in part due to limited benchmarking data. Although our ionic strength conditional results are encouraging (Supplementary Fig. 6) — at least conceptually, see Methods for Debye–Hückel limitations — broader environmental response modelling will require expanded training sets and dedicated experimental benchmarks to establish reliability (see Methods).

Unlike conventional molecular dynamics, STARLING generates conformational ensembles without time-resolved trajectories. Although molecular dynamics provides explicit dynamics, it often struggles to accrue independent samples for long chains that reconfigure slowly, typically requiring advanced kinetic frameworks for robust rate estimates. By contrast, STARLING produces decorrelated samples through independent stochastic denoising, maintaining high speed regardless of sequence length. Consequently, STARLING avoids many of the challenges in generating ensembles for longer sequences seen in molecular dynamics. Furthermore, we suggest that STARLING-generated conformers could be used to seed large numbers of shorter molecular dynamics simulations, enabling more reliable kinetic analyses.

Several recent sequence-to-ensemble approaches have emerged, beginning with Taneja and Lasker’s two-stage DDPM, which established the feasibility of learning these mappings for heterogeneous sequences50. More recently, idpSAM utilized a latent-space DDPM trained on all-atom simulations, although its predictive power is generally limited to sequences under 60 residues51 (Extended Data Fig. 9d). Other methods, such as IDPFold and IDPForge, integrate protein language models with experimental or molecular dynamics data52,53 (Extended Data Fig. 9e–r and Supplementary Fig. 7). Unlike the coarse-grained approach of STARLING, these methods (and others) provide all-atom molecular detail and, in some cases, the ability to include folded domains, albeit at the expense of runtime performance, hardware requirements, and potentially accuracy with respect to global dimensions and long-range IDR-mediated interactions52–56. Other emerging strategies include combining deep learning with Bayesian reweighting10,54,57 or predicting Hamiltonians rather than ensembles58. Overall, however, these tools often fall short of the balance of accuracy, performance and accessibility of STARLING. Furthermore, the ability of STARLING to reweight against experimental data (Extended Data Fig. 10) and encode and search ensembles offers unique research opportunities when integrated with bioinformatics pipelines or protein language models.

Extended Data Fig. 10. 3D reconstruction from distance maps, BME reweighting, and reanalyzed SAXS data.


a, Comparison of runtimes for the native sklearn Multidimensional Scaling (MDS) algorithm vs. our Torch-based implementation of the Scaling by MAjorizing a COmplicated Function (SMACOF) algorithm. We note that internally, the sklearn MDS algorithm uses a SMACOF implementation by default. b, Time per conformer for sklearn MDS vs. Torch-based SMACOF. c, Comparison of error distributions (3D conformer vs. reconstructed conformer) for sklearn MDS vs. Torch SMACOF. Despite being substantially faster, our Torch-based SMACOF implementation shows no difference in accuracy compared to the sklearn MDS implementation. d, Example distance maps showing deviations from reconstruction in a 384-residue sequence. e, Comparison of literature-reported Rg values (y-axis) obtained using Guinier analysis in the original publications, vs. our re-analysis of buffer-subtracted scattering data (x-axis) done using the Molecular Form Factor (MFF) approach of ref. 63, confirming the high quality of the data (very similar Rg values are arrived at regardless of the approach used to determine the Rg values). f, Theta (θ) scan for BME (Bayesian/maximum entropy) reweighting. This figure is generated automatically by STARLING’s BME implementation. The x-axis reports the effective fraction of conformers used, while the y-axis reports the χ² value between experiment and ensemble (a lower value means better agreement between experiment and ensemble). g, Visual representation of (ordered) conformer weights. This figure is also generated automatically by STARLING’s BME implementation.
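For readers unfamiliar with SMACOF, the sketch below shows the core iteration used to recover 3D coordinates from a target distance map. With unit weights, each Guttman-transform step moves bead i to (1/n)·Σ_j (δ_ij/d_ij)(x_i − x_j), and this update never increases the raw stress. It is a plain-Python illustration with hypothetical names and toy coordinates, not the Torch-based implementation benchmarked in panels a–c.

```python
import math
import random


def raw_stress(coords, target):
    """Sum over pairs of squared differences between current and target distances."""
    n = len(coords)
    return sum(
        (math.dist(coords[i], coords[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def guttman_step(coords, target):
    """One SMACOF update (Guttman transform) with unit weights."""
    n, dim = len(coords), len(coords[0])
    updated = []
    for i in range(n):
        acc = [0.0] * dim
        for j in range(n):
            if i == j:
                continue
            d = math.dist(coords[i], coords[j])
            if d > 0:  # coincident points contribute nothing
                ratio = target[i][j] / d
                for k in range(dim):
                    acc[k] += ratio * (coords[i][k] - coords[j][k])
        updated.append([a / n for a in acc])
    return updated


def smacof(target, dim=3, n_iter=300, seed=0):
    """Return reconstructed coordinates and the stress after each iteration."""
    rng = random.Random(seed)
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in target]
    history = [raw_stress(coords, target)]
    for _ in range(n_iter):
        coords = guttman_step(coords, target)
        history.append(raw_stress(coords, target))
    return coords, history


# Toy six-bead chain; its pairwise distances play the role of a distance map.
true_coords = [
    [0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [5.0, 3.6, 0.0],
    [7.5, 4.0, 2.8], [9.0, 1.0, 4.5], [12.0, 0.5, 3.0],
]
distance_map = [[math.dist(a, b) for b in true_coords] for a in true_coords]
reconstructed, stress_history = smacof(distance_map)
print(stress_history[0], stress_history[-1])
```

Because a distance map carries no information about absolute position, orientation or handedness, the reconstructed coordinates are defined only up to translation, rotation and reflection. Reconstruction error is therefore best assessed by comparing distances rather than raw coordinates.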

STARLING enables systematic, library-scale design of IDRs to test the ‘ensemble–function relationship’: the hypothesis that IDR function depends directly on conformational behaviour1,2,26. Historically, designing sequences with targeted ensembles was challenging due to the astronomical sequence space and the low-throughput nature of experimental or computational ensemble characterization. Although recent exciting ensemble-based design methods have emerged, they are not practical for thousands of sequences59,60. By providing a direct optimization handle within a biophysically meaningful latent space, STARLING (integrated with GOOSE) transforms this underconstrained search into a tractable design problem, allowing for the creation of large-scale libraries to test sequence–ensemble–function hypotheses26.

Finally, the ensemble-aware embedding of STARLING enables sequence search and design based on conformational behaviour rather than primary structure. This provides a biophysically grounded alternative to BLAST, which often fails with poorly conserved or low-complexity IDRs. By facilitating the homology-independent discovery of ‘biophysical look-alikes’, this approach enables hypothesis transfer to uncharacterized regions and identifies natural templates for inverse design, even when traditional alignments and composition heuristics fail.

In conclusion, we hope STARLING serves the disordered protein community as a highly accessible tool for biophysical investigation. Unlike many deep learning tools, STARLING is designed for ease of use; it can be installed via pip and run with a single command. By enabling ‘on-the-fly’ predictions on commodity hardware, STARLING fills a unique gap in IDR characterization. Although it does not replace existing technologies, it provides a practical way to explore conformational behaviour easily, quickly, accurately and at scale.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Online content

Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41586-026-10141-2.

Supplementary information

Supplementary Information (4.8MB, pdf)

Supplementary Methods, which describe STARLING’s design, training, and implementation, and Supplementary Figs 1–7.

Reporting Summary (1.1MB, pdf)
Supplementary Tables 1–6 (430.9KB, xlsx)

Supplementary Table 1 contains a breakdown of sequences used for training, test, and validation. Supplementary Table 2 includes numbers of sequences used across different ionic strengths. Supplementary Table 3 contains amino acid sequences for proteins associated with Figure 3 and Extended Data 6a (203 sequences in total). Supplementary Table 4 contains amino acid sequences for proteins associated with Figs 4 and 5, Extended Data Figs 6h, 6i, 7d, 7e, 7f, 7g, 7i, 7j, 7k, 8a, 8b, 8c, 9j, 9k, 9l, 9m, 9n, 9o, 9p, 9q, 9r, Supplementary Fig. 5, and Supplementary Fig. 7 (512 sequences in total). Supplementary Table 5 contains DisProt sequences used in Extended Data 5i, 5j, and 5k. Supplementary Table 6 contains ncORF sequences used in Fig. 4o.

Peer Review File (1,019.6KB, pdf)

Acknowledgements

We thank J. Pelham, R. Somjee, B. Portz, S. Hoffman-Weitsman and F. Beinart for their comments on the manuscript; U. Hellmich for giving an exceptional talk that inspired our investigation of the TRPV4 IDR; our many colleagues for sharing and depositing raw experimental data (notably, SAXS data on SASDB), which has greatly facilitated comparison with experimental data; J. Sohn, S. Showalter, E. Martin, B. Schuler, M. Grazia Ortore and M. Blackledge for sharing raw SAXS scattering data; B. Kragelund and F. Stryhn Buus for sharing chemical shift data for the ProTα–H1.0 complex; the Research Infrastructure Services group at Washington University in St. Louis for providing computational resources and services required for this work; D. Stanzione for his relentless support of US high-performance scientific computing; L. Koesterke for support on Texas Advanced Computing Center (TACC) VISTA; and M. Turke and C. McCornack for the bird pictures. We are grateful to the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support program and resources, which is supported by US National Science Foundation (NSF) grants #2138259, #2138286, #2138307, #2137603 and #2138296; DeltaAI, which is supported by the US NSF (award OAC 2320345) and the State of Illinois; and the Delta advanced computing and data resource, which is supported by the NSF (award OAC 2005572) and the State of Illinois. We are indebted to the TACC administrators and staff. This work was funded by the US NSF through grant 2338129 (CAREER) to A.S.H. and the US National Institutes of Health through CA290639 (DP2) to A.S.H. J.M.L. was supported by the NSF via grant number DGE-2139839 and by the Frontera Computational Sciences Fellowship.

Author contributions

B.N., J.M.L. and A.S.H. conceptualized the study. B.N. and J.M.L. designed the methodology. B.N., J.M.L., R.J.E. and A.S.H. provided software and performed validation, formal analysis and the investigation. J.M.L. and A.S.H. provided resources and acquired funding. B.N., J.M.L. and A.S.H. curated the data. B.N., J.M.L., R.J.E. and A.S.H. performed visualization and wrote the manuscript.

Peer review

Peer review information

Nature thanks Gregory Bowman and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Data availability

The code and data associated with the figures in this article are available on GitHub (https://github.com/holehouse-lab/supportingdata/tree/master/2026/starling_2026). The Zenodo repository for UniRef50 IDR sequence embeddings is available62 (10.5281/zenodo.17342150). Any data missing are available from the corresponding author.

Code availability

STARLING is fully open source and available on GitHub (https://github.com/idptools/starling/). The documentation for STARLING is available (https://idptools-starling.readthedocs.io). STARLING is also provided as a Colab notebook (https://github.com/idptools/idpcolab/blob/main/STARLING/STARLING_demo.ipynb).

Competing interests

A.S.H. is on the scientific advisory board for Prose Foods. All other authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Borna Novak, Jeffrey M. Lotthammer

Extended data is available for this paper at 10.1038/s41586-026-10141-2.

Supplementary information

The online version contains supplementary material available at 10.1038/s41586-026-10141-2.

References

  • 1. Holehouse, A. S. & Kragelund, B. B. The molecular basis for cellular function of intrinsically disordered protein regions. Nat. Rev. Mol. Cell Biol. 25, 187–211 (2024).
  • 2. González-Foutel, N. S. et al. Conformational buffering underlies functional selection in intrinsically disordered protein regions. Nat. Struct. Mol. Biol. 29, 781–790 (2022).
  • 3. Martin, E. W. et al. Valence and patterning of aromatic residues determine the phase behavior of prion-like domains. Science 367, 694–699 (2020).
  • 4. Flores, E., Acharya, N., Castañeda, C. A. & Sukenik, S. Single-point mutations in disordered proteins: linking sequence, ensemble, and function. Curr. Opin. Struct. Biol. 91, 102987 (2025).
  • 5. Ghafouri, H., Tosatto, S. C. E. & Monzon, A. M. Advances in the determination of disordered protein ensemble. Curr. Opin. Struct. Biol. 96, 103198 (2026).
  • 6. Das, R. K., Ruff, K. M. & Pappu, R. V. Relating sequence encoded information to form and function of intrinsically disordered proteins. Curr. Opin. Struct. Biol. 32, 102–112 (2015).
  • 7. Cubuk, J., Stuchell-Brereton, M. D. & Soranno, A. The biophysics of disordered proteins from the point of view of single-molecule fluorescence spectroscopy. Essays Biochem. 66, 875–890 (2022).
  • 8. Camacho-Zarco, A. R. et al. NMR provides unique insight into the functional dynamics and interactions of intrinsically disordered proteins. Chem. Rev. 122, 9331–9356 (2022).
  • 9. Martin, E. W., Hopkins, J. B. & Mittag, T. Small-angle X-ray scattering experiments of monodisperse intrinsically disordered protein samples close to the solubility limit. Methods Enzymol. 646, 185–222 (2021).
  • 10. Borthakur, K., Sisk, T. R., Panei, F. P., Bonomi, M. & Robustelli, P. Determining accurate conformational ensembles of intrinsically disordered proteins at atomic resolution. Nat. Commun. 16, 9036 (2025).
  • 11. Ozenne, V. et al. Flexible-meccano: a tool for the generation of explicit ensemble descriptions of intrinsically disordered proteins and their associated experimental observables. Bioinformatics 28, 1463–1470 (2012).
  • 12. Bernadó, P., Mylonas, E., Petoukhov, M. V., Blackledge, M. & Svergun, D. I. Structural characterization of flexible proteins using small-angle X-ray scattering. J. Am. Chem. Soc. 129, 5656–5664 (2007).
  • 13. Fuertes, G. et al. Decoupling of size and shape fluctuations in heteropolymeric sequences reconciles discrepancies in SAXS vs. FRET measurements. Proc. Natl Acad. Sci. USA 114, E6342–E6351 (2017).
  • 14. Bonomi, M., Heller, G. T., Camilloni, C. & Vendruscolo, M. Principles of protein structural ensemble determination. Curr. Opin. Struct. Biol. 42, 106–116 (2017).
  • 15. Bottaro, S., Bengtsen, T. & Lindorff-Larsen, K. Integrating molecular simulation and experimental data: a Bayesian/maximum entropy reweighting approach. Methods Mol. Biol. 2112, 219–240 (2020).
  • 16. Shea, J.-E., Best, R. B. & Mittal, J. Physics-based computational and theoretical approaches to intrinsically disordered proteins. Curr. Opin. Struct. Biol. 67, 219–225 (2021).
  • 17. Joseph, J. A. et al. Physics-driven coarse-grained model for biomolecular phase separation with near-quantitative accuracy. Nat. Comput. Sci. 1, 732–743 (2021).
  • 18. Tesei, G., Schulze, T. K., Crehuet, R. & Lindorff-Larsen, K. Accurate model of liquid-liquid phase behavior of intrinsically disordered proteins from optimization of single-chain properties. Proc. Natl Acad. Sci. USA 118, e2111696118 (2021).
  • 19. Dignon, G. L., Zheng, W., Kim, Y. C., Best, R. B. & Mittal, J. Sequence determinants of protein phase behavior from a coarse-grained model. PLoS Comput. Biol. 14, e1005941 (2018).
  • 20. Lotthammer, J. M., Ginell, G. M., Griffith, D., Emenecker, R. J. & Holehouse, A. S. Direct prediction of intrinsically disordered protein conformational properties from sequence. Nat. Methods 21, 465–476 (2024).
  • 21. Tesei, G. et al. Conformational ensembles of the human intrinsically disordered proteome. Nature 626, 897–904 (2024).
  • 22. Wodak, S. J., Vajda, S., Lensink, M. F., Kozakov, D. & Bates, P. A. Critical assessment of methods for predicting the 3D structure of proteins and protein complexes. Annu. Rev. Biophys. 52, 183–206 (2023).
  • 23. Ruff, K. M. & Pappu, R. V. AlphaFold and implications for intrinsically disordered proteins. J. Mol. Biol. 433, 167208 (2021).
  • 24. Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Preprint at http://arxiv.org/abs/2006.11239 (2020).
  • 25. Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. Preprint at https://arxiv.org/abs/2112.10752 (2021).
  • 26. Emenecker, R. J., Guadalupe, K., Shamoon, N. M., Sukenik, S. & Holehouse, A. S. Sequence-ensemble-function relationships for disordered proteins in live cells. Preprint at bioRxiv 10.1101/2023.10.29.564547 (2023).
  • 27. Kingma, D. P. & Welling, M. Auto-encoding variational bayes. Preprint at http://arxiv.org/abs/1312.6114v11 (2013).
  • 28. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. Preprint at http://arxiv.org/abs/1512.03385 (2015).
  • 29. Alston, J. J., Ginell, G. M., Soranno, A. & Holehouse, A. S. The analytical Flory random coil is a simple-to-use reference model for unfolded and disordered proteins. J. Phys. Chem. B 127, 4746–4760 (2023).
  • 30. Nugnes, M. V. et al. DisProt in 2026: enhancing intrinsically disordered proteins accessibility, deposition, and annotation. Nucleic Acids Res. 10.1093/nar/gkaf1175 (2025).
  • 31. Marsh, J. A. & Forman-Kay, J. D. Sequence determinants of compaction in intrinsically disordered proteins. Biophys. J. 98, 2383–2390 (2010).
  • 32. Müller-Späth, S. et al. From the cover: charge interactions can dominate the dimensions of intrinsically disordered proteins. Proc. Natl Acad. Sci. USA 107, 14609–14614 (2010).
  • 33. Schneidman-Duhovny, D., Hammel, M., Tainer, J. A. & Sali, A. Accurate SAXS profile computation and its assessment by contrast variation experiments. Biophys. J. 105, 962–974 (2013).
  • 34. Holla, A. et al. Identifying sequence effects on chain dimensions of disordered proteins by integrating experiments and simulations. JACS Au 4, 4729–4743 (2024).
  • 35. Hellenkamp, B. et al. Precision and accuracy of single-molecule FRET measurements — a multi-laboratory benchmark study. Nat. Methods 15, 669–676 (2018).
  • 36. Dhanasekaran, R. et al. The MYC oncogene — the grand orchestrator of cancer growth and immune evasion. Nat. Rev. Clin. Oncol. 19, 23–36 (2022).
  • 37. Eick, D. & Geyer, M. The RNA polymerase II carboxy-terminal domain (CTD) code. Chem. Rev. 113, 8456–8490 (2013).
  • 38. Gibbs, E. B. et al. Phosphorylation induces sequence-specific conformational switches in the RNA polymerase II C-terminal domain. Nat. Commun. 8, 15233 (2017).
  • 39. Mao, A. H., Lyle, N. & Pappu, R. V. Describing sequence–ensemble relationships for intrinsically disordered proteins. Biochem. J. 449, 307–318 (2013).
  • 40. Ling, Y. H. et al. Disordered C-terminal domain drives spatiotemporal confinement of RNAPII to enhance search for chromatin targets. Nat. Cell Biol. 26, 581–592 (2024).
  • 41. Borgia, A. et al. Extreme disorder in an ultrahigh-affinity protein complex. Nature 555, 61–66 (2018).
  • 42. Neira, J. L. et al. Dynamics of the intrinsically disordered protein NUPR1 in isolation and in its fuzzy complexes with DNA and prothymosin α. Biochim. Biophys. Acta Proteins Proteom. 1867, 140252 (2019).
  • 43. Goretzki, B. et al. Crosstalk between regulatory elements in disordered TRPV4 N-terminus modulates lipid-dependent channel activity. Nat. Commun. 14, 4165 (2023).
  • 44. Deutsch, E. W. et al. High-quality peptide evidence for annotating non-canonical open reading frames as human proteins. Preprint at bioRxiv 10.1101/2024.09.09.612016 (2024).
  • 45. Das, R. K. & Pappu, R. V. Conformations of intrinsically disordered proteins are influenced by linear sequence distributions of oppositely charged residues. Proc. Natl Acad. Sci. USA 110, 13392–13397 (2013).
  • 46. Mitra, R. et al. Molecular insights into the interaction between a disordered protein and a folded RNA. Proc. Natl Acad. Sci. USA 121, e2409139121 (2024).
  • 47. Chao, T.-H., Rekhi, S., Mittal, J. & Tabor, D. P. Data-driven models for predicting intrinsically disordered protein polymer physics directly from composition or sequence. Mol. Syst. Des. Eng. 8, 1146–1155 (2023).
  • 48. Moses, D., Ginell, G. M., Holehouse, A. S. & Sukenik, S. Intrinsically disordered regions are poised to act as sensors of cellular chemistry. Trends Biochem. Sci. 48, 1019–1034 (2023).
  • 49. Fossat, M. J. & Pappu, R. V. Q-canonical Monte Carlo sampling for modeling the linkage between charge regulation and conformational equilibria of peptides. J. Phys. Chem. B 123, 6952–6967 (2019).
  • 50. Taneja, I. & Lasker, K. Machine-learning-based methods to generate conformational ensembles of disordered proteins. Biophys. J. 123, 101–113 (2024).
  • 51. Janson, G. & Feig, M. Transferable deep generative modeling of intrinsically disordered protein conformations. PLoS Comput. Biol. 20, e1012144 (2024).
  • 52. Zhu, J. et al. Accurate generation of conformational ensembles for intrinsically disordered proteins with IDPFold. Adv. Sci. 10.1002/advs.202511636 (2025).
  • 53. Zhang, O., Liu, Z. H., Forman-Kay, J. D. & Head-Gordon, T. Deep learning of proteins with local and global regions of disorder. Preprint at http://arxiv.org/abs/2502.11326 (2025).
  • 54. Schnapka, V., Morozova, T., Sen, S. & Bonomi, M. Atomic resolution ensembles of intrinsically disordered and multi-domain proteins with Alphafold. Preprint at bioRxiv 10.1101/2025.06.18.660298 (2025).
  • 55. Invernizzi, M. et al. Advancing protein ensemble predictions across the order-disorder continuum. Preprint at bioRxiv 10.1101/2025.10.18.680935 (2025).
  • 56. Lewis, S. et al. Scalable emulation of protein equilibrium ensembles with generative deep learning. Science 389, eadv9817 (2025).
  • 57. Brotzakis, Z. F., Zhang, S., Murtada, M. H. & Vendruscolo, M. AlphaFold prediction of structural ensembles of disordered proteins. Nat. Commun. 16, 1632 (2025).
  • 58. Houston, L., Phillips, M., Torres, A., Gaalswyk, K. & Ghosh, K. Physics-based machine learning trains Hamiltonians and decodes the sequence-conformation relation in the disordered proteome. J. Chem. Theory Comput. 20, 10266–10274 (2024).
  • 59. Pesce, F. et al. Design of intrinsically disordered protein variants with diverse structural properties. Sci. Adv. 10, eadm9926 (2024).
  • 60. Krueger, R. K., Brenner, M. P. & Shrinivas, K. Generalized design of sequence-ensemble-function relationships for intrinsically disordered proteins. Nat. Comput. Sci. 10.1038/s43588-025-00881-y (2025).
  • 61. Steinegger, M. et al. MMseqs2: ultra fast and sensitive search and clustering suite. GitHub https://github.com/soedinglab/MMseqs2 (2025).
  • 62.Holehouse, A., Lotthammer, J. & Novak, B. STARLING IDR sequence search index and metadata (v1.0.0) [data set]. Zenodo10.5281/zenodo.17342150 (2025).
  • 63.Riback, J. A. et al. Innovative scattering analysis shows that hydrophobic disordered proteins are expanded in water. Science358, 238–241 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Kohn, J. E. et al. Random-coil behavior and the dimensions of chemically unfolded proteins. Proc Natl. Acad. Sci. USA101, 12491–12496 (2004). [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data


Supplementary Materials

Supplementary Information (4.8MB, pdf)

Supplementary Methods, which describe STARLING’s design, training, and implementation, and Supplementary Figs 1–7.

Reporting Summary (1.1MB, pdf)
Supplementary Tables 1–6 (430.9KB, xlsx)

Supplementary Table 1 contains a breakdown of sequences used for training, test, and validation. Supplementary Table 2 includes numbers of sequences used across different ionic strengths. Supplementary Table 3 contains amino acid sequences for proteins associated with Fig. 3 and Extended Data Fig. 6a (203 sequences in total). Supplementary Table 4 contains amino acid sequences for proteins associated with Figs 4 and 5, Extended Data Figs 6h,i, 7d–g, 7i–k, 8a–c and 9j–r, and Supplementary Figs 5 and 7 (512 sequences in total). Supplementary Table 5 contains DisProt sequences used in Extended Data Fig. 5i–k. Supplementary Table 6 contains ncORF sequences used in Fig. 4o.

Peer Review File (1,019.6KB, pdf)

Data Availability Statement

The code and data associated with the figures in this article are available on GitHub (https://github.com/holehouse-lab/supportingdata/tree/master/2026/starling_2026). The UniRef50 IDR sequence embeddings are available on Zenodo (ref. 62; 10.5281/zenodo.17342150). Any data not included in these repositories are available from the corresponding author.

STARLING is fully open source and available on GitHub (https://github.com/idptools/starling/). Documentation is available at https://idptools-starling.readthedocs.io. STARLING is also provided as a Colab notebook (https://github.com/idptools/idpcolab/blob/main/STARLING/STARLING_demo.ipynb).


Articles from Nature are provided here courtesy of Nature Publishing Group
