Author manuscript; available in PMC: 2026 Apr 9.
Published before final editing as: Nat Rev Genet. 2026 Mar 9:10.1038/s41576-026-00939-1. doi: 10.1038/s41576-026-00939-1

Gene regulatory networks: from correlative models to causal explanations

Rory J Maizels 1, James Briscoe 1
PMCID: PMC7618986  EMSID: EMS213156  PMID: 41803457

Abstract

Gene regulatory networks (GRNs) explain how the genome controls cellular behaviour and tissue morphogenesis, serving to connect molecular mechanism to functional output. Single cell technologies now provide descriptions of these networks with unprecedented detail, but this is creating a dilemma for the field: we are discovering gene regulatory systems that are too complex for our existing conceptual frameworks. GRNs, which should provide mechanistic explanations, are increasingly reduced to statistical correlations - ‘hairballs’ that fail to capture molecular causation. Here, we explore why this dilemma exists and propose a path forward. We argue that methods in representation learning can be used to model GRNs, without needing to capture every molecular detail. For this framework, we advocate three linked principles: (1) models must be inherently mechanistic with structures grounded in cellular, evolutionary and experimental biology; (2) molecular principles and constraints must be used to reduce the solution space for learning GRN models; and (3) more sophisticated forms of experimental perturbation and synthetic biological engineering are needed to train models and test predictions. By reimagining GRNs through these principles, we can bridge the gap from data abundance to new conceptual understanding.


“No observations on single genes can ever illuminate the overall mechanisms of development of the body plan or of body parts except at the minute and always partial, if not wholly illusory, level of the worm’s eye view.”

– Eric H. Davidson

Introduction

With their systematic mutational screen of segmentation in Drosophila embryos, Nüsslein-Volhard and Wieschaus provided a catalysing insight into the molecular and genetic basis of development1–3. Their work revealed that developmental processes such as body segmentation can be orchestrated by a surprisingly small number of genes; underneath the complexity of the developing organism lies elegant simplicity. Later work showed that developmental genes are highly conserved across species, and that changes in their regulation are responsible for the evolutionary divergence of body plans4–6. These discoveries laid the foundations for understanding how body plans evolved and how they are encoded in the genome.

In the following decades, we have built a deeper and more detailed understanding of the developmental genetics of tissue formation, leading as far as a call for a Perturbation Cell and Tissue Atlas7 that will extend the genetic screen to its logical conclusion of documenting the role of every gene in every tissue.

Yet in these decades, a gap has opened up between the reductionist approach of studying genetic function gene-by-gene and a mechanistic understanding of the developmental logic that these genes control. This is because genes are not fixed entities with singular roles; they cannot be considered independent units of causality8–10. Function is not an inherent property of a gene but a context-dependent property of the system in which it is expressed. No gene is an island: all are interdependent, interacting and communicating in complex networks11. It is through the dynamics of these networks, rather than through any single gene, that developmental form emerges.

This view is not new. Over the past half-century, the concept of the gene regulatory network (GRN) has become central to developmental biology9,12,13. Broadly, a GRN is a system of molecular genetic regulators that act in concert to drive particular cellular outcomes. The basic form of a GRN is a set of transcription factors (TFs) that act at the cis-regulatory elements (CREs) of other genes (although this definition can be generalised to accommodate protein modifiers, signalling pathways, the three-dimensional organisation of chromatin14 and so forth). Fundamentally, the GRN concept promises a systems-level perspective on how genes and their regulators govern cellular functions and tissue morphogenesis. It offers a holistic description of regulatory genes operating together (see Box 1). These holistic descriptions can explain behaviours that could not be produced by a single gene, such as oscillations15–17, stripes18–22 and switch-like responses23–25.

Box 1. Organisational Features of Gene Regulatory Networks (GRNs).

Modularity: GRNs can be decomposed into smaller sub-circuits, each responsible for a specific regulatory task. This modular organisation breaks the process of development down into manageable units.

Conserved Sub-Circuits: Similar sub-circuit topologies, although composed of different regulatory genes, perform comparable functions in diverse developmental contexts. Sub-circuits usually involve two or three regulators engaged in mutual feedback and ensure the stability of the regulatory state despite transient initial inputs. For instance, positive feedback sub-circuits, responsible for locking down regulatory states, are common.

Hierarchical Structure: GRNs display a hierarchical organisation, with multiple layers of regulatory interactions controlling developmental processes. This hierarchical structure reflects the sequential nature of development, with each layer building upon the regulatory states established by the preceding layers. At the final layer, the activation of effector genes is responsible for differentiation and for the cell biological features of the cells in a tissue. By contrast, in upper layers, combinatorial cross-regulation plays a significant role in shaping the activity in lower layers.

GRN Depth: Deep GRNs involve numerous regulatory steps, and are often found in the initial formation of embryonic territories that require multiple layers of sub-circuits to achieve their developmental outcome. By contrast, shallow GRNs have fewer regulatory transactions between initial inputs and the activation of effector genes. The formation of a typical tissue involves the establishment of a regulatory state in a progenitor field, followed by subdivision into regulatory domains for subparts, while maintaining the overall regulatory state.

Impact of Regulatory Changes on Evolution: Extant GRNs are a product of evolutionary processes. The rewiring of GRNs appears to play a significant role in the evolution of novel structures. In addition to gradual modification, entire GRN circuits can be co-opted into new developmental contexts49, increasing GRN complexity and resulting in the generation of novel features. Modifications to the cis-regulatory elements of genes can modify the timing, location, or level of gene expression, impacting developmental processes.

GRNs are also a crucial evolutionary idea, with fundamental principles of GRN organisation arising from comparisons between species13,26–28. It has been suggested that developmental GRNs contain evolutionarily ancient sets of core regulatory genes, known as ‘kernels’13 or ‘Character Identity Networks (ChINs)’27,28, that drive tissue-specific developmental programmes. Thus, although the components of a GRN can diverge between species, the systemic logic of the GRN system can remain conserved, so that the GRN acts as the molecular and mechanistic basis of homology29. The human hand and the bird’s wing have distinct morphologies and functions; it is their homologous GRN kernels that reflect their shared evolutionary origin. These evolutionary perspectives demonstrate the importance of a systems-level perspective: GRNs are more than the sum of the molecular relationships they contain. They provide mechanistic control of biological function. They act as a dynamic map from genotype to phenotype8. Building an understanding of these systems that can explain this mechanistic control, this dynamic map, will be just as important as curating the sets of interactions contained within a GRN.

Causal versus correlative GRNs

The original formulation of the GRN concept was inherently causal, constructed through experimental interventions on the genome. By specifying how the activity of one component changes under specific perturbation of another, causal relationships were established at a molecular level30. Doing this iteratively builds up a set of relationships that collectively create a ‘gene regulatory network.’ In this way, the GRN concept explains how complex cell- and tissue-level phenotypes emerge from molecular processes. It provides a description of how genes behave collectively; a theoretical framework for viewing the cell not just as a bag of genes but a complex, finely tuned control system.

The goal of building a causal understanding of GRNs goes beyond identifying the causal relationships between regulatory components (how the presence of one molecule leads to production of another) and more broadly asks: how does the gene regulatory system cause the cell phenotype? The field of causal learning provides a toolbox for defining and analysing interactions in data with causal semantics. But in this instance, we must also consider what interventions on gene regulation will be required, what data must be collected, and how we must represent GRNs for this emergent level of collective causality to become evident.

With the collective behaviour of genes in mind, the advent of genomics technologies that measured expression across thousands of genes simultaneously created considerable excitement in the field of gene regulation modelling31,32. Yet these methods have also highlighted the magnitude of the challenge: genome-wide measurements demonstrate how large developmental GRNs can be, with dozens of transcription factors33,34 interacting across thousands of binding sites35,36.

Single-cell resolution has revealed the true extent of dynamism, stochasticity and heterogeneity in the developing embryo37–40.

In light of these findings, the traditional approach of systematic intervention seems infeasible. This has led to the rise of computational GRN models41–44. With these ‘statistical GRN’ models, a common thread is that the number of parameters being learnt (the number of possible interactions) is greater than the number of interventionalist observations (for example, the number of genetic knockouts). As such, these models are built to infer relationships from statistical patterns in the data, such as co-variance between genes across cells or samples. They can successfully identify important factors in the network, but fall short of the desired explanatory power of mapping from genotype to phenotype.
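The gap between co-variance and causation can be made concrete with a toy simulation. The sketch below is purely illustrative and not taken from any of the cited methods: the three genes A, B and C, their noise levels and the function names are hypothetical. A drives both B and C, while B and C never interact. Across cells, B and C are strongly correlated, yet knocking out B leaves C untouched, which is the kind of distinction that only an intervention reveals.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_cells(n_cells, knock_out_b=False, seed=0):
    """Hypothetical network: A -> B and A -> C, with no B-C interaction."""
    rng = random.Random(seed)
    cells = []
    for _ in range(n_cells):
        a = rng.gauss(10.0, 2.0)       # upstream regulator varies between cells
        b_noise = rng.gauss(0.0, 0.5)  # drawn even in the knockout, so both runs share noise
        c_noise = rng.gauss(0.0, 0.5)
        b = 0.0 if knock_out_b else a + b_noise
        c = a + c_noise                # C depends on A only
        cells.append((a, b, c))
    return cells


wild_type = simulate_cells(2000)
knockout = simulate_cells(2000, knock_out_b=True)

b_wt = [cell[1] for cell in wild_type]
c_wt = [cell[2] for cell in wild_type]
c_ko = [cell[2] for cell in knockout]

corr_bc = pearson(b_wt, c_wt)  # high, despite there being no B -> C edge
print(f"correlation(B, C) in wild type: {corr_bc:.2f}")
print("C unchanged by B knockout:", c_wt == c_ko)
```

A co-variance based method would be tempted to draw an edge between B and C here; the knockout comparison shows that no such edge exists in this toy system.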

This situation presents a problem to the field. For GRN models to provide more than just a list of molecular interactions, we need systems-level understanding that is mechanistic and rooted in the methods of causal discovery. But building such models for more than a handful of genes is extremely challenging. On the other hand, statistical GRNs trained on genomics data can handle the complexity of modern datasets, but do not provide any systems-level understanding of how GRNs control cellular decisions.

Here, we urge a return to the view of GRNs as mechanistic explanations. We acknowledge that ‘big data’ methods such as single-cell genomics will be needed to capture the full complexity of gene regulation; but we require new approaches to distil the complex, high-dimensional picture of these datasets into interpretable mechanisms and clear explanations.

Machine learning methods will be crucial for this effort. In particular, causal representation learning45 techniques can decompose the complexities of large datasets into key variables for the question at hand. Experimental developments offer exciting opportunities, too: synthetic biology offers the chance to understand through engineering, building synthetic networks and cis-regulatory perturbations that expand the space of systems that we can explore. Through these combined efforts we can create new mechanistic models of gene regulation that span the scales from DNA sequence to cellular phenotype and evolutionary dynamics.

Modelling gene regulatory networks

Conventionally, the construction of GRN models is an iterative process of systematically identifying regulatory genes and elements, charting out expression patterns and establishing regulatory interactions through genetic perturbation46–48 (Figure 1A). The advent of high-throughput sequencing technologies led to GRN inference algorithms that expedited this process (Figure 1C) by capturing correlative patterns between genes across many samples (or many single cells), circumventing the need for piecewise reconstruction of regulatory connections (Figure 1D). Early methods, such as GENIE350 and MRNET51, were designed for use with microarray or bulk RNA sequencing data, and were benchmarked using synthetic datasets or simple bacterial datasets where a true network structure was known. The performance of these methods was quantified through comparison to these ground truths, with limited application to real biological questions. With single-cell RNA sequencing came a host of new methods, applying techniques from Bayesian modelling52 to dynamical systems theory53, sometimes integrating pseudo-temporal54 or RNA velocity55 information (see Glossary, Box 3). These methods saw greater biological application, for example identifying novel genes that appeared to be involved in disease states. However, independent benchmarking studies of GRN inference tools revealed widespread poor performance56, with methods often performing no better than simple baselines or random guesswork57. Indeed, a recent study has demonstrated that, even with single-cell resolution, gene expression data alone is insufficient to control for false discoveries in GRN inference58.

Figure 1. Classical versus data-driven GRN construction.


A. A classical pipeline to construct a mechanistic GRN model. The workflow is often highly iterative and experimental, with relevant genes first identified and the relationship between genes experimentally tested. B. An example of a mechanistic GRN model: gene regulation through inductive or repressive interactions (arrows and bars, respectively); genes are functionally organised into modules which are responsible for controlling biological phenotypes. C. The same network represented as a statistical GRN, with interactions represented as arrows. D. A workflow for statistical GRN inference: data is collected and putative networks are constructed from statistical relationships between genes. Edges that are deemed to be indirect are pruned, and additional modalities of data can be incorporated either in training or validation.

Box 3. Glossary.

Coarse graining: a method commonly used in statistical mechanics and chemical simulation methods. Details at a lower scale of resolution (e.g. atomic level) are removed or averaged, such that only features that are essential to preserve macroscopic behaviour at a higher scale (e.g. bio-molecular) are retained. For example, only core geometrical features may be required for protein structure prediction, meaning the explicit atomic structures need not be considered.

Ground truth: the real-world model or outcome of a system, serving as a benchmark against which trained models can be compared to assess performance.

Bayesian modelling: a probabilistic modelling approach that combines prior beliefs (prior distributions) with observed data (likelihoods) to estimate an updated belief (posterior distributions), allowing inference and prediction with robust uncertainty quantification.

Dynamical systems: models of systems whose state evolves through time, often expressed in the form of differential equations. Often used to describe biological systems such as population dynamics, biochemical reactions and gene regulatory networks.

Structural non-identifiability: when wide variation in a model’s parameters produces small changes in model output, such that there is not a unique solution for a model’s fitting to a dataset. This issue is structural because it is not necessarily a consequence of poor data quality, but because of ambiguity or flexibility in the structure of a model; the larger a model becomes, the more probable that it will suffer from structural non-identifiability.

Sloppy models: related to structural non-identifiability, sloppy models are systems biology models that show extreme sensitivity to a small number of parameters, with the majority of parameters having no effect on model performance.

Bijective mapping: a function that creates a one-to-one correspondence between two sets, such that every element in one set maps to one and only one element in the other. If there existed a bijective mapping between the structure and behaviour of a GRN, knowing the structure would necessarily mean knowing the behaviour. Because interactions can have varying strengths and dynamics, the simple structure of interactions in a GRN does not have a one-to-one mapping to the GRN behaviour (a theoretical perfect parameterisation of a GRN model might, but issues of structural non-identifiability may even prevent this from being true).

Multi-scale: models or analysis that represent phenomena across different scales of resolution in time, space or organisation, for example the atomic, molecular, bio-molecular, genetic, cellular, organismal and population scales of biology.

Multi-perspectival: the idea that the representation of a phenomenon may depend on the viewer’s perspective or researcher’s question. The most useful representation of a gene regulatory system may differ for example for questions concerning global cell type control versus questions concerning regulation of specific genes.

Marr’s levels of analysis: a framework introduced by David Marr to describe cognitive and computational systems that breaks these systems into three layers: the computational level, which provides the broadest description of what the system does and why; the algorithmic/representational level, which captures the steps and organisation of the system required to fulfil the computational purpose; and the implementational level, which describes the physical material and substrates used to construct the algorithmic organisation of the system.

Dimension reduction: a class of methods used to reduce the number of variables in a dataset while retaining the major sources of variation in the data. Methods such as PCA, UMAP and t-SNE are commonly used to reduce the dimension of sequencing data, either for visualisation or to create representations that are less noisy or sparse, and thus more tractable for downstream analysis.

Representation learning: related to dimension reduction, a field of machine learning focused on learning meaningful, compact representations of data, rather than using only the variables observed in the original data. The variables in these representations are considered ‘latent’ factors because they are not directly observed. Examples of representation learning models include the variational autoencoder (VAE), a type of deep generative model that encodes data to a latent representation, before decoding this representation back to reconstruct the original data. Like PCA, UMAP and other dimension reduction approaches, when the latent representation of a VAE is lower dimension than the data, this approach can be used to ‘bottleneck’ information, retaining only the most useful variation required to construct the original dataset. Unlike PCA or UMAP, VAEs can easily be modified with additional constraints, allowing biological or dynamical information to be incorporated into the latent features that are learnt.

The advent of multi-omics approaches that simultaneously measure gene expression and chromatin accessibility promised exciting improvements41. Methods such as SCENIC+44, Dictys43 and CellOracle42 use multi-modal single cell data to model GRNs, and have been used to identify key transcription factors and important enhancer regions. In some cases, they have been used to predict the effect of perturbing well-studied differentiation driver genes or to find new important cell fate transcription factors42. Yet, despite the promise of multimodality, these data bring more issues for modelling. Inferring GRNs from chromatin accessibility requires two additional inference tasks for each chromatin region: inferring which TF binds to it and which downstream gene it regulates. Predicting TF binding from sequence alone is not straightforward59,60, as binding motifs can be highly degenerate (one TF can bind a range of sequences, one motif can be bound by many TFs61). An enhancer can be tens to hundreds of kilobases away from the gene that it regulates62. Perhaps as a result, recent independent benchmarking revealed that multimodal GRN inference methods have limited robustness, are highly sensitive to user-supplied parameters, and perform poorly at perturbation-based causal predictions63.

The challenges of modelling GRNs

The basic formulation of a GRN model is a network graph, where each node is a gene, and each edge is an interaction between two genes. The challenge of GRN inference is clear from the nature of these graph structures: if interactions are directional and self-interactions included, the number of possible interactions in a network of $n$ genes is $n^2$, and the number of possible network topologies is $2^{n^2}$. So, a ten-gene network has 100 possible interactions and more than $10^{30}$ possible topologies. Systematically deleting each gene in the network generates only 11 observational conditions (10 knockouts and wild-type) but 100 interaction parameters need to be learnt. And if, after experimentally testing each of these 100 interactions, one is 95% confident of each interaction estimation, this still only gives 0.6% confidence in overall network structure. Making matters worse is the fact that GRNs are dynamic processes with time-dependent interactions and feedback loops64, meaning any static representation of a network (for example, a conventional graph of nodes and edges) would not necessarily capture the behaviour of the network65,66.
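The arithmetic in the paragraph above can be checked in a few lines. This is a minimal sketch with hypothetical function names; the joint-confidence calculation assumes that errors on individual interactions are independent, which is the assumption under which 95% per-interaction confidence yields roughly 0.6% overall.

```python
def count_interactions(n_genes: int) -> int:
    """Directed gene-gene interactions, self-interactions included: n^2."""
    return n_genes ** 2


def count_topologies(n_genes: int) -> int:
    """Each possible interaction is either present or absent: 2^(n^2)."""
    return 2 ** count_interactions(n_genes)


def joint_confidence(per_edge_confidence: float, n_edges: int) -> float:
    """Chance that every edge call is correct, assuming independent errors."""
    return per_edge_confidence ** n_edges


n = 10
interactions = count_interactions(n)               # 100 possible interactions
topologies = count_topologies(n)                   # 2^100, i.e. more than 10^30
conditions = n + 1                                 # 10 knockouts plus wild-type
confidence = joint_confidence(0.95, interactions)  # a little under 0.006

print(f"{interactions} interactions, {topologies:.2e} topologies")
print(f"{conditions} conditions, joint confidence {confidence:.2%}")
```

The mismatch between 11 conditions and 100 parameters is what makes the inference problem underdetermined even for a ten-gene network.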

Theoretical work in systems biology and dynamical modelling has revealed a number of challenges faced when constructing models of complex systems (such as GRNs) that are only partially observed (as is the case with all biological datasets). Mathematical models can suffer from a phenomenon known as structural non-identifiability67,68, where different sets of model parameters generate the same output, making it impossible to determine the ‘correct’ parameter solution. A related but distinct69 phenomenon is that of ‘sloppy models’70,71, where certain parameters in a model can change by orders of magnitude without impacting the model’s output (Figure 2C).
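A deliberately small example can make non-identifiability tangible. The sketch below is an assumption-laden toy, not a model from the cited literature: a single target gene is produced in proportion to an input and degraded at a constant rate, and only its steady state is observed. Under those assumptions the output depends only on the ratio of the two rates, so rescaling both together leaves every prediction unchanged, whereas rescaling one alone changes the output sharply.

```python
def steady_state(production_rate: float, degradation_rate: float, input_level: float) -> float:
    """Steady state of dy/dt = production_rate * input_level - degradation_rate * y."""
    return production_rate * input_level / degradation_rate


# Hypothetical input doses at which the steady state is measured.
input_levels = [0.0, 0.5, 1.0, 2.0]


def response(production_rate: float, degradation_rate: float) -> list:
    """Model output across all measured input levels."""
    return [steady_state(production_rate, degradation_rate, u) for u in input_levels]


reference = response(2.0, 1.0)
rescaled_both = response(200.0, 100.0)  # both rates x100: ratio preserved
rescaled_one = response(200.0, 1.0)     # production only x100: ratio changed

# Largest difference in output relative to the reference parameter set.
max_gap_both = max(abs(a - b) for a, b in zip(reference, rescaled_both))
max_gap_one = max(abs(a - b) for a, b in zip(reference, rescaled_one))

print("gap when both rates are rescaled:", max_gap_both)
print("gap when only production is rescaled:", max_gap_one)
```

No amount of steady-state data can separate the two rates in this toy model; a different kind of measurement, such as a time course after the input is removed, would be needed.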

Figure 2. Challenges of GRN modelling.


A. Different GRN structures (three-node networks) can produce the same pattern of expression and tissue phenotype (represented as box of expression values through time/space). B. Conversely, the same GRN structure can produce different patterns depending on its parametrisations (strength of gene–gene interactions), contexts (boundary conditions) and initial conditions. C. The challenge of ‘sloppy models’. Top: In these cases, the model responds very sharply to changes in some parameters (stiff parameters) while hardly responding at all to others (sloppy parameters). Bottom: in parameter space, sloppy parameters can be visualised as directions in which the model output does not change and the contour map is unchanging (in this example, from bottom-left to top-right).

These phenomena are related to a more general problem of ‘dynamical equivalence’72,73 where different models generate equivalent dynamics, making the task of identifying the correct model structure intractable. It has been shown that many different GRN structures are capable of generating the same patterning behaviour74–76 (Figure 2A). Similarly, slight parameter variations within a single GRN structure can create very different model behaviours15,77 (Figure 2B). One cannot expect a one-to-one mapping between the structure of genetic interactions within a GRN and the behaviour of the GRN as a whole. These problems worsen as the number of parameters in the model increases.
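The point that one wiring diagram can support qualitatively different behaviours can be sketched with a generic two-gene mutual-repression circuit. This is a hypothetical toy model chosen for illustration, not one analysed in the article: the Hill-type repression term, the parameter values and the simple Euler integration are all assumptions. With strong expression the circuit behaves as a switch whose outcome depends on the initial condition; with weak expression the same structure settles into a single balanced state.

```python
def simulate_toggle(max_rate, x0, y0, hill=2, dt=0.01, steps=5000):
    """Euler integration of two mutually repressing genes X and Y.

    dx/dt = max_rate / (1 + y^hill) - x
    dy/dt = max_rate / (1 + x^hill) - y
    """
    x, y = x0, y0
    for _ in range(steps):
        dx = max_rate / (1.0 + y ** hill) - x
        dy = max_rate / (1.0 + x ** hill) - y
        x, y = x + dt * dx, y + dt * dy
    return x, y


# Same network structure, two parametrisations, two initial conditions each.
strong_from_x = simulate_toggle(4.0, x0=2.0, y0=0.1)
strong_from_y = simulate_toggle(4.0, x0=0.1, y0=2.0)
weak_from_x = simulate_toggle(1.0, x0=2.0, y0=0.1)
weak_from_y = simulate_toggle(1.0, x0=0.1, y0=2.0)

print("strong expression:", strong_from_x, strong_from_y)  # two distinct end states
print("weak expression:  ", weak_from_x, weak_from_y)      # one shared end state
```

Drawing this circuit as a static graph gives the same picture in both cases, which is why a graph alone does not determine behaviour.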

Even an accurate GRN graph model would not provide a complete, objective depiction of the reality of gene regulation. Many aspects are ignored in these models – from spatio-temporal dynamics to epigenetic regulation to transcription factor co-operativity, depending on the model. These aspects are deliberately ignored; they are abstracted with the assumption that they will be sufficiently captured by the parameters of the model. This abstraction is necessary: it makes the system tractable for analysis.

Understanding this act of simplification allows us to ask whether our chosen level of abstraction captures the biological phenomena we wish to study. The choice to abstract molecular details and represent GRNs at the level of genetic interactions is based on the assumption that genes are the fundamental units of causality in cellular systems8. But biological function can be ‘emergent,’ arising from the dynamics and global structure of the GRN system itself78–81. Oscillations, switch-like behaviours, and Turing patterns are examples of emergent properties that are not evident from individual genes.

There is nothing to say that the emergent properties of a GRN cannot be described by an explicit and detailed model of every genetic component of the network. But these explicit representations may not provide the most informative depiction of these complex systems82,83. Understanding every individual interaction may not provide a clear explanation of the GRN’s cellular function any more than a complete understanding of the structure of amino acids explains the folding of proteins. Indeed, given the evident challenges associated with constructing GRN models, it is worth considering whether abstracting away the details of genetic interactions would help us to learn more about the functions that GRNs perform in cells.

What molecular organisation might be captured in an ‘emergent property’ of a GRN? It could be as simple as a quantification of the ratio of activities between two genes, rather than of the genes’ activities themselves (for example, the erythroid-myeloid fate decision depends on the stoichiometric balance between GATA1 and PU.184). Other examples of emergent properties/mechanisms could include: many co-expressed genes that are induced in concert to engender a particular phenotype (such as the pigmentation gene module in melanocytes85); sets of related or duplicated components (such as the different Gli proteins that transduce Sonic hedgehog signals86 or the different enhancers that act cooperatively in the α-globin super-enhancer87); components that are molecularly disparate (e.g. DNA sequence and proteins) but act collectively to drive a specific cell-level phenotype (such as the sharp, position-specific stripe of expression driven by the even-skipped stripe 2 system in Drosophila88 or the interferon-β enhanceosome that integrates NF-κB and IRF signals in viral infection89); or a sub-circuit, within a wider network of transcription factors, that is responsible for a specific phenotype (such as the four-gene network of Pax6, Olig2, Nkx2-2 and Irx3 that drives ventral spinal cord patterning86).

In each example, molecular components form a larger functional unit, such that we could model the behaviour of the functional unit and abstract molecular details away. This coarse-grained approach could provide more robust models and reveal new forms of biological mechanism. It would provide a multi-perspectival way to study multiscale biological systems, with different abstractions for different questions. The challenge is to find how we can do this in a flexible, generic way, such that a single modelling approach would be suitable for all examples given above.

Representational solutions

The challenge of modelling across different levels of granularity has been addressed in various scientific contexts. For example, in chemistry, coarse-grained modelling is used to produce molecular simulations that abstract away atomic information90, replacing particles with ‘pseudo-particles’ that only retain details that are relevant at the molecular or macromolecular level91. Comparably, AlphaFold292 abstracts amino acid chains as ‘triangular gases’, retaining only the core geometrical information required for modelling global protein structure.

But coarse-graining analysis to focus on ‘emergent properties’ can do more than just remove extraneous details. In the study of signal processing systems, higher levels of analysis can reveal broader design principles and functional structure. In neuroscience, Marr’s levels of analysis93,94 propose three levels at which information processing systems can be understood. Highest is the computational level, which describes what problem is being solved by the system: its goals, constraints and success criteria. Next is the algorithmic/representational level, which describes how the system achieves its goal: how it processes inputs, builds useful representations, and uses these representations to create outputs. Finally, the implementational level describes the physical realisation of the system. Applying these levels of analysis to a radio95, we might say that the computational level of a radio is to deliver an audio programme to a listener. The implementational level will involve antennae, electronics, speakers, buttons, a power supply, and so on. Linking these is the algorithmic/representational level, which might describe how the radio selects a particular frequency with bandpass filtering, reads this signal with frequency discrimination, then processes this signal into data to send to speakers. The computational level describes why one might use a radio, the implementational level describes what a radio is composed of, but understanding how a radio works requires an understanding of the representational level.

This framework can similarly be applied to the signal processing performed by gene regulatory networks96,97. The computational level is a cell-, tissue- or organism-level description of the processes a GRN controls: the phenotypes it drives and the contexts in which it functions. The implementational level is a molecular description of the proteins, cis-regulatory elements and epigenetic components that construct the GRN. Connecting them is a representational level that describes how these different components are organised, how input signals are mapped to output expression programs96, and how this input-output mapping creates organismic function out of molecular components (Figure 3A).

Figure 3. Representational descriptions of gene regulatory networks.


A. Information processing description: Marr’s levels of analysis93,94 can be applied to the study of GRNs. An implementational level captures the physical realisation of the system; in the instance of GRNs, this is the explicit description of transcription factors and enhancers that mediate genetic interactions. Above this, a representational level describes the logic of how this physical realisation functions to interpret signals (S1 and S2) and achieve the system’s goals. This is visualised here as a logic gate connecting abstract cell-type factors. Finally, a computational level describes the computational process being performed, in this instance the decoding of input signals to form a striped tissue pattern. B. Cellular signal interpretation descriptions: from a cellular perspective, GRNs can be thought of as processes that take signalling dynamics as input and then output cell type proportions. Constructing mechanistic models at the level of signal interpretation could describe the cellular function of GRNs without needing to explicitly model the underlying genetic interactions102–104. C. Evolutionary kernel description: GRNs consist of ‘plug-ins’, which are re-usable modules such as signalling pathways that provide inputs; ‘kernels’, which contain the core functional logic of the GRN; and ‘differentiation batteries’, which are responsible for executing the downstream consequences of the GRN. The different functions of these modules are reflected in their evolutionary dynamics: kernels are highly conserved across species due to being functionally critical to tissue formation (visualised here as the unchanging blue network across species). Plug-ins display higher variability, particularly in the contexts in which they are deployed across tissues in the organism (demonstrated here as the changing size and strength of the different modules). Differentiation batteries display the highest level of variability; they do not feed back into the GRN, and so are free to evolve and adapt to provide species-specific outputs. Capturing the evolutionary dynamics of GRN components could thus inform the functional role the components perform in the network.

To build this representational layer, understanding how molecular components are organised is crucial. In this respect, GRNs have been shown to possess structure and organisation26,98: task-specific sub-circuits provide a form of modularity99,100. These sub-circuits are organised in a hierarchical fashion that reflects the evolutionary and functional structure of the GRN101, while the sequential progression of metastable cell-states through development creates another form of hierarchy between cell-state specific sub-circuits. That hierarchy and modularity exist in this functional way, connecting to how GRNs operate to drive cellular decisions, suggests that GRN architectures are reducible and decomposable. Grouping genes into modules and structuring cell states into hierarchies naturally provides a bridge from genetic to cellular scales of function.

Next comes the question of how the system behaves. At the molecular level, behaviour is just the dynamics of components through time, perhaps extending to include interactions between components. But a more systemic, representational idea of GRN behaviour must map the system’s inputs to outputs.

What is the simplest model that can recapitulate outputs based on inputs? And then, how might the activity of components link to this input-output map?
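As a toy illustration of what a minimal input-output map might look like, the sketch below implements the kind of logic gate depicted in Figure 3A: two opposing signal gradients are decoded into a striped pattern. The gradient shapes, the threshold value and the function name `stripe_gate` are all assumptions made for illustration and are not taken from any specific model.

```python
def stripe_gate(s1, s2, threshold=0.25):
    """Hypothetical AND gate: a cell adopts the 'stripe' fate only
    where both input signals exceed the threshold."""
    return s1 > threshold and s2 > threshold

n_positions = 11
fates = []
for i in range(n_positions):
    x = i / (n_positions - 1)   # position along the tissue, from 0 to 1
    s1 = 1.0 - x                # assumed linear gradient, high on the left
    s2 = x                      # assumed linear gradient, high on the right
    fates.append(stripe_gate(s1, s2))

# '#' marks positions adopting the stripe fate, '.' marks the rest.
pattern = "".join("#" if fate else "." for fate in fates)
print(pattern)
```

The gate says nothing about which transcription factors or enhancers implement it; it only describes the mapping from signals to fate, which is the sense of ‘representational’ used here.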

The third question (more relevant to GRNs than radios) is to ask how the system evolved. Throughout evolution, neutral or even mildly deleterious mutations can accrue105,106 and cis-regulatory sequences vary considerably between species107,108, but mutations that impact the GRN’s representational behaviour (and thus computational function) are more likely to produce a dead end. Many developmental GRNs are built around ancient, stable cores of transcription factors (‘kernels’13 or ‘ChINs’27,28) that control key decisions. Feeding into these kernels are signalling input modules (termed ‘plug-ins’ or ‘I/O switches’) which show greater variability across species, though are often repeated across different contexts within an organism13. And responding to kernel activity are ‘differentiation batteries’: downstream effector genes that execute the network’s output without feeding back into the regulatory system13. These genes show the highest variability across species, as they are not constrained by downstream regulatory logic and can evolve to execute species-specific ‘character states’ of a tissue. Just as the evolutionary dynamics of residues in a protein can provide structural information109–111, the evolutionary dynamics of genetic components could describe their role within the wider functional context of the GRN112. Over evolutionary time, developmental systems drift through different network configurations to produce equivalent outputs, rewiring connections while maintaining overall function113. The constraints that guide this drift process, and the correlative patterns that are created by it, could provide valuable prior information for modelling the structure and function of gene regulatory networks in development114–116.

Taking a ‘representational’ approach to modelling gene regulatory systems would help to reduce the solution space for models. But the benefit could be more fundamental. The approach moves from asking ‘what is a GRN made of?’ or ‘what is the structure of a GRN?’ to instead asking ‘how does a GRN map inputs to outputs, and how does this achieve the cell’s broader function?’ In doing so, this approach can shift focus towards design principles, cellular function and evolutionary dynamics, connecting the study of GRN structure with fundamental questions of biological purpose and origin.

The parameters of such a representational model of gene regulation would not necessarily capture distinct molecular entities or properties, such as proteins or reaction kinetics. The challenge, then, is to find biological constraints that ensure these models can be trained in a robust, principled way.

Abstract representations and dimensionality reduction

In fields such as single-cell genomics, it is already common practice to visualise biological systems with abstract representations. Dimensionality reduction approaches such as PCA, UMAP117 and t-SNE118 condense the thousands of variables and observations into a more digestible two-dimensional depiction. Dimension reduction is also a common step in machine learning pipelines for statistical tasks such as multi-modal data integration119–122, batch correction123–125 and perturbation prediction126–129.

While reliance on low-dimensional visualisations can introduce distortions into analysis130, the principle motivating these approaches is that the number of variables required to properly describe biological systems is considerably less than the number of features one can measure of it. In other words, biology exists in a lower dimensional space than the full dimension of observable features (known beyond biology as the manifold hypothesis131). Biological features are correlated and interdependent; the system is constrained to fewer degrees of freedom than observed variables. This is a necessary feature of organised biological systems: the manifold hypothesis simply implies the presence of organisation.
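A small simulation can make this principle concrete. The sketch below generates synthetic ‘cells’ in which twelve measured features are driven by only two hidden factors, then estimates how much of the total variance is captured by the two leading principal components. Everything here is an assumption for illustration: the data are simulated, the function names are hypothetical, and the eigenvalues are obtained with a basic power iteration rather than a production linear algebra library.

```python
import random

def simulate(n_cells=300, n_genes=12, n_factors=2, noise=0.05, seed=0):
    """Simulate cells whose gene values are linear mixtures of a few hidden factors."""
    rng = random.Random(seed)
    loadings = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_factors)]
    data = []
    for _ in range(n_cells):
        z = [rng.gauss(0, 1) for _ in range(n_factors)]
        data.append([sum(z[k] * loadings[k][g] for k in range(n_factors))
                     + rng.gauss(0, noise) for g in range(n_genes)])
    return data

def covariance(data):
    """Sample covariance matrix of the columns of data."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in data]
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
            for i in range(p)]

def top_eigenvalues(matrix, k, n_iter=500, seed=1):
    """Leading k eigenvalues of a symmetric matrix (power iteration with deflation)."""
    rng = random.Random(seed)
    p = len(matrix)
    a = [row[:] for row in matrix]
    values = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(p)]
        for _ in range(n_iter):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(p)) for i in range(p))
        values.append(lam)
        # Deflate: remove the component just found before searching for the next.
        for i in range(p):
            for j in range(p):
                a[i][j] -= lam * v[i] * v[j]
    return values

cov = covariance(simulate())
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = sum(top_eigenvalues(cov, k=2)) / total_variance
print(f"fraction of variance in first two components: {explained:.3f}")
```

Because the twelve features are constrained by two underlying degrees of freedom, almost all of the variance falls within two components; real single-cell data are of course noisier and nonlinear.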

Low-dimensional representations of biology capture the correlations and patterns in biological systems that result from interactions between components. Building mechanistic models into these representations can thus learn the mechanisms by which these correlations and patterns are generated. Basic implementations of this more mechanistic form of dimension reduction exist already for single-cell data. These include algorithms for describing the gene modules that capture the correlations between genes132, ‘meta-cells’ that capture the coarse-grained patterns of cell states133, and pseudo-time and trajectory analysis tools134 that can model the path of cellular differentiation that explains the observed patterns of cell types in a dataset.

Further to this, methods of ‘causal representation learning’45 aim to disentangle complex phenotypes into distinct biological processes and in doing so to learn causal relationships from the data135–144. These approaches have been applied to a wide range of biological contexts, with simulated and real single-cell genomics data, and offer the prospect of more explainable, generalisable models of complex biological systems that are grounded in theory of causal discovery.

These methods have thus far been applied largely to the problem of perturbation prediction: learning the causal effect of genetic and chemical interventions on cells. Further work applying these methods could provide mechanistic representation of how gene regulation systems drive cell and tissue level outcomes during development.

Such a model would look to abstract away molecular details, shifting the onus of these complexities to the abstract parameters of a neural network, freeing the meaningful parameters of the model to learn a smaller number of latent causal factors that connect with or drive particular cellular phenotypes (Figure 4).

Figure 4. Towards mechanistic abstract representations.

A. In principal component analysis (PCA), each projected data point is a linear transformation of the original data-point by a transformation matrix C. Accordingly, each component can be described as a linear combination of variables in the original dataset (for example, genes in RNA sequencing data). B. Dimension reduction methods such as autoencoders, UMAP or t-SNE provide a generalisation of this idea beyond linear mappings, where each data point is mapped through a nonlinear function to a latent representation (to use the nomenclature of autoencoders). Hence, each latent variable can be described as a non-linear function of the variables (genes) in the original dataset. C. A mechanistic adaptation in which the latent representation is constructed from a mechanistic model (fmech) that captures the causal relationships between latent factors that can explain the dynamics of data-points (for example, sequenced cells). This mechanistic model could be a dynamical system describing the time-dependent progression of cells transitioning between cell states. In parallel, data variables (genes) are mapped to latent variables by a function that is subject to biological constraints (fbio), ensuring the mapping is biologically meaningful. The structure of the latent representation and the mapping from variables to latent factors are interdependent, but not equivalent as they are in PCA. The mechanistic model can capture the causal relationships driving cell level behaviours, while the latent variable mapping learns how genes connect to these causal relationships.

The challenge here is to constrain the model such that what is being learnt is meaningful for the question. For example, one could enforce that latent variables map to genes of particular GO terms, or to particular chromosomes, or that the mapping passes through a dynamical model of transcription. The constraints determine what the model learns. Examples of how GRN models could be constructed from additional forms of data and model structures are described in Box 2. These approaches provide models of gene regulation that are consistent with the observed genetic dynamics, while explaining higher level evolutionary or cellular phenomena.
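One simple way such a constraint could be imposed is by masking the gene-to-latent mapping, so that each latent variable can only draw on a pre-specified set of genes. The sketch below is a minimal illustration under stated assumptions: the gene names and gene sets are hypothetical placeholders (standing in for, say, GO term annotations), and the weights are fixed by hand rather than learnt.

```python
# Hypothetical gene sets standing in for GO terms or similar annotations.
genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
gene_sets = {
    "signalling": ["geneA", "geneB"],
    "cell_cycle": ["geneC", "geneD", "geneE"],
}

def build_mask(genes, gene_sets):
    """mask[k][g] is 1.0 if gene g may contribute to latent variable k, else 0.0."""
    return [[1.0 if g in members else 0.0 for g in genes]
            for members in gene_sets.values()]

def encode(expression, weights, mask):
    """Map one cell's expression vector to latent variables through masked weights."""
    return [sum(w * m * x for w, m, x in zip(w_row, m_row, expression))
            for w_row, m_row in zip(weights, mask)]

mask = build_mask(genes, gene_sets)
# In a real model these weights would be free parameters fitted to data;
# they are fixed here purely for illustration.
weights = [[0.5] * len(genes) for _ in gene_sets]

cell = [2.0, 4.0, 1.0, 1.0, 1.0]
latent = encode(cell, weights, mask)
print(dict(zip(gene_sets, latent)))  # {'signalling': 3.0, 'cell_cycle': 1.5}
```

Whatever values the weights take, the mask guarantees that the ‘signalling’ latent variable is unaffected by genes outside its set, so the choice of mask determines what each latent variable can come to mean.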

Box 2. Learning representations of GRNs.

Evolutionary dynamics: by incorporating evolutionary data, and enforcing that each latent variable maps to genes with specific evolutionary dynamics, one could learn a representation of gene regulation that is consistent with the kernel theory of GRN evolution.

Cellular signal processing: by defining a GRN model as receiving only a cell’s signalling environment (for example in an in vitro system) as inputs and producing as output a cell fate, one could learn a representation of gene regulation that captures cellular signal interpretation.

Proliferation vs. differentiation: one could define latent variables that are parameters of an agent-based model that can simulate cellular decisions to either proliferate or differentiate (potentially using lineage tracing data).

Sub-circuit modelling: combining representational models with theory from systems biology could provide methods to treat latent variables as representing particular network motifs or common sub-circuit structures, producing a decomposed GRN model that captures the modular structure of regulatory networks.

If the evolution, cellular function or organismal application of GRNs dictates anything about their form or structure, building these details into our models can serve to reduce the vast solution space. Doing so in a representation learning framework can provide the ‘higher-level’ constraints that allow coarse-grained models to abstract the noise from the signal.

Experimental solutions

Building a high-level understanding of gene regulatory function may not require mapping out every molecular component of a network. But just as the challenge of ‘dynamical equivalence’ means that many different model parametrisations can generate the same dynamics, many different molecular systems could generate equivalent mechanisms. Modelling gene regulation from measurements of molecular components in a way that is robust and generalisable across cell-types, tissues, organisms and species, requires understanding the molecular ‘rules’ that dictate gene regulatory function. The set of possible solutions for a gene regulatory model spans the space of possible molecular instantiations of the system, so to constrain the solution space for GRN models, we must understand what can and cannot happen at a molecular level.

As an example, consider the challenge of learning the ‘cis-regulatory code’ that maps the sequence of an enhancer to its function. An unbiased approach faces an intractably large solution space: there are more possible 200 base-pair sequences than atoms in the universe. Defining biological principles and design constraints reduces this space to that of biologically plausible mechanisms. Just as knowing linguistic structures, such as syllables and phonemes, helps identify valid words in a language, understanding molecular principles organising gene regulatory interactions provides a framework to link basic physical units (DNA bases and transcription factors) to their broader functional meaning (Figure 5A).
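The scale of this sequence space can be checked with a short calculation. The figure of 10^80 atoms used below is a commonly quoted order-of-magnitude estimate for the observable universe and is included here as an assumption, not as a value from this article.

```python
# Each of the 200 positions can hold one of four bases (A, C, G, T).
n_sequences = 4 ** 200

# Assumption: rough order-of-magnitude estimate of atoms in the observable universe.
atoms_estimate = 10 ** 80

order_of_magnitude = len(str(n_sequences)) - 1
print(f"4^200 is on the order of 10^{order_of_magnitude}")
print(f"sequences per atom: about 10^{order_of_magnitude - 80}")
```

Under this estimate, the number of possible 200 base-pair sequences exceeds the number of atoms by roughly forty orders of magnitude.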

Figure 5. Establishing molecular principles governing GRNs.

A. The structure of the English language can help constrain the solution space when determining whether a letter sequence is a valid word: both ‘protein’ and ‘pertino’ follow valid rules of phonetics, while the sequence ‘rnpetoi’ contains the invalid phonotactic cluster ‘rnp’, so can be ruled out. Similarly, understanding the structure of allowed configurations at the molecular level can help constrain the solution space for modelling GRNs (here demonstrated with the example of transcription factor complexes forming at the promoter of a gene). B. Relationships can appear from the interaction between seemingly non-predictive or flexible variables to create a form of ‘emergent rigidity’: here, neither variable 1 nor variable 2 shows good correlation with the input variable; however, the product of these two variables resolves into a clear linear relationship. C. Variation in biological mechanism (for example cis-regulatory enhancer activity) could be generated through two processes. Top: different enhancer activities/affinities can create dosage control between genes and between contexts, creating a functionally different output between contexts. Bottom: different enhancer activities/affinities can arise as an adaptation to an evolutionary event, such as the duplication of a gene. In this case, the variation does not create functional differences, but compensates for the previous evolutionary event.

‘Big-data’ methods such as single-cell genomics will be vital for describing patterns of gene regulation across contexts. Yet understanding gene regulation requires moving beyond cataloguing and correlating these observations. It will be important to also capture how different regulatory layers connect.

Individual layers of regulation can give the impression of sloppy or noisy mechanisms: TF binding motifs are degenerate61, while enhancers are often redundant145,146. Transcription factors appear to bind thousands of sites in the genome, often in complexes of interchangeable membership, sometimes binding alongside other TFs that they also directly antagonise147,148. The roles of other forms of regulation, such as histone modifications, DNA methylation and non-coding RNAs, also appear to be context-dependent149. This leads to an impression of extreme, almost unlimited flexibility, from which remarkable robustness and precision emerge.

Robustness may be achieved through interactions between regulatory layers. Variations in one layer may be coupled to, complemented, or counteracted by variation in another to produce a form of ‘emergent rigidity’ (Figure 5B). For example, one set of observations might indicate that a gene is regulated by both distal and proximal enhancers, such that the genomic distance of these regulatory elements is not predictive of their relative activities. Another set of observations might find that this gene is contained within a 3D topology that is remodelled between different cell types. Integrating these observations might lead to the finding that the 3D organisation and genomic location of enhancers together resolve into a predictive model of cell-type specific enhancer activity150. Alternatively, one assay might document the sequence variability of motifs bound by a TF, while another might show that the TF induces the expression of targets that are not specific to a single cell type. Together, these findings could resolve into a model of dose-responsive TF binding, where motifs of different affinities drive the expression of different cell type programs (as has previously been shown to be the case for Sox2151). Examining only one layer of regulation may never provide mechanistic understanding: the cis-regulatory code148,152 of gene regulation may only exist in a vague sense when viewed solely from the lens of DNA sequence or TF activity; examining relationships between layers may allow a form of code to emerge more clearly.
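The idea of ‘emergent rigidity’ can be illustrated with a purely synthetic example in the spirit of Figure 5B. In the sketch below, two hypothetical regulatory layers each vary widely because of a shared hidden factor, so neither correlates well with the input on its own, yet their product tracks the input closely. The distributions, noise level and variable names are arbitrary assumptions chosen for illustration and do not correspond to any measured system.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)
inputs, layer_a, layer_b = [], [], []
for _ in range(2000):
    x = rng.uniform(1.0, 2.0)         # input variable
    u = 10 ** rng.uniform(-1.0, 1.0)  # hidden factor spanning two orders of magnitude
    inputs.append(x)
    layer_a.append(x * u)             # layer A: input signal scaled by the hidden factor
    # Layer B: compensates for the hidden factor, with a little multiplicative noise.
    layer_b.append((1.0 / u) * (1.0 + rng.gauss(0, 0.02)))

product = [a * b for a, b in zip(layer_a, layer_b)]
r_a = pearson(inputs, layer_a)
r_b = pearson(inputs, layer_b)
r_product = pearson(inputs, product)
print(f"layer A vs input: {r_a:.2f}")
print(f"layer B vs input: {r_b:.2f}")
print(f"A x B vs input:   {r_product:.2f}")
```

Examined separately, each layer looks noisy; the relationship to the input only becomes visible when the two are combined, which is the motivation for measuring regulatory layers together.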

For this, multimodal experimental designs linking perturbations in one layer to measurements of another will be valuable. Engineering sequence variation while measuring chromatin conformation with Hi-C; measuring chromatin accessibility changes in response to transcription factor overexpression153–156; recording enhancer activities across histone-code perturbations157 – experimentally linking regulatory processes may reveal how the many facets of gene regulation act in concert to constrain cell behaviour.

However, not all regulatory variation is necessarily functional. Variability can provide functional benefit (for example, binding motif degeneracy may allow the intensity of a TF’s effect to be modulated across contexts151) but could also occur through evolutionary chance. For example, the duplication of a TF gene could lead to evolutionary changes in cis-regulatory sequences that accommodate two redundant regulators, rather than to removal of the duplicated gene (Figure 5C). Equally, variation can be driven by stochastic processes of mutation and recombination that do not generate strong enough selective forces to be eliminated through evolution158.

To test how structural aspects of a GRN map to its function, we will need to re-arrange the structure of these networks ourselves.

By altering the composition and combination of cis-regulatory elements in a cell, we can begin to alter the strength, polarity or presence of interactions between genes, providing an experimental exploration of how different network structures generate various phenotypes. Work in this direction is ongoing: high-throughput enhancer mutagenesis159, engineered rearrangement of the human genome160 and of enhancer landscapes161, high throughput enhancer knockout162 and TF induction screens163 exemplify this direction. In parallel, methods for designing cis-regulatory sequences towards specific regulatory functions are rapidly maturing, with enhancer screens capable of testing the cell-type specific regulatory activity of thousands of DNA sequences simultaneously164,165; detailed analyses of how structural recombination alters enhancer function166; saturation genome editing methods that measure functional readouts of exhaustively altered regulatory regions167; and machine learning methods for the de novo generation of cell-type and function specific enhancer sequences168–170.

These developments point towards a future of synthetic gene regulatory engineering, where cis-regulatory sequences and transcription factors can be edited or introduced to produce targeted alterations to the structure, and thus function, of gene regulatory networks. This, in turn, could create the possibility of designing entirely synthetic gene regulatory networks as objects of investigation, building on existing work in synthetic biology that has created programmable protein circuits171 and protein-level neural networks172 in mammalian cells, or ribocomputing devices173 in bacteria, or genetic logic circuits174 in yeast. Such efforts could provide gold-standard ground truth systems for building and benchmarking GRN modelling frameworks, but more broadly would open up a vast space of GRN structures beyond that of naturally occurring systems.

The experimental progression outlined above moves from outlining molecular rules and the connections between layers of regulation in a network, to re-arranging, re-designing and ultimately constructing new gene regulatory systems. This pathway offers the opportunity to build perturbation frameworks of commensurate sophistication to match the scale of our data-collecting abilities and the complexity of the systems we study. For theoreticians, it offers a particularly enticing prospect: the dynamics of gene regulatory networks are, to a certain degree, written into the sequence of the genome. Editing gene regulatory networks through genome engineering thus offers a novel prospect for causal discovery, namely that we can systematically alter the structure of causal interactions in the system we wish to understand.

This opportunity is reminiscent of a recently proposed strategy for understanding cis-regulatory DNA code. This strategy suggested we should ‘hold out the genome:’148 the totality of naturally observed regulatory sequences is only a tiny proportion of the total possible sequence space, so to understand cis-regulatory sequences we must train models on larger libraries of synthetic sequences, expanding beyond what is observed in nature. A similar argument applies to gene regulatory networks: the total space of possible network structures is far larger than is observed in biological systems. By building new synthetic systems and re-engineering the structures of existing ones, we can generate a far deeper and more extensive exploration of gene regulatory networks than is currently possible.

Perspective: Machine learning for biological networks

The rapid progress of biological technologies, from genomics techniques to computational methods, has created considerable excitement regarding our understanding of gene regulatory mechanisms and cellular function (see Box 4 for open challenges). This has led some to call for the construction of ‘virtual cells’175,176 and foundation models that provide ‘universal representations’ of cellular biology7,177.

Box 4. Open Challenges.

Computational:

  • Creating new levels of abstraction for GRN models and linking between levels
  • Developing causal representation learning methods
  • Integrating cellular and evolutionary information into models
  • Extending understanding of common GRN motifs to generalised cases
  • Developing models that explicitly handle the dynamic nature of GRNs

Experimental:

  • Creating reliable ground truth datasets from natural and synthetic systems
  • Identifying molecular principles giving rise to GRNs across contexts
  • Developing methods to understand CREs and gene regulation (single-locus proteomic assays, dynamic spatial assays, 3D genome)
  • TF binding dynamics at single-molecule resolution
  • Developing cis-regulatory manipulations and more sophisticated perturbations

Technical:

  • Improving single-cell methods to capture more conditions per experiment
  • Getting better direct information on TF–enhancer interactions
  • Making multi-modal measurements more affordable and consistent
  • Developing methods to track dynamic changes in GRNs

Conceptual:

  • Understanding how different regulatory layers interact
  • Bridging the gap between molecular interactions and cellular behaviours
  • Determining appropriate levels of abstraction for different questions
  • Understanding how evolutionary constraints shape network architecture

One driving force for this excitement is the success of AlphaFold92,178, arguably the first bona fide biological foundation model. Creating a comparable model for gene regulatory systems requires one to shift scales from the molecular to the cellular. To appreciate the challenges associated with this shift, one must consider the differences between these contexts: crystal structure prediction is a static and clearly definable problem. It benefits from ground truths and robust metrics of success. Moreover, the Protein Data Bank (PDB) remains one of the cleanest and most consistent biological datasets ever constructed. Proteins display structural ‘degeneracy’, where structures contain commonly repeated motifs, such as alpha helices and beta sheets, making the challenge of structure prediction considerably more tractable.

By contrast, ‘cell function’ is a context-dependent concept with no objective definition, and ‘cell states’ can only ever be partially observed. Single-cell sequencing data are noisy, sparse and suffer from batch effects, meaning that collected databases require considerable levels of data processing and transformation to be integrated179. Unlike protein structure, cellular decision making is dynamic and context-dependent, such that seemingly identical cells can behave differently due to unobserved differences (for example clonal dynamics, variation in cell culture conditions, stochasticity of gene expression).

These differences in context highlight key areas where progress is required. For example, protein structure prediction models are not trained on raw data; X-ray diffraction densities are integrated, scaled and processed into atomic coordinates, which are then the input to these models. This process involves the incorporation of biological knowledge, specialised algorithms and human oversight to produce a standardised representation. Moving genomics ‘harmonisation’ methods, which currently focus just on removing batch effects, towards producing refined and biologically- and biophysically-motivated ‘cell state’ data representations, analogous to atomic coordinate data, may be key to building foundation model-worthy datasets. Moreover, as genomics methods become cheaper and more widely available, we must shift focus from cells-per-experiment towards samples-per-experiment to capture denser sampling of timepoints, signalling contexts and perturbation conditions. When we can record thousands of samples per experiment, rather than thousands of cells, we may start to collect datasets that display the same degeneracy of repeated patterns that has proven so fruitful for protein structure prediction and evolutionary sequence modelling180.

The continued growth of multi-modal sequencing technologies could minimise the problem of cell states only ever being partially observed41, but these methods must provide robust, consistent datasets that can be continually re-used, as we have seen with the PDB. It is notable that the year in which the PDB began was the same year that Minsky and Papert published Perceptrons181, a pessimistic appraisal of the limited utility of neural nets as statistical models. In this year (1969), deep learning did not exist, the most sophisticated computers had only kilobytes of memory, and there was no sense that the PDB would provide the data for a computational solution to protein structure. We must similarly plan to generate datasets with the size, scope and quality to be used for years to come with computational methods that do not yet exist.

Even as we construct robust datasets and clear modelling objectives to replicate the environment of AlphaFold, we must recognise the fundamental differences between protein structure prediction and GRN modelling. AlphaFold is a predictive method; the goal is not to learn the biophysical principles that govern protein folding, but to predict structure from amino acid sequence. The objective of a GRN model should not only be to predict the phenotypic consequence of a particular molecular genetic state, but also to learn the principles that connect these two scales.

A black-box model predicting cell phenotypes is insufficient. Interpretability is required. The challenge is that our conception of interpretability in biology exists largely at the molecular level: proteins and genes do things, they are the de facto mechanistic units in the cell8. Building an interpretable understanding at a systems-level will require new conceptual frameworks that define what a meaningful systems-level mechanism can look like.

Such frameworks will need to exploit the organisational features of GRNs, such as their hierarchy and modularity. Unlike protein folding, GRNs may possess a ‘representational’ layer that captures not just the molecular implementation, but the logic of how this implementation organises in response to inputs to deliver outputs. Parsing this representational logic has the potential to reveal new design principles, entirely new forms of mechanism, and new views of how information is controlled in biology.

These features of GRNs (hierarchy, modularity, DNA rules and regulatory layers) point us towards how GRN models could be learnt. Hierarchy and modularity imply that GRNs are reducible, allowing molecular complexity to be abstracted away. If rules exist in DNA sequence, then sequence editing can redesign the rules, allowing exploration of a vast space of re-engineered systems. The organisation that emerges from interactions between different regulatory layers may explain how robust cell fate decisions arise from seemingly noisy molecular processes. Evolutionary analysis can discriminate between historically contingent patterns and fundamental rules of regulatory logic.

Our current view of the GRN – a static graph of lines and nodes – provides limited insight into the cellular function of regulatory networks. This fact is in stark contrast to the original formulation of GRNs as causal molecular explanations26. But we are well positioned to return to a mechanistic view. Single-cell genomics can measure biological phenotypes at huge scale and high resolution. Machine learning can convert these data into simpler representations. By building cellular and evolutionary constraints into these representations, we can distil the complexity of gene regulatory systems into the core logic of developmental processes. Ensuring these models are meaningful will require experimental developments that set out the syntax and structure of molecular mechanisms, both in the principles of how they operate across contexts and the relationships that govern how different mechanisms interact. Synthetic biology methods that can build and manipulate gene regulatory systems will be transformative for this effort, providing a depth of understanding that would be impossible through examination of natural regulatory systems alone. Our goal in gathering more detailed datasets and building more sophisticated models should be to dismantle the complexity of gene regulatory networks and reveal the underlying design principles of developmental logic.

Acknowledgements

We are grateful to Titus Brown, James DiFrisco, Doug Erwin, Fabian Fröhlich, Leopold Parts, Pau Badia-i-Mompel, and members of the Briscoe Lab for their constructive comments. This work was supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK, the UK Medical Research Council and Wellcome Trust (all under FC001051) and by the Wellcome Trust (220379/D/20/Z).

References

  • 1. Nüsslein-Volhard C, Wieschaus E, Kluding H. Mutations affecting the pattern of the larval cuticle in Drosophila melanogaster: I. Zygotic loci on the second chromosome. Wilehm Roux Arch Dev Biol. 1984 Sept;193:267–282. doi: 10.1007/BF00848156.
  • 2. Jürgens G, Wieschaus E, Nüsslein-Volhard C, Kluding H. Mutations affecting the pattern of the larval cuticle in Drosophila melanogaster: II. Zygotic loci on the third chromosome. Wilehm Roux Arch Dev Biol. 1984 Sept;193:283–295. doi: 10.1007/BF00848157.
  • 3. Wieschaus E, Nüsslein-Volhard C, Jürgens G. Mutations affecting the pattern of the larval cuticle in Drosophila melanogaster: III. Zygotic loci on the X-chromosome and fourth chromosome. Wilehm Roux Arch Dev Biol. 1984 Sept;193:296–307. doi: 10.1007/BF00848158.
  • 4. Krumlauf R. Hox genes in vertebrate development. Cell. 1994 July;78:191–201. doi: 10.1016/0092-8674(94)90290-9.
  • 5. Shubin N, Tabin C, Carroll S. Deep homology and the origins of evolutionary novelty. Nature. 2009 Feb;457:818–23. doi: 10.1038/nature07891.
  • 6. Carroll SB. Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution. Cell. 2008 July;134:25–36. doi: 10.1016/j.cell.2008.06.030.
  • 7. Rood JE, Hupalowska A, Regev A. Toward a foundation model of causal cell and tissue biology with a Perturbation Cell and Tissue Atlas. Cell. 2024 Aug;187:4520–4545. doi: 10.1016/j.cell.2024.07.035.
  • 8. DiFrisco J, Jaeger J. Genetic Causation in Complex Regulatory Systems: An Integrative Dynamic Perspective. Bioessays. 2020 June;42:e1900226. doi: 10.1002/bies.201900226.
  • 9. Davidson EH. In: Genomic Regulatory Systems. Davidson EH, editor. Academic Press; San Diego: 2001. pp. 1–23.
  • 10. Davidson EH, Peter IS. In: Genomic Control Process. Davidson EH, Peter IS, editors. Academic Press; Oxford: 2015. pp. 41–77.
  • 11. Mitchell KJ. The genetics of brain wiring: from molecule to mind. PLoS Biol. 2007 Apr;5:e113. doi: 10.1371/journal.pbio.0050113.
  • 12. Davidson EH, Erwin DH. Gene regulatory networks and the evolution of animal body plans. Science. 2006 Feb;311:796–800. doi: 10.1126/science.1113832.
  • 13. Davidson EH, Erwin DH. Gene regulatory networks and the evolution of animal body plans. Science. 2006 Feb;311:796–800. doi: 10.1126/science.1113832.
  • 14. Willemin A, et al. Epigenetic Regulatory Layers in the 3D Nucleus. Molecular Cell. 2024;84:415–428. doi: 10.1016/j.molcel.2023.12.032.
  • 15.Perez-Carrasco R, et al. Combining a Toggle Switch and a Repressilator within the AC-DC Circuit Generates Distinct Dynamical Behaviors. Cell Syst. 2018 Apr;6:521–530.e3. doi: 10.1016/j.cels.2018.02.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Pourquié O, Goldbeter A. Segmentation clock: insights from computational models. Curr Biol. 2003 Aug;13:R632–4. doi: 10.1016/s0960-9822(03)00567-0. [DOI] [PubMed] [Google Scholar]
  • 17.Hirata H, et al. Oscillatory expression of the bHLH factor Hes1 regulated by a negative feedback loop. Science. 2002 Oct;298:840–3. doi: 10.1126/science.1074560. [DOI] [PubMed] [Google Scholar]
  • 18.Jaeger J. The gap gene network. Cell Mol Life Sci. 2011 Jan;68:243–74. doi: 10.1007/s00018-010-0536-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Sick S, Reinker S, Timmer J, Schlake T. WNT and DKK determine hair follicle spacing through a reaction-diffusion mechanism. Science. 2006 Dec;314:1447–50. doi: 10.1126/science.1130088. [DOI] [PubMed] [Google Scholar]
  • 20.Briscoe J, Small S. Morphogen rules: design principles of gradient-mediated embryo patterning. Development. 2015 Dec;142:3996–4009. doi: 10.1242/dev.129452. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Raspopovic J, Marcon L, Russo L, Sharpe J. Modeling digits. Digit patterning is controlled by a Bmp-Sox9-Wnt Turing network modulated by morphogen gradients. Science. 2014 Aug;345:566–70. doi: 10.1126/science.1252960. [DOI] [PubMed] [Google Scholar]
  • 22.Kondo S, Asal R. A reaction-diffusion wave on the skin of the marine angelfish Pomacanthus. Nature. 1995 Aug;376:765–8. doi: 10.1038/376765a0. [DOI] [PubMed] [Google Scholar]
  • 23.Collier JR, Monk NA, Maini PK, Lewis JH. Pattern formation by lateral inhibition with feedback: a mathematical model of delta-notch intercellular signalling. J Theor Biol. 1996 Dec;183:429–46. doi: 10.1006/jtbi.1996.0233. [DOI] [PubMed] [Google Scholar]
  • 24.Li C, Hong T, Nie Q. Quantifying the landscape and kinetic paths for epithelial-mesenchymal transition from a core circuit. Phys Chem Chem Phys. 2016 July;18:17949–56. doi: 10.1039/c6cp03174a. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Tapscott SJ. The circuitry of a master switch: Myod and the regulation of skeletal muscle gene transcription. Development. 2005 June;132:2685–95. doi: 10.1242/dev.01874. [DOI] [PubMed] [Google Scholar]
  • 26.Peter IS, Davidson EH. Evolution of gene regulatory networks controlling body plan development. Cell. 2011 Mar;144:970–85. doi: 10.1016/j.cell.2011.02.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Wagner GP. The developmental genetics of homology. Nat Rev Genet. 2007 June;8:473–9. doi: 10.1038/nrg2099. [DOI] [PubMed] [Google Scholar]
  • 28.Wagner GP. Homology, Genes, and Evolutionary Innovation. Princeton University Press; 2014 Apr. [Google Scholar]
  • 29.DiFrisco J, Love AC, Wagner GP. Character identity mechanisms: a conceptual model for comparative-mechanistic biology. Biology & Philosophy. 2020;35:44. [Google Scholar]
  • 30.Davidson EH, et al. A provisional regulatory gene network for specification of endomesoderm in the sea urchin embryo. Dev Biol. 2002 June;246:162–90. doi: 10.1006/dbio.2002.0635. [DOI] [PubMed] [Google Scholar]
  • 31.Ton MLN, Guibentif C, Göttgens B. Single cell genomics and developmental biology: moving beyond the generation of cell type catalogues. Curr Opin Genet Dev. 2020 Oct;64:66–71. doi: 10.1016/j.gde.2020.05.033. [DOI] [PubMed] [Google Scholar]
  • 32.Tanay A, Regev A. Scaling single-cell genomics from phenomenology to mechanism. Nature. 2017 Jan;541:331–338. doi: 10.1038/nature21350. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Delile J, et al. Single cell transcriptomics reveals spatial and temporal dynamics of gene expression in the developing mouse spinal cord. Development. 2019 Mar;146 doi: 10.1242/dev.173807. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Rayon T, Maizels RJ, Barrington C, Briscoe J. Single-cell transcriptome profiling of the human developing spinal cord reveals a conserved genetic programme with human-specific features. Development. 2021 Aug;148 doi: 10.1242/dev.199711. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Nishi Y, et al. A direct fate exclusion mechanism by Sonic hedgehog-regulated transcriptional repressors. Development. 2015 Oct;142:3286–3293. doi: 10.1242/dev.124636. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Peterson KA, et al. Neural-specific Sox2 input and differential Gli-binding affinity provide context and positional information in Shh-directed neural patterning. Genes Dev. 2012 Dec;26:2802–16. doi: 10.1101/gad.207142.112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Maizels RJ, Snell DM, Briscoe J. Reconstructing developmental trajectories using latent dynamical systems and time-resolved transcriptomics. Cell Systems. 2024 May;15:411–424.e9. doi: 10.1016/j.cels.2024.04.004. [DOI] [PubMed] [Google Scholar]
  • 38.Cao J, et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature. 2019 Feb;566:496–502. doi: 10.1038/s41586-019-0969-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Mittnenzweig M, et al. A single-embryo, single-cell time-resolved model for mouse gastrulation. Cell. 2021 May;184:2825–2842.e22. doi: 10.1016/j.cell.2021.04.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Pijuan-Sala B, et al. A single-cell molecular map of mouse gastrulation and early organogenesis. Nature. 2019 Feb;566:490–495. doi: 10.1038/s41586-019-0933-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Badia-I-Mompel P, et al. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet. 2023 June; doi: 10.1038/s41576-023-00618-5. [DOI] [PubMed] [Google Scholar]
  • 42.Kamimoto K, et al. Dissecting cell identity via network inference and in silico gene perturbation. Nature. 2023 Feb;614:742–751. doi: 10.1038/s41586-022-05688-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Wang L, Trasanidis N, Wu T, et al. Dictys: dynamic gene regulatory network dissects developmental continuum with single-cell multiomics. Nature Methods. 2023;20:1368–1378. doi: 10.1038/s41592-023-01971-3. [DOI] [PubMed] [Google Scholar]
  • 44.González-Blas CB, et al. SCENIC+: single-cell multiomic inference of enhancers and gene regulatory networks. bioRxiv. 2022 doi: 10.1038/s41592-023-01938-4. eprint: https://www.biorxiv.org/content/early/2022/08/19/2022.08.19.504505.full.pdf https://www.biorxiv.org/content/early/2022/08/19/2022.08.19.504505. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Schölkopf B, et al. Towards Causal Representation Learning. 2021 arXiv: 2102.11107. cs.LG. [Google Scholar]
  • 46.Davidson EH, Levine MS. Properties of developmental gene regulatory networks. Proceedings of the National Academy of Sciences of the United States of America. 2008 Dec;105:20063–20066. doi: 10.1073/pnas.0806007105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Davidson EH, et al. A genomic regulatory network for development. Science. 2002 Mar;295:1669–1678. doi: 10.1126/science.1069883. [DOI] [PubMed] [Google Scholar]
  • 48.Balaskas N, et al. Gene regulatory logic for reading the Sonic Hedgehog signaling gradient in the vertebrate neural tube. Cell. 2012 Jan;148:273–84. doi: 10.1016/j.cell.2011.10.047. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.McQueen E, Rebeiz M. In: Gene Regulatory Networks. Peter IS, editor. Academic Press; 2020. pp. 375–405. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P. Inferring Regulatory Networks from Expression Data Using Tree-Based Methods. PLOS ONE. 2010;5:e12776. doi: 10.1371/journal.pone.0012776. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Meyer PE, Kontos K, Lafitte F, Bontempi G. Information-theoretic inference of large transcriptional regulatory networks. EURASIP Journal on Bioinformatics and Systems Biology. 2007;2007:79879. doi: 10.1155/2007/79879. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Sanchez-Castillo M, Blanco D, Tienda-Luna M, Carrion MC, Huang Y. A Bayesian framework for the inference of gene regulatory networks from time and pseudo-time series data. Bioinformatics. 2018 Mar;34:964–970. doi: 10.1093/bioinformatics/btx605. [DOI] [PubMed] [Google Scholar]
  • 53.Matsumoto H, et al. SCODE: an efficient regulatory network inference algorithm from single-cell RNA-Seq during differentiation. Bioinformatics. 2017 Apr;33:2314–2321. doi: 10.1093/bioinformatics/btx194. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Specht AT, Li JJ. LEAP: constructing gene co-expression networks for single-cell RNA-sequencing data using pseudotime ordering. Bioinformatics. 2017 Mar;33:764–766. doi: 10.1093/bioinformatics/btw729. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Qiu X, et al. Inferring Causal Gene Regulatory Networks from Coupled Single-Cell Expression Dynamics Using Scribe. Cell Syst. 2020 Mar;10:265–274.e11. doi: 10.1016/j.cels.2020.02.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Pratapa A, Jalihal AP, Law JN, Bharadwaj A, Murali TM. Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data. Nat Methods. 2020 Feb;17:147–154. doi: 10.1038/s41592-019-0690-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Chen S, Mar JC. Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data. BMC Bioinformatics. 2018;19:232. doi: 10.1186/s12859-018-2217-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Kernfeld E, Keener R, Cahan P, Battle A. Transcriptome data are insufficient to control false discoveries in regulatory network inference. Cell Systems. 2024;15:709–724.e13. doi: 10.1016/j.cels.2024.07.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Wanniarachchi DV, Viswakula S, Wickramasuriya AM. The evaluation of transcription factor binding site prediction tools in human and Arabidopsis genomes. BMC Bioinformatics. 2024;25:371. doi: 10.1186/s12859-024-05995-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Keilwagen J, Posch S, Grau J. Accurate prediction of cell type-specific transcription factor binding. Genome Biology. 2019;20:9. doi: 10.1186/s13059-018-1614-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Zandvakili A, Campbell I, Gutzwiller LM, Weirauch MT, Gebelein B. Degenerate Pax2 and Senseless binding motifs improve detection of low-affinity sites required for enhancer specificity. PLoS Genet. 2018 Apr;14:e1007289. doi: 10.1371/journal.pgen.1007289. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Lettice LA, et al. A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Human Molecular Genetics. 2003 July;12:1725–1735. doi: 10.1093/hmg/ddg180. [DOI] [PubMed] [Google Scholar]
  • 63.Badia-i-Mompel P, et al. Comparison and evaluation of methods to infer gene regulatory networks from multimodal single-cell data. bioRxiv. 2025 [Google Scholar]
  • 64.Maizels RJ. A dynamical perspective: moving towards mechanism in single-cell transcriptomics. Philos Trans R Soc Lond B Biol Sci. 2024 Apr;379:20230049. doi: 10.1098/rstb.2023.0049. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Cotterell J, Sharpe J. An atlas of gene regulatory networks reveals multiple three-gene mechanisms for interpreting morphogen gradients. Molecular Systems Biology. 2010;6:425. doi: 10.1038/msb.2010.74. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Isalan M. Gene networks and liar paradoxes. Bioessays. 2009 Oct;31:1110–5. doi: 10.1002/bies.200900072. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Wieland FG, Hauber AL, Rosenblatt M, Tönsing C, Timmer J. On structural and practical identifiability. Current Opinion in Systems Biology. 2021;25:60–69. [Google Scholar]
  • 68.Raue A, et al. Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics. 2009 Aug;25:1923–1929. doi: 10.1093/bioinformatics/btp358. [DOI] [PubMed] [Google Scholar]
  • 69.Chis OT, Villaverde AF, Banga JR, Balsa-Canto E. On the relationship between sloppiness and identifiability. Mathematical Biosciences. 2016;282:147–161. doi: 10.1016/j.mbs.2016.10.009. [DOI] [PubMed] [Google Scholar]
  • 70.Gutenkunst RN, et al. Arkin AP, editor. Universally Sloppy Parameter Sensitivities in Systems Biology Models. PLoS Computational Biology. 2007 Oct;3:e189. doi: 10.1371/journal.pcbi.0030189. ISSN: 1553-7358. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Transtrum MK, et al. Sloppiness and Emergent Theories in Physics, Biology, and Beyond. 2015 doi: 10.1063/1.4923066. arxiv: 1501.07668. cond-mat.stat-mech https://arxiv.org/abs/1501.07668. [DOI] [PubMed] [Google Scholar]
  • 72.Gábor A, Hangos KM, Banga JR, et al. Reaction network realizations of rational biochemical systems and their structural properties. Journal of Mathematical Chemistry. 2015;53:1657–1686. [Google Scholar]
  • 73.Babtie AC, Kirk P, Stumpf MPH. Topological sensitivity analysis for systems biology. Proceedings of the National Academy of Sciences of the United States of America. 2014;111:18507–18512. doi: 10.1073/pnas.1414026112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Cotterell J, Sharpe J. An atlas of gene regulatory networks reveals multiple three-gene mechanisms for interpreting morphogen gradients. Molecular Systems Biology. 2010 Nov;6:425. doi: 10.1038/msb.2010.74. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Schaerli Y, et al. Synthetic circuits reveal how mechanisms of gene regulatory networks constrain evolution. Molecular Systems Biology. 2018;14:e8102. doi: 10.15252/msb.20178102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Otero-Muras I, Perez-Carrasco R, Banga JR, Barnes CP. Automated design of gene circuits with optimal mushroom-bifurcation behavior. iScience. 2023;26:106836. doi: 10.1016/j.isci.2023.106836. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Jiménez A, Cotterell J, Munteanu A, Sharpe J. A spectrum of modularity in multi-functional gene circuits. Molecular Systems Biology. 2017 Apr;13:925. doi: 10.15252/msb.20167347. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Bhalla US, Iyengar R. Emergent properties of networks of biological signaling pathways. Science. 1999 Jan;283:381–7. doi: 10.1126/science.283.5400.381. [DOI] [PubMed] [Google Scholar]
  • 79.Yurchenko SB. Is information the other face of causation in biological systems? Biosystems. 2023;229:104925. doi: 10.1016/j.biosystems.2023.104925. [DOI] [PubMed] [Google Scholar]
  • 80.Artime O, De Domenico M. From the origin of life to pandemics: emergent phenomena in complex systems. Philosophical Transactions of the Royal Society A. 2022;380:20200410. doi: 10.1098/rsta.2020.0410. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Ellis GF. Top-down causation and emergence: some comments on mechanisms. Interface Focus. 2012 Feb;2:126–140. doi: 10.1098/rsfs.2011.0062. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Klein B, Hoel E. The emergence of informative higher scales in complex networks. 2020 doi: 10.1080/19420889.2020.1802914. arxiv: 1907.03902. physics.soc-ph https://arxiv.org/abs/1907.03902. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Hoel E. When the Map Is Better Than the Territory. Entropy. 2017 Apr;19:188. doi: 10.3390/e19050188. ISSN: 1099-4300. [DOI] [Google Scholar]
  • 84.Zhang P, et al. Negative cross-talk between hematopoietic regulators: GATA proteins repress PU.1. Proc Natl Acad Sci U S A. 1999 July;96:8705–10. doi: 10.1073/pnas.96.15.8705. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Levy C, Khaled M, Fisher DE. MITF: master regulator of melanocyte development and melanoma oncogene. Trends Mol Med. 2006 Sept;12:406–14. doi: 10.1016/j.molmed.2006.07.008. [DOI] [PubMed] [Google Scholar]
  • 86.Frith TJ, Briscoe J, Boezio GL. Academic Press; 2023. https://www.sciencedirect.com/science/article/pii/S0070215323000868 . [Google Scholar]
  • 87.Kassouf MT, et al. The α-globin super-enhancer acts in an orientation-dependent manner. Nat Commun. 2025 Jan;16:1033. doi: 10.1038/s41467-025-56380-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Small S, Blair A, Levine M. Regulation of even-skipped stripe 2 in the Drosophila embryo. EMBO J. 1992 Nov;11:4047–57. doi: 10.1002/j.1460-2075.1992.tb05498.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Panne D, Maniatis T, Harrison SC. An atomic model of the interferon-beta enhanceosome. Cell. 2007 June;129:1111–23. doi: 10.1016/j.cell.2007.05.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Kmiecik S, et al. Coarse-Grained Protein Models and Their Applications. Chemical Reviews. 2016;116:7898–7936. doi: 10.1021/acs.chemrev.6b00163. [DOI] [PubMed] [Google Scholar]
  • 91.Ingólfsson HI, et al. The power of coarse graining in biomolecular simulations. Wiley Interdisciplinary Reviews: Computational Molecular Science. 2014 May;4:225–248. doi: 10.1002/wcms.1169. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92.Jumper J, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021 Aug;596:583–589. doi: 10.1038/s41586-021-03819-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Marr D. Vision: A Computational Approach. MIT Press; Cambridge, MA: 1982. [Google Scholar]
  • 94.Marr DC, Poggio T. From Understanding Computation to Understanding Neural Circuitry. Technical report. Massachusetts Institute of Technology Artificial Intelligence Laboratory; 1976. pp. 1–22. [Google Scholar]
  • 95.Lazebnik Y. Can a biologist fix a radio?–Or, what I learned while studying apoptosis. Cancer Cell. 2002 Sept;2:179–82. doi: 10.1016/s1535-6108(02)00133-2. [DOI] [PubMed] [Google Scholar]
  • 96.Pezzotta A, Briscoe J. Optimal control of gene regulatory networks for morphogen-driven tissue patterning. Cell Syst. 2023 Nov;14:940–952.e11. doi: 10.1016/j.cels.2023.10.004. [DOI] [PubMed] [Google Scholar]
  • 97.Tkačik G, ten Wolde PR. Information Processing in Biochemical Networks. Annu Rev Biophys. 2025 May;54:249–274. doi: 10.1146/annurev-biophys-060524-102720. [DOI] [PubMed] [Google Scholar]
  • 98.Hatleberg WL, Hinman VF. Modularity and hierarchy in biological systems: Using gene regulatory networks to understand evolutionary change. Curr Top Dev Biol. 2021;141:39–73. doi: 10.1016/bs.ctdb.2020.11.004. [DOI] [PubMed] [Google Scholar]
  • 99.Alon U. Network motifs: theory and experimental approaches. Nat Rev Genet. 2007 June;8:450–61. doi: 10.1038/nrg2102. [DOI] [PubMed] [Google Scholar]
  • 100.Lorenz DM, Jeng A, Deem MW. The emergence of modularity in biological systems. Phys Life Rev. 2011 June;8:129–60. doi: 10.1016/j.plrev.2011.02.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101.Erwin DH, Davidson EH. The evolution of hierarchical gene regulatory networks. Nature Reviews Genetics. 2009;10:141–148. doi: 10.1038/nrg2499. [DOI] [PubMed] [Google Scholar]
  • 102.Sáez M, et al. Statistically derived geometrical landscapes capture principles of decision-making dynamics during cell fate transitions. Cell Syst. 2022 Jan;13:12–28.e3. doi: 10.1016/j.cels.2021.08.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103.Sáez M, Briscoe J, Rand DA. Dynamical landscapes of cell fate decisions. Interface Focus. 2022 Aug;12:20220002. doi: 10.1098/rsfs.2022.0002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104.Rand DA, Raju A, Sáez M, Corson F, Siggia ED. Geometry of gene regulatory dynamics. Proceedings of the National Academy of Sciences of the United States of America. 2021 Sept;118:e2109729118. doi: 10.1073/pnas.2109729118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105.Ohta T. Slightly deleterious mutant substitutions in evolution. Nature. 1973 Nov;246:96–8. doi: 10.1038/246096a0. [DOI] [PubMed] [Google Scholar]
  • 106.Lynch M, Conery JS. The origins of genome complexity. Science. 2003 Nov;302:1401–4. doi: 10.1126/science.1089370. [DOI] [PubMed] [Google Scholar]
  • 107.Wray GA. The evolutionary significance of cis-regulatory mutations. Nat Rev Genet. 2007 Mar;8:206–16. doi: 10.1038/nrg2063. [DOI] [PubMed] [Google Scholar]
  • 108.Schmidt D, et al. Five-vertebrate ChIP-seq reveals the evolutionary dynamics of transcription factor binding. Science. 2010 May;328:1036–40. doi: 10.1126/science.1186176. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109.Marks DS, et al. Protein 3D Structure Computed from Evolutionary Sequence Variation. PLOS ONE. 2011 Dec;6:1–20. doi: 10.1371/journal.pone.0028766. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Marks DS, Hopf TA, Sander C. Protein structure prediction from sequence variation. Nature Biotechnology. 2012;30:1072–1080. doi: 10.1038/nbt.2419. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111.Hopf TA, et al. Mutation effects predicted from sequence co-variation. Nature Biotechnology. 2017;35:128–135. doi: 10.1038/nbt.3769. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112.Hecker N, et al. Enhancer-driven cell type comparison reveals similarities between the mammalian and bird pallium. Science. 2025;387:eadp3957. doi: 10.1126/science.adp3957. [DOI] [PubMed] [Google Scholar]
  • 113.Halfon MS. Perspectives on Gene Regulatory Network Evolution. Trends in Genetics. 2017;33:436–447. doi: 10.1016/j.tig.2017.04.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 114.Booth H, Hadjivasiliou Z. Gene network architecture, mutation and selection collectively drive developmental pattern evolvability and predictability. bioRxiv. 2024 Dec; doi: 10.1101/2024.12.23. [DOI] [Google Scholar]
  • 115.Pavlicev M, Cheverud JM, Wagner GP. A model of developmental evolution: selection, pleiotropy and compensation. Trends in Ecology & Evolution. 2012;27:316–322. doi: 10.1016/j.tree.2012.01.016. [DOI] [PubMed] [Google Scholar]
  • 116.McColgan Á, DiFrisco J. Understanding developmental system drift. Development. 2024;151:dev203054. doi: 10.1242/dev.203054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 117.McGinnis CS, et al. MULTI-seq: sample multiplexing for single-cell RNA sequencing using lipid-tagged indices. Nat Methods. 2019 July;16:619–626. doi: 10.1038/s41592-019-0433-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 118.Van der Maaten LJP, Hinton GE. Visualizing Data Using t-SNE. Journal of Machine Learning Research. 2008 Nov;9:2579–2605. https://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf . [Google Scholar]
  • 119.Ternes L, et al. A multi-encoder variational autoencoder controls multiple transformational features in single-cell image analysis. Communications Biology. 2022;5:255. doi: 10.1038/s42003-022-03218-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 120.Wani SA, Khan SA, Quadri S. scJVAE: A novel method for integrative analysis of multimodal single-cell data. Computers in Biology and Medicine. 2023;158:106865. doi: 10.1016/j.compbiomed.2023.106865. [DOI] [PubMed] [Google Scholar]
  • 121.Kalafut NC, Huang X, Wang D. Joint variational autoencoders for multimodal imputation and embedding. Nature Machine Intelligence. 2023 June;5:631–642. doi: 10.1038/s42256-023-00663-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 122.Gong B, Zhou Y, Purdom E. Cobolt: integrative analysis of multimodal single-cell sequencing data. Genome Biology. 2021;22:351. doi: 10.1186/s13059-021-02556-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 123.Danino R, Nachman I, Sharan R. Batch correction of single-cell sequencing data via an autoencoder architecture. Bioinform Adv. 2024;4:vbad186. doi: 10.1093/bioadv/vbad186. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 124.Tran HTN, et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biology. 2020;21:12. doi: 10.1186/s13059-019-1850-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 125.Zhang Z, Zhao X, Bindra M, et al. scDisInFact: disentangled learning for integration and prediction of multi-batch multi-condition single-cell RNA-sequencing data. Nature Communications. 2024;15:912. doi: 10.1038/s41467-024-45227-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 126.Kana O, et al. Generative modeling of single-cell gene expression for dose-dependent chemical perturbations. Patterns (New York) 2023 Aug;4:100817. doi: 10.1016/j.patter.2023.100817. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 127.Yang Y, Seninge L, Wang Z, et al. The manatee variational autoencoder model for predicting gene expression alterations caused by transcription factor perturbations. Scientific Reports. 2024;14:11794. doi: 10.1038/s41598-024-62620-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 128.Tang Z, Zhou M, Zhang K, Song Q. scPerb: Predict single-cell perturbation via style transfer-based variational autoencoder. Journal of Advanced Research. 2024 doi: 10.1016/j.jare.2024.10.035. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 129.Rampášek L, Hidru D, Smirnov P, Haibe-Kains B, Goldenberg A. Dr.VAE: improving drug response prediction via modeling of drug perturbation effects. Bioinformatics. 2019 Mar;35:3743–3751. doi: 10.1093/bioinformatics/btz158. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 130.Chari T, Pachter L. The specious art of single-cell genomics. PLoS Computational Biology. 2023;19:e1011288. doi: 10.1371/journal.pcbi.1011288. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 131.Gorban AN, Tyukin IY. Blessing of dimensionality: mathematical foundations of the statistical physics of data. Philosophical Transactions of the Royal Society A. 2018;376:20170237. doi: 10.1098/rsta.2017.0237. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 132.Kunes RZ, Walle T, Land M, et al. Supervised discovery of interpretable gene programs from single-cell data. Nature Biotechnology. 2024;42:1084–1095. doi: 10.1038/s41587-023-01940-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 133.Li Y, et al. MetaQ: fast, scalable and accurate meta-cell inference via single-cell quantization. Nature Communications. 2025;16:1205. doi: 10.1038/s41467-025-56424-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 134.Saelens W, Cannoodt R, Todorov H, Saeys Y. A comparison of single-cell trajectory inference methods. Nature Biotechnology. 2019;37:547–554. doi: 10.1038/s41587-019-0071-9. [DOI] [PubMed] [Google Scholar]
  • 135.Lopez R, et al. Learning Causal Representations of Single Cells via Sparse Mechanism Shift Modeling. 2023 arxiv: 2211.03553. q-bio.GN https://arxiv.org/abs/2211.03553. [Google Scholar]
  • 136.Zhang J, et al. Identifiability Guarantees for Causal Disentanglement from Soft Interventions. 2023 arxiv: 2307.06250. stat.ML https://arxiv.org/abs/2307.06250. [Google Scholar]
  • 137.Gao Y, Dong K, Shan C, Li D, Liu Q. Causal disentanglement for single-cell representations and controllable counterfactual generation. Nature Communications. 2025;16:6775. doi: 10.1038/s41467-025-62008-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 138.An S, et al. scCausalVI disentangles single-cell perturbation responses with causality-aware generative model. bioRxiv. 2025 doi: 10.1016/j.cels.2025.101443. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 139.Bereket M, Karaletsos T. Modelling Cellular Perturbations with the Sparse Additive Mechanism Shift Variational Autoencoder. 2024. arxiv: 2311.02794. stat.ML https://arxiv.org/abs/2311.02794.
  • 140.Baek S, et al. CRADLE-VAE: Enhancing Single-Cell Gene Perturbation Modeling with Counterfactual Reasoning-based Artifact Disentanglement. 2024 arxiv: 2409.05484. cs.LG https://arxiv.org/abs/2409.05484. [Google Scholar]
  • 141.Mao H, et al. Learning Identifiable Factorized Causal Representations of Cellular Responses. 2024 arxiv: 2410.22472. cs.LG. https://arxiv.org/abs/2410.22472. [Google Scholar]
  • 142.Tejada-Lapuerta A, et al. Causal machine learning for single-cell genomics. 2023 doi: 10.1038/s41588-025-02124-2. arxiv: 2310.14935. cs.LG. https://arxiv.org/abs/2310.14935. [DOI] [PubMed] [Google Scholar]
  • 143.Aliee H, Kapl F, Hediyeh-Zadeh S, Theis FJ. Conditionally Invariant Representation Learning for Disentangling Cellular Heterogeneity. 2023 arxiv: 2307.00558. cs.LG https://arxiv.org/abs/2307.00558. [Google Scholar]
  • 144.Lopez R, Hütter JC, Pritchard JK, Regev A. Large-Scale Differentiable Causal Discovery of Factor Graphs. 2022 arxiv: 2206.07824. stat.ML https://arxiv.org/abs/2206.07824. [Google Scholar]
  • 145.Kvon EZ, Waymack R, Gad M, Wunderlich Z. Enhancer redundancy in development and disease. Nat Rev Genet. 2021 May;22:324–336. doi: 10.1038/s41576-020-00311-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 146.Hong JW, Hendrix DA, Levine MS. Shadow enhancers as a source of evolutionary novelty. Science. 2008 Sept;321:1314. doi: 10.1126/science.1160631. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 147.Thompson JJ, et al. Extensive co-binding and rapid redistribution of NANOG and GATA6 during emergence of divergent lineages. Nat Commun. 2022 July;13:4257. doi: 10.1038/s41467-022-31938-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 148.De Boer CG, Taipale J. Hold out the genome: a roadmap to solving the cis-regulatory code. Nature. 2024;625:41–50. doi: 10.1038/s41586-023-06661-w. [DOI] [PubMed] [Google Scholar]
  • 149.Gibney ER, Nolan CM. Epigenetics and gene expression. Heredity. 2010;105:4–13. doi: 10.1038/hdy.2010.54. [DOI] [PubMed] [Google Scholar]
  • 150.Bolt CC, Duboule D. The regulatory landscapes of developmental genes. Development. 2020 Feb;147:dev171736. doi: 10.1242/dev.171736. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 151.Blassberg R, et al. Sox2 levels regulate the chromatin occupancy of WNT mediators in epiblast progenitors responsible for vertebrate body formation. Nat Cell Biol. 2022 May;24:633–644. doi: 10.1038/s41556-022-00910-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 152.Kim S, et al. Deciphering the multi-scale, quantitative cis-regulatory code. Molecular Cell. 2023;83:373–392. doi: 10.1016/j.molcel.2022.12.032. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 153.Mayran A, et al. Pioneer and nonpioneer factor co-operation drives lineage specific chromatin opening. Nature Communications. 2019;10:3807. doi: 10.1038/s41467-019-11791-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 154.Liu BB, et al. An automated ATAC-seq method reveals sequence determinants of transcription factor dose response in the open chromatin. bioRxiv. 2025 [Google Scholar]
  • 155.Li D, et al. Chromatin Accessibility Dynamics during iPSC Reprogramming. Cell Stem Cell. 2017;21:819–833.e6. doi: 10.1016/j.stem.2017.10.012. [DOI] [PubMed] [Google Scholar]
  • 156.Chronis C, et al. Cooperative Binding of Transcription Factors Orchestrates Reprogramming. Cell. 2017;168:442–459.e20. doi: 10.1016/j.cell.2016.12.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 157.Narita T, et al. Acetylation of histone H2B marks active enhancers and predicts CBP/p300 target genes. Nature Genetics. 2023;55:679–692. doi: 10.1038/s41588-023-01348-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 158.Lynch M. The evolution of genetic networks by non-adaptive processes. Nature Reviews Genetics. 2007;8:803–813. doi: 10.1038/nrg2192. [DOI] [PubMed] [Google Scholar]
  • 159.Kosicki M, Zhang B, Hecht V, et al. In vivo mapping of mutagenesis sensitivity of human enhancers. Nature. 2025;643:839–846. doi: 10.1038/s41586-025-09182-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 160.Koeppel J, et al. Randomizing the human genome by engineering recombination between repeat elements. Science. 2025;387:eado3979. doi: 10.1126/science.ado3979. [DOI] [PubMed] [Google Scholar]
  • 161.Koeppel J, et al. Resolution of a human superenhancer by targeted genome randomisation. bioRxiv. 2025. https://www.biorxiv.org/content/early/2025/01/14/2025.01.14.632548. [Google Scholar]
  • 162.Gasperini M, et al. A Genome-wide Framework for Mapping Gene Regulation via Cellular Genetic Screens. Cell. 2019;176:377–390.e19. doi: 10.1016/j.cell.2018.11.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 163.Liu W, et al. Dissecting the impact of transcription factor dose on cell reprogramming heterogeneity using scTF-seq. Nature Genetics. 2025;57:2522–2535. doi: 10.1038/s41588-025-02343-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 164.Frömel R, et al. Synthetic enhancers reveal design principles of cell state specific regulatory elements in hematopoiesis. bioRxiv. 2024 [Google Scholar]
  • 165.Lalanne JB, et al. Multiplex profiling of developmental cis-regulatory elements with quantitative single-cell expression reporters. Nat Methods. 2024 June;21:983–993. doi: 10.1038/s41592-024-02260-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 166.Cornwall-Scoones J, et al. Predictable Engineering of Signal-Dependent Cis-Regulatory Elements. bioRxiv. 2025. https://www.biorxiv.org/content/early/2025/03/07/2025.03.07.642002. [Google Scholar]
  • 167.Buckley M, et al. Saturation genome editing maps the functional spectrum of pathogenic VHL alleles. Nat Genet. 2024 July;56:1446–1455. doi: 10.1038/s41588-024-01800-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 168.Zhang P, et al. Deep flanking sequence engineering for efficient promoter design using DeepSEED. Nat Commun. 2023 Oct;14:6309. doi: 10.1038/s41467-023-41899-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 169.Taskiran II, et al. Cell-type-directed design of synthetic enhancers. Nature. 2024 Feb;626:212–220. doi: 10.1038/s41586-023-06936-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 170.Gosai SJ, et al. Machine-guided design of cell-type-targeting cis-regulatory elements. Nature. 2024 Oct; doi: 10.1038/s41586-024-08070-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 171.Gao XJ, Chong LS, Kim MS, Elowitz MB. Programmable protein circuits in living cells. Science. 2018;361:1252–1258. doi: 10.1126/science.aat5062. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 172.Chen Z, et al. A synthetic protein-level neural network in mammalian cells. Science. 2024;386:1243–1250. doi: 10.1126/science.add8468. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 173.Green AA, et al. Complex cellular logic computation using ribocomputing devices. Nature. 2017;548:117–121. doi: 10.1038/nature23271. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 174.Chen Y, et al. Genetic circuit design automation for yeast. Nature Microbiology. 2020;5:1349–1360. doi: 10.1038/s41564-020-0757-2. [DOI] [PubMed] [Google Scholar]
  • 175.Bunne C, et al. How to build the virtual cell with artificial intelligence: Priorities and opportunities. Cell. 2024 Dec;187:7045–7063. doi: 10.1016/j.cell.2024.11.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 176.Johnson GT, et al. Building the next generation of virtual cells to understand cellular biology. Biophys J. 2023 Sept;122:3560–3569. doi: 10.1016/j.bpj.2023.04.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 177.Heimberg G, et al. A cell atlas foundation model for scalable search of similar human cells. Nature. 2024 Nov; doi: 10.1038/s41586-024-08411-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 178.Abramson J, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024 June;630:493–500. doi: 10.1038/s41586-024-07487-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 179.Booeshaghi AS, Galvez-Merchán Á, Pachter L. Algorithms for a Commons Cell Atlas. bioRxiv. 2024 [Google Scholar]
  • 180.Brixi G, et al. Genome modeling and design across all domains of life with Evo 2. bioRxiv. 2025 doi: 10.1038/s41586-026-10176-5. [DOI] [PubMed] [Google Scholar]
  • 181.Minsky M, Papert S. Perceptrons. MIT Press; Cambridge, MA: 1969. [Google Scholar]
