Abstract
Advances in methods of biological data collection are driving the rapid growth of comprehensive datasets across clinical and research settings. These datasets provide the opportunity to monitor biological systems in greater depth and at finer time steps than was achievable in the past. Classically, biomarkers are used to represent and track key aspects of a biological system. Biomarkers retain utility even with the availability of large datasets, since monitoring and interpreting changes in a vast number of molecules remains impractical. However, given the large number of molecules in these datasets, a major challenge is identifying the best biomarkers for a particular setting. Here, we apply principles of observability theory to establish a general methodology for biomarker selection. We demonstrate that observability measures effectively identify biologically meaningful sensors in a range of time series transcriptomics data. Motivated by the practical considerations of biological systems, we introduce the method of dynamic sensor selection (DSS) to maximize observability over time, thus enabling observability over regimes where system dynamics themselves are subject to change. This observability framework is flexible, capable of modeling gene expression dynamics and using auxiliary data, including chromosome conformation, to select biomarkers. Additionally, we demonstrate the applicability of this approach beyond genomics by evaluating the observability of neural activity. These applications demonstrate the utility of observability-guided biomarker selection across a wide range of biological systems, from agriculture and biomanufacturing to neural applications and beyond.
Keywords: observability, biomarkers, dynamic sensor selection, sensor selection, data driven observability
1. Introduction
Monitoring the state of a cell or tissue is experimentally and computationally challenging. Recently developed on-demand sequencing technologies, including live single-cell sequencing and adaptive sampling, are increasing the accessibility of high-dimensional, high-frequency time series genomics data [10, 63]. These technologies are shifting the bottleneck in monitoring biological systems from the acquisition to the synthesis of data - posing a challenge for the selection of biomarkers that represent a specific biological state in clinical and research settings [6, 44, 19]. Observability theory - an engineering framework for sensor selection - provides a principled approach to uncover biomarkers and to analyze and interpret vast biological datasets.
Systems theory models the genome as a dynamical system, where temporal changes of gene expression and chromatin structure are described by the differential equation:
$$\dot{x}(t) = f\big(x(t), u(t); \theta\big) \qquad (1)$$
The cell's state is described by a vector $x(t) \in \mathbb{R}^n$, environmental influences and perturbations are represented by a control signal $u(t)$, and the function $f$ models the dynamics with parameters $\theta$. Observability involves an additional measurement operator $h$, which maps the system state to available data with the equation:
$$y(t) = h\big(x(t)\big) \qquad (2)$$
Here, $p$ is the number of measurements collected at each time point, which is often significantly smaller than the dimension $n$ of the relevant system state. The system represented by the pair of equations modeling dynamics and measurement ($f$, $h$) is observable when the data $y(t)$ determine the system state $x(t)$. Identifying a set of biomarkers to render a system observable is equivalent to selecting measurements with $h$ that maximize our ability to determine a biological state throughout time.
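To make the notation of eqs. (1) and (2) concrete, the sketch below simulates a small dynamics-and-measurement pair. The two-gene mutual-repression system, its parameter values, and the choice to measure only the first gene are all hypothetical, chosen purely for illustration.

```python
import numpy as np

def f(x, u, theta):
    """Hypothetical two-gene mutual-repression dynamics (illustrative only)."""
    a, b = theta
    return np.array([a / (1.0 + x[1] ** 2) - x[0] + u[0],
                     b / (1.0 + x[0] ** 2) - x[1] + u[1]])

def h(x):
    """Measurement operator: only the first gene is observed (p = 1 < n = 2)."""
    return x[:1]

# Forward-Euler simulation of eq. (1), collecting the data y(t) of eq. (2).
dt = 0.01
x = np.array([0.5, 0.5])
ys = []
for _ in range(1000):
    x = x + dt * f(x, np.zeros(2), theta=(2.0, 2.0))
    ys.append(h(x))
ys = np.array(ys)   # available data: 1000 time points, p = 1 measurement each
```

The observability question is then whether this one-dimensional record suffices to reconstruct both components of the state.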
Biological systems are highly complex, with individual cells containing millions of proteins—a scale that surpasses the typical focus of systems and control theory applications, such as jet engines or communication networks [14]. While observability has been extensively studied in mathematical biology [1, 2, 61, 62], its application to biomarker detection represents a new frontier that must account for the noisy, sparse, and high-dimensional data in biology. Metabolic and gene regulatory networks have been analyzed using structural observability, an approach that prioritizes scalability over precision due to limited consideration of parameters learned from biological data [31, 35, 34]. Using time series transcriptomics data, Hasnain et al. employed data-driven modeling and observability optimization to learn dynamics and design biomarkers for pesticide detection [20]. Still, many practicalities of biological systems and data remain unaddressed by observability theory.
Biomarker discovery has previously relied on two advantages of biological systems: (1) a wealth of domain knowledge that exists independently of mathematical models ($f$ and $h$), and (2) an increasing array of high-dimensional, multimodal data and experimental techniques. To bridge the gap between observability theory and biomarker discovery, we integrate observability into the biomarker detection problem as follows (fig. 1):
Figure 1: Framework for applying observability to biological data.
(Box 1) Models of biological systems are constructed from experimental data, and sensor selection determines which low-dimensional representations of system trajectories will capture the most informative aspects of the system. (A) The thirteen state variables of the Tyson and Novak model are shown along with their first-order interactions. Each state is colored according to its individual contribution to observability measured in the synthetic data. The state with the highest average contribution to observability is boxed in red, since it is selected as the first sensor by the greedy sensor selection algorithm. (B) The ratio of the smallest to largest singular values of the observability matrix increases with the number of sensors. This shows that when few sensors are used the smaller singular values are insignificant and the observability matrix is approximately low rank. (C) The effective rank of the observability matrix is shown when pairs of state variables are included in the sensor set. After the first sensor is selected by the first iteration of the greedy algorithm, the next sensor is chosen to maximize the effective rank. (D) The observability is shown over multiple iterations of the greedy algorithm. At each iteration, the observability increases and the contribution of the next sensor diminishes.
Data Driven Biological Models: We apply techniques from Dynamic Mode Decomposition (DMD) and Data-Guided Control (DGC) to construct time-dependent models of gene expression.
Observability Analysis: We present several measures of observability (, table 1) and provide optimization strategies. Dynamic Sensor Selection (DSS) methods are developed to reallocate sensors and optimize observability throughout time.
Biological Validation: Observability-guided biomarkers are validated against established biological knowledge. Additionally, we incorporate chromosome conformation and other biological data as constraints to refine the observability analysis, ensuring alignment with biological priors.
Table 1: Observability Measures.
A comparison of five observability criteria, highlighting their condition (graded or binary), applicable dynamics, and compatibility with dynamic sensor selection. Algebraic observability is not suitable for time-varying systems because the corresponding differential algebraic conditions require that the system and its parameters remain constant [51]. Structural observability, which stems from structural controllability, represents the first-order interactions (i.e. the Jacobian of $f$) as a network. While the observability of this network indicates that a linearization of the corresponding nonlinear system is observable, structural observability of the network does not guarantee the observability of the nonlinear system [15]. DSS is designed for nonlinear biological systems whose dynamics and sensors can be reallocated throughout time, which excludes the further consideration of algebraic and structural observability.
Our work provides insights into the relationship between observability and the monitoring of biological systems, along-side the development of methods for DSS. We introduce approaches for DSS and propose strategies to integrate gene expression (RNA-seq) and chromosome structure (Hi-C) data within the observability framework. We demonstrate the utility and versatility of our framework across multiple datasets from genomics and beyond.
2. Results
The rank of the observability matrix $\mathcal{O}$ is a traditional criterion used to assess observability. A system is observable when $\operatorname{rank}(\mathcal{O}) = n$. For the pair ($f$, $h$) the observability matrix is:
$$\mathcal{O}(x) = \begin{bmatrix} \nabla L_f^{0} h(x) \\ \nabla L_f^{1} h(x) \\ \vdots \\ \nabla L_f^{n-1} h(x) \end{bmatrix} \qquad (3)$$
Here, $L_f h$ denotes the Lie derivative of $h$ along $f$ [4]. When $f$ and $h$ are nonlinear, the matrix $\mathcal{O}(x)$ depends on a particular state vector $x$, meaning observability is a local property that determines whether the system is locally observable at $x$ [22]. When the system is linear time-invariant (LTI), i.e. $f(x) = Ax$ and $h(x) = Cx$ where $A$ and $C$ are matrices, the observability matrix is $\mathcal{O} = [C;\ CA;\ CA^{2};\ \dots;\ CA^{n-1}]$, and the rank criterion is the famous Kalman condition, which establishes a global observability property for all $x$ [24].
The application of rank-based criteria is impractical for high-dimensional systems with imperfect models of dynamics, as is often the case with biological systems. Take, for example, the DNA replication model in fission yeast proposed by Novak and Tyson (fig. 1A) [40]. Their model comprises a differential equation with twelve state variables representing gene expression and one representing cell mass, forming a system that becomes observable if any gene is monitored (supplementary information eq. (14)) [35]. Yet, in practice, due to poor conditioning the observability matrix is approximately low rank. To demonstrate this, we used synthetic data to construct $\mathcal{O}$ for one thousand randomly selected state vectors $x$, testing each configuration where a single variable serves as the sensor. Although $\mathcal{O}$ is full rank when using symbolic calculations, the singular values of $\mathcal{O}$ reveal that these matrices are only effectively low rank across all thousand simulated data points (supplementary information fig. 4). The simulated data show that poor conditioning of the observability matrix — characterized by an extremely small ratio of the smallest to largest singular values — gives the appearance that $\operatorname{rank}(\mathcal{O}) < n$, as if the system is locally unobservable at all sampled state vectors when only one sensor is utilized (fig. 1B). To address the practical concerns of rank-based observability tests, a range of graded observability measures have been developed.
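The conditioning issue described above is easy to reproduce for a generic LTI pair $(A, C)$: the Kalman observability matrix is typically full rank in exact arithmetic, yet its singular-value ratio reveals how close it is to rank deficiency. The random system below is illustrative, not the Novak and Tyson model.

```python
import numpy as np

def observability_matrix(A, C):
    """Kalman observability matrix O = [C; CA; ...; CA^(n-1)]."""
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

# Hypothetical system: random dynamics, a single sensor on the first state.
rng = np.random.default_rng(0)
n = 10
A = rng.standard_normal((n, n)) / np.sqrt(n)
C = np.zeros((1, n))
C[0, 0] = 1.0

O = observability_matrix(A, C)
s = np.linalg.svd(O, compute_uv=False)
# O is generically full rank, but the ratio sigma_min / sigma_max
# quantifies how poorly conditioned (effectively low rank) it is.
ratio = s[-1] / s[0]
```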
Observability Metrics.
Beyond the Kalman condition, several metrics to quantify observability have been proposed [41]. The Kalman condition, for instance, can be relaxed to measure the effective rank of the observability matrix:

$$r = \operatorname{rank}(\mathcal{O})$$
When the system is not fully observable, i.e. $r < n$, the number of directions or principal components of the system that can be observed is given by $r$ [39].
The observability Gramian $G_o$ is an alternative matrix used to quantify observability. For LTI systems, the observability Gramian is defined as the solution to the Lyapunov equation:

$$A^{\top} G_o + G_o A + C^{\top} C = 0$$

This matrix facilitates the calculation of two additional observability metrics:

$$E = \operatorname{tr}(G_o), \qquad V = \det(G_o)^{1/n}$$

The energy metric $E$ reflects the amplitude of the measured data $y(t)$, and the visibility metric $V$ acts as an average measure of observability for each direction in the state space [12].
The ability to compute the observability Gramian and the observability matrix using various algorithms makes $r$, $E$, and $V$ robust metrics well-suited for DSS. Additional observability metrics include: (1) structural observability, which is favored for its scalability [31], (2) algebraic observability, which is applicable to nonlinear systems [51, 35], and (3) additional Gramian-based metrics [41, 12] (table 1).
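For LTI systems, the Gramian-based metrics can be computed with standard linear-algebra routines. The sketch below applies SciPy's continuous Lyapunov solver to a hypothetical stable system; the energy metric is taken as the trace of the Gramian, and the visibility metric as the geometric mean of its eigenvalues, which is one common choice assumed here.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical stable LTI system: a random matrix shifted to be Hurwitz.
rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n)) - 4.0 * np.eye(n)
C = np.eye(1, n)                       # a single sensor on the first state

# Observability Gramian G solves the Lyapunov equation A^T G + G A + C^T C = 0.
G = solve_continuous_lyapunov(A.T, -C.T @ C)

energy = np.trace(G)                        # E: total output energy
visibility = np.linalg.det(G) ** (1.0 / n)  # V: geometric mean of eigenvalues
```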
Sensor Selection Problem.
In order to maximize observability, the biomarker or sensor selection problem is formulated as:
$$C^{*} = \arg\max_{C}\; J(A, C) \quad \text{subject to constraints on } C \qquad (4)$$

where $J$ denotes one of the observability measures (table 1).
Common constraints include a budget or inability to measure certain variables. Since each variable may or may not be measured, there are $2^n$ candidate solutions to the sensor selection problem, making the optimization challenging for high-dimensional biological systems where $n$ is large.
To optimize observability of the Novak and Tyson model, we applied a greedy sensor selection algorithm (supplementary information algorithm 1). The greedy approach selects sensors iteratively, at each step selecting the candidate sensor that maximizes the effective rank $r$, averaged across all instances in the synthetic dataset. The first sensor was chosen because it provided the highest rank of $\mathcal{O}$ (fig. 1A). With the first sensor selected, the algorithm then evaluated the observability when monitoring it together with each remaining state variable, selecting the next best sensor (fig. 1C). This process was repeated until $r$ is maximized at 13, when monitoring five of the thirteen state variables, including mass.
When there is no limit on how much data can be measured in a simulation or experiment, a greedy algorithm finds the optimal solution to the sensor selection problem of eq. (4). As the greedy algorithm iteratively selects sensors, the system’s observability improves with each step. This is reflected in both the ratio between singular values (fig. 1B) and the average effective rank of $\mathcal{O}$ (fig. 1D). However, the marginal utility of each additional sensor decreases as fewer unobserved directions remain, reflecting diminishing returns in observability provided by each additional sensor. The poor conditioning of $\mathcal{O}$ and the diminishing return from the use of additional sensors leads us to our first result: practical application requires consideration of data quality, local system states, and resource constraints to ensure the effective use of biomarkers that theoretically make a system observable.
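A minimal version of the greedy selection loop reads as follows; the effective-rank metric and the random test system are illustrative stand-ins for the quantities used above.

```python
import numpy as np

def effective_rank(O, tol=1e-6):
    """Number of singular values within a tolerance of the largest."""
    s = np.linalg.svd(O, compute_uv=False)
    return int((s / s[0] > tol).sum())

def greedy_sensors(A, k, metric):
    """Greedily add single-state sensors, maximizing `metric` of the
    observability matrix at each step."""
    n = A.shape[0]
    chosen = []
    for _ in range(k):
        best, best_val = None, -np.inf
        for i in range(n):
            if i in chosen:
                continue
            # Candidate sensor set: previously chosen states plus state i.
            C = np.zeros((len(chosen) + 1, n))
            for row, j in enumerate(chosen + [i]):
                C[row, j] = 1.0
            O = np.vstack([C @ np.linalg.matrix_power(A, p) for p in range(n)])
            val = metric(O)
            if val > best_val:
                best, best_val = i, val
        chosen.append(best)
    return chosen

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 8)) / 3.0   # hypothetical dynamics matrix
sensors = greedy_sensors(A, k=3, metric=effective_rank)
```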
Biomarker Observability Depends on Biological State.
Biomarker selection has traditionally relied on domain expertise to identify markers associated with specific biological states [44]. For example, the PIP-FUCCI system, a live cell microscopy approach for determining cell cycle stages (G1, S, G2, M), was developed based on prior knowledge of cell cycle dynamics and leverages biomarkers that vary with cell state [17, 68]. Imaged expression data from three genes reveals cell cycle stages and transitions: CDT1 (G1), PCNA (S), and GEM (S/G2/M). Because cells progress through the cycle at different rates — affected by cell type, experimental conditions, and cell-to-cell variability — the position within the cell cycle determines the relevance of the PIP-FUCCI biomarkers at different points in time [13, 56].
As cells progress through the cell cycle, they may stall in the G1 phase, often referred to as G0, entering a state known as quiescence (fig. 2A). During quiescence, the cell ceases to divide, much like a system reaching a stable equilibrium. Quiescent cancer cells are linked to high cancer recurrence, as these cells can re-enter the cell cycle, and the reduced cell cycle activity diminishes the effectiveness of chemo- and immunotherapies [32, 30, 9, 29]. The proliferation-quiescence bifurcation is described mathematically as a transition between stability and periodicity in dynamical systems and coincides with a shift in observability [45]. Smale’s two-cell system illustrates this concept, exhibiting stable solutions that correspond to quiescence and periodic solutions that resemble progression through the cell cycle [53]. As a special case of Turing’s equations of morphogenesis, these dynamics have been characterized as “mathematically dead” when stable and “mathematically alive” when oscillatory [59, 11].
Figure 2: State-Dependent Observability.
(A) A cell’s progression through the cell cycle—whether transitioning through phases during proliferation or stalling in G1/G0 during quiescence—is mediated by CDK2 activity [55]. (B) The Andronov-Hopf oscillator demonstrates either asymptotically stable or periodic limit cycle behavior, depending on the parameter $\mu$. (C) The transition from stable to periodic behavior in the Andronov-Hopf oscillator coincides with an increase in observability. Initial conditions used to construct the empirical observability Gramians were selected by sampling $x(0)$ and $y(0)$ from uniform distributions bounded by ±1, ±2, and ±4.
Systems that are mathematically alive exhibit heightened observability. The Hopf bifurcation, the archetypal example of the transition between stable and periodic dynamics [37], is observed in the Andronov-Hopf oscillator:
$$\dot{x} = \mu x - y - x(x^{2} + y^{2}), \qquad \dot{y} = x + \mu y - y(x^{2} + y^{2}) \qquad (5)$$
As the parameter $\mu$ transitions from negative to positive values (fig. 2B), the system shifts from being mathematically dead to alive. To assess how observability transitions between the mathematically dead and alive states, we select $x$ as a sensor and measure the output $y(t) = x(t)$. With the dynamics of the Andronov-Hopf oscillator and this fixed sensor, empirical observability Gramians were constructed from simulated data with both dead and alive choices of $\mu$ [23]. Empirical observability Gramians are constructed from simulations and perturbations of the system in eq. (5), as opposed to solving the Lyapunov equations or other approximation techniques (supplementary information §4.2). From the observability Gramian of each simulation, we measured the observability of the system (fig. 2C). The degree of observability of this system is primarily controlled by the bifurcation parameter $\mu$ and modulated by the choice of initial condition for the simulation. Variance in observability diminishes for $\mu > 0$ because simulated trajectories are driven toward similar limit cycles or cyclic patterns. In contrast, when the system is stable and $\mu < 0$, observability varies significantly between simulations with different initial conditions. This occurs because the starting point becomes the key differentiator for how quickly trajectories converge to the stable point. In other words, mathematically alive systems are more observable because the periodic behavior causes the system state to cover a wider range of the state space, which increases the measurable information in the sensor data.
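The comparison between the dead and alive regimes can be reproduced with a short simulation. The sketch below builds empirical observability Gramians for the Andronov-Hopf oscillator by differencing simulated outputs under small ± perturbations of each initial state, one standard construction of the empirical Gramian; the step sizes and perturbation magnitudes are illustrative.

```python
import numpy as np

def hopf(z, mu):
    """Andronov-Hopf vector field of eq. (5)."""
    x, y = z
    r2 = x * x + y * y
    return np.array([mu * x - y - x * r2, x + mu * y - y * r2])

def simulate(z0, mu, dt=0.01, steps=2000):
    """Forward-Euler simulation, measuring only the x coordinate."""
    z, out = np.array(z0, dtype=float), []
    for _ in range(steps):
        z = z + dt * hopf(z, mu)
        out.append(z[0])
    return np.array(out)

def empirical_gramian(z0, mu, eps=1e-3, dt=0.01, steps=2000):
    """Empirical observability Gramian from +/- perturbations of each state."""
    n = len(z0)
    dY = []
    for i in range(n):
        e = np.zeros(n); e[i] = eps
        dY.append(simulate(np.array(z0) + e, mu, dt, steps)
                  - simulate(np.array(z0) - e, mu, dt, steps))
    dY = np.array(dY)
    return (dY @ dY.T) * dt / (4 * eps ** 2)

G_dead = empirical_gramian([0.5, 0.0], mu=-0.5)   # stable ("dead") regime
G_alive = empirical_gramian([0.5, 0.0], mu=+0.5)  # limit-cycle ("alive") regime
```

In the stable regime the perturbation responses decay, while on the limit cycle phase differences persist, so the alive Gramian carries more output energy.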
The variation in observability governed by $\mu$ serves as a mathematical analog to insights from PIP-FUCCI biomarkers: the observability provided by sensors is not constant. Although the gene CDT1 is a biomarker for the G1 phase, its contribution to observability diminishes during other cell cycle phases, necessitating the use of PCNA and GEM to delineate each phase. Similarly, as $\mu$ increases, the contribution of the sensor to system observability increases by a factor of $10^{3}$. The highest variability in observability occurs during the transition between periodic and stable behaviors of the system, where $\mu$ is near 0 and the bifurcation occurs. Thus, the utility of measuring the sensor is least certain at the point where the system is nearest to transitioning between stable and periodic behaviors. This observation leads us to our second main result: the dependence of biomarkers on biological state requires time-dependent biomarker selection.
Dynamic Sensor Selection (DSS).
To address the need to identify and allocate biomarkers over shifts in the underlying dynamics, we developed DSS to identify time-varying biomarkers based upon models of dynamics learned from time series genomics data. Due to the high dimensionality and relatively few time points found in genomics data, this approach is tailored to discrete time LTI and linear time-varying (LTV) models of dynamics that can be learned from time series transcriptomics data (supplementary information §4.3). After learning a model of dynamics $A(t)$, DSS adapts the sensor selection problem (eq. (4)) to select sensors for each point in time:
$$C^{*}(t) = \arg\max_{C(t)}\; J\big(A(t), C(t)\big) \qquad (6)$$
The selected sensors are placed in a measurement matrix $C(t)$ that varies with time. While time-dependent systems have been studied in the context of controllability and robustness [52, 49], sensor selection for time-dependent systems has received only limited attention in the literature.
To optimize eq. (6) for $r$, $E$, and $V$, the time-dependent observability Gramian is required. The observability Gramian of a LTV system from time $t_0$ to time $t_f$ is:

$$G_o(t_0, t_f) = \sum_{t=t_0}^{t_f} \Phi(t, t_0)^{\top} C(t)^{\top} C(t)\, \Phi(t, t_0) \qquad (7)$$

where $\Phi(t, t_0)$ is the time-dependent state transition matrix from $t_0$ to $t$, given by:

$$\Phi(t, t_0) = A(t-1)\, A(t-2) \cdots A(t_0), \qquad \Phi(t_0, t_0) = I$$
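In discrete time, the state transition matrix and the LTV observability Gramian of eq. (7) can be assembled directly. The system matrices below are illustrative; the moving sensor mimics the reallocation performed by DSS.

```python
import numpy as np

def state_transition(As, t0, t):
    """Phi(t, t0) = A(t-1) ... A(t0), with Phi(t0, t0) = I (discrete time)."""
    Phi = np.eye(As[0].shape[0])
    for k in range(t0, t):
        Phi = As[k] @ Phi
    return Phi

def ltv_gramian(As, Cs, t0, tf):
    """Discrete-time LTV observability Gramian of eq. (7)."""
    n = As[0].shape[0]
    G = np.zeros((n, n))
    for t in range(t0, tf + 1):
        Phi = state_transition(As, t0, t)
        G += Phi.T @ Cs[t].T @ Cs[t] @ Phi
    return G

# Hypothetical example: time-varying dynamics and a sensor that moves over time.
rng = np.random.default_rng(3)
n, T = 5, 6
As = [rng.standard_normal((n, n)) / 2 for _ in range(T)]
Cs = [np.eye(1, n, k=t % n) for t in range(T + 1)]
G = ltv_gramian(As, Cs, 0, T)
```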
Maximization of $E$ is achieved by solving the eigenvalue problem:

$$M V = V \Lambda \qquad (8)$$

where $M$ is a Gram matrix learned from data as

$$M = \sum_{t=t_0}^{t_f} \Phi(t, t_0)\, \Phi(t, t_0)^{\top}$$

The columns of $V$ in eq. (8) correspond to optimal sensor placement at time $t$, and the contribution to observability is weighted by the eigenvalues found in $\Lambda$.
Measure $E$ can also be maximized with a linear program. The LTV observability Gramian in eq. (7) can be expressed equivalently as:

$$G_o(t_0, t_f) = \sum_{t=t_0}^{t_f} \Phi(t, t_0)^{\top} \Big( \sum_{i=1}^{n} s_i(t)\, e_i^{\top} e_i \Big)\, \Phi(t, t_0) \qquad (9)$$

where $s_i(t) \in \{0, 1\}$ indicates 1 if variable $i$ is measured at time $t$ and 0 otherwise, and $e_i$ is a row vector with the $i$th entry as 1 and all others 0. Because the matrix trace is linear, when $s_i(t)$ is relaxed to a continuous value, i.e. $s_i(t) \in [0, 1]$, a linear program can solve the optimization:

$$\max_{s}\; \sum_{t=t_0}^{t_f} \sum_{i=1}^{n} s_i(t)\, \operatorname{tr}\!\big( \Phi(t, t_0)^{\top} e_i^{\top} e_i\, \Phi(t, t_0) \big) \quad \text{s.t.} \quad 0 \le s_i(t) \le 1 \qquad (10)$$
Constraints can be incorporated into this optimization, such as restricting the number of sensors at time $t$ with the constraint: $\sum_{i=1}^{n} s_i(t) \le p$. See supplementary information §5 for precise details on these optimizations. In the following sections, we demonstrate a range of applications for the DSS methodology in biological systems.
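The relaxed optimization of eq. (10) reduces to a small linear program: the weight of measuring variable $i$ at time $t$ is the squared norm of row $i$ of the state transition matrix, and the budget appears as one inequality row per time point. The system and budget below are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical discrete LTV system with a per-time sensor budget p.
rng = np.random.default_rng(4)
n, T, p = 6, 4, 2
As = [rng.standard_normal((n, n)) / 2 for _ in range(T)]

# w_i(t) = tr(Phi(t,0)^T e_i^T e_i Phi(t,0)) = squared norm of row i of Phi(t,0),
# so that the objective of eq. (10) is sum_t sum_i s_i(t) w_i(t).
W = np.zeros((T + 1, n))
Phi = np.eye(n)
for t in range(T + 1):
    W[t] = (Phi ** 2).sum(axis=1)
    if t < T:
        Phi = As[t] @ Phi

# linprog minimizes, so negate the objective; one budget row per time point.
c = -W.ravel()
A_ub = np.zeros((T + 1, (T + 1) * n))
for t in range(T + 1):
    A_ub[t, t * n:(t + 1) * n] = 1.0
res = linprog(c, A_ub=A_ub, b_ub=np.full(T + 1, p), bounds=(0, 1))
s = res.x.reshape(T + 1, n)   # relaxed sensor indicators s_i(t) in [0, 1]
```

Because the objective and budget decouple across time points, the relaxation here recovers an integral solution: the top-$p$ weighted variables at each time.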
Estimating Unmeasured Genes.
To evaluate DSS, we identified genes critical to observing the dynamics of Pseudomonas fluorescens SBW25, a bacterium used for insect control, from data collected by Hasnain et al. [20]. The dataset comprises time series spanning nine time points and 600 genes, representing a high-dimensional system that presents challenges for observation. Gene regulation in the bacteria or other cells is represented by eq. (1), where $x_i(t)$ denotes the expression of the $i$th gene, and $A(t)$ is the gene regulatory network at time $t$. The matrix element $A_{ij}(t)$ specifies the influence of gene $j$ on gene $i$ at time point $t$. We selected two sets of sensors that (1) optimize observability over time with DSS and (2) optimize observability with fixed sensors throughout the experiment.
We found that DSS improves state estimation relative to using fixed sensors. To test estimation capabilities, after selecting biomarkers, the data is divided into observable biomarker and unobservable non-biomarker datasets. Estimating the non-biomarker gene expression can then be formulated as solving the following least squares problem:
$$\hat{x} = \arg\min_{x}\; \| \mathcal{O}x - Y \|_2^2 \qquad (11)$$
where $\hat{x}$ is an efficient estimator of the state $x$ [54]. The biomarker data is assembled in a matrix $Y$. The estimation of unmeasured genes in eq. (11) has the solution $\hat{x} = \mathcal{O}^{+} Y$, where $\mathcal{O}^{+}$ denotes the pseudoinverse of $\mathcal{O}$ [3]. To measure the quality of the estimation, $\hat{x}$ is compared to the true data $x$. This error can be measured using several metrics, e.g. $\|x - \hat{x}\|_2$, but the component-wise error is most relevant to assessing the error of estimating individual genes. DSS consistently improved the median estimation error for each of the 600 genes, regardless of the number of sensors used (fig. 3A). The median estimation error measures the ability to estimate the expression values of individual genes. Although the biomarkers and system do not satisfy the Kalman rank condition or the Popov-Belevitch-Hautus test, the incorporation of time-varying dynamics and sensors enables unmeasured genes to be estimated with an error within 50%, a level of accuracy that is practically useful for many real-world applications.
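The estimation pipeline (simulate, measure biomarkers, recover the state with the pseudoinverse) can be sketched end to end on a small hypothetical linear system; in this noiseless, observable toy case the recovery is exact.

```python
import numpy as np

# Hypothetical linear gene network x(t+1) = A x(t) with three biomarker genes.
rng = np.random.default_rng(5)
n, T = 8, 12
A = rng.standard_normal((n, n)) / np.sqrt(n)
sensors = [0, 2, 5]
C = np.zeros((len(sensors), n))
C[range(len(sensors)), sensors] = 1.0

# Simulate "true" expression and keep only the biomarker measurements.
X = np.zeros((T, n))
X[0] = rng.standard_normal(n)
for t in range(1, T):
    X[t] = A @ X[t - 1]
Y = X @ C.T                      # y(t) = C x(t), one row per time point

# Solve eq. (11): recover the initial state from biomarker data, then
# propagate it forward to estimate every gene at every time point.
O = np.vstack([C @ np.linalg.matrix_power(A, t) for t in range(T)])
x0_hat = np.linalg.pinv(O) @ Y.ravel()
X_hat = np.vstack([np.linalg.matrix_power(A, t) @ x0_hat for t in range(T)])
unmeasured = [i for i in range(n) if i not in sensors]
err = np.abs(X_hat[:, unmeasured] - X[:, unmeasured])  # component-wise error
```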
Figure 3: Biomarker Selection from Time Series Data.
(A) DSS improves the estimation error of individual genes from biomarker data relative to the use of biomarkers that are fixed throughout time. (B) Constraining the sensor selection problem with Hi-C data positions highly observable biomarker genes on chromosomes to more closely reflect the spatial distribution of genes within the nucleus, as indicated by the gray background. The positions of the top 10% of biomarkers selected with unconstrained DSS, DSS constrained by Hi-C data, and biomarkers common to both methods are shown in pink, green, and blue, respectively. (C) The time series neuron activity was collected in 10-minute segments on three consecutive days. The recorded activity extracted from twenty neurons is shown, with the activity of three neurons highlighted in red, green, and blue. (D) Throughout the three-day period, the observability contributed by each neuron varies greatly. The neuron indicated in red, which is initially the worst sensor, becomes the most observable as its overall activity becomes the largest on day 3. (E) The spatial position of 64 EEG leads colored by their contribution to observability. (F) The signals from each of the 64 EEG leads are ranked based on their observability, with the average rank representing each sensor’s mean ranking across all six tasks.
Functional Observers for Cellular Reprogramming.
While DSS enhances estimation of unmeasured gene expression values, many biomedical applications rely on biomarkers to indicate phenotypes or cell types. Early detection of cellular reprogramming, a process that transforms cell type and induces a shift in the dynamics of the cell’s transcriptional program, is an important and unresolved challenge in biomanufacturing [66]. This task falls under the framework of functional observability, where the goal is to select biomarkers or sensors that enable the estimation of specific modes of the unmeasured states — such as phenotype — without reconstruction of all unmeasured state variables or genes [58].
A system is functionally observable for the modes defined by the rows of the matrix $F$ if

$$\operatorname{rank} \begin{bmatrix} \mathcal{O} \\ F \end{bmatrix} = \operatorname{rank}(\mathcal{O}) \qquad (12)$$

Here, $F$ is a matrix where each row represents a functionally observable mode or direction of the system. When $F$ is sparse, with only one nonzero entry per row, individual states can be estimated — a property known as targeted observability [38, 67]. In contrast, when the rows of $F$ are dense, each row can represent a cell type, and each element within a row can correspond to the expression of a particular gene for that cell type, enabling the system to be functionally observable for a biological state.
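The rank test of eq. (12) is straightforward to implement. The block-diagonal toy system below is illustrative: the sensor sees only the first block, so a mode confined to that block passes the test while a mode touching the unobserved block fails.

```python
import numpy as np

def functionally_observable(A, C, F, tol=1e-8):
    """Rank test of eq. (12): rank([O; F]) == rank(O) for the LTI pair (A, C)."""
    n = A.shape[0]
    O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])
    return np.linalg.matrix_rank(np.vstack([O, F]), tol) == np.linalg.matrix_rank(O, tol)

# Toy block system: the single sensor sees only the first 2x2 block.
A = np.block([[np.array([[0.9, 0.1], [0.0, 0.8]]), np.zeros((2, 2))],
              [np.zeros((2, 2)), 0.5 * np.eye(2)]])
C = np.array([[1.0, 0.0, 0.0, 0.0]])
f_obs = np.array([[1.0, 1.0, 0.0, 0.0]])  # mode inside the observed block
f_not = np.array([[0.0, 0.0, 1.0, 0.0]])  # mode touching the unobserved block
```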
A system is always functionally observable for the principal components (or right singular vectors) of $\mathcal{O}$, which are utilized in the estimation framework of eq. (11) [39]. By applying the Singular Value Decomposition (SVD) to the observability matrix, $\mathcal{O} = U \Sigma V^{\top}$, the pseudoinverse is expressed as $\mathcal{O}^{+} = V \Sigma^{+} U^{\top}$. Consequently, the state estimation of eq. (11) is computed as:

$$\hat{x} = V \Sigma^{+} U^{\top} Y \qquad (13)$$

This equation demonstrates that the state estimate is a linear combination of the right singular vectors $V$, weighted by the contributions of the nonzero singular values and the data $Y$. Since the rows of $V^{\top}$ associated with nonzero singular values are in the row space of $\mathcal{O}$, the system is functionally observable for the modes described by $V^{\top}$.
With time series gene expression data from a recreation of Weintraub’s seminal 1989 reprogramming experiment, we built a functional observer for the reprogramming of fibroblasts to skeletal muscle [64]. To monitor the progression of myogenic reprogramming initiated by the introduction of the transcription factor MYOD, bulk RNA-seq data was collected at 8-hour intervals [33]. Cellular reprogramming remains characterized by partial reprogramming and low efficiency, which result in weak and noisy signals. To address this, we developed two LTV models of gene expression dynamics to amplify the reprogramming signal and facilitate observer construction. The first model (Model 1) encompassed all 19,235 genes measured during the experiment, while the second model (Model 2) focused on 406 genes involved in cell cycle regulation and myogenic lineages, aiming to enhance the weak reprogramming signal. To identify biomarkers, we optimized observability on Model 2 to identify which reprogramming genes most strongly contribute to system observability. The top ranked reprogramming biomarkers identified by Model 2 were used to monitor Model 1.
To identify the functionally observable modes of Model 1 with the selected biomarkers, we constructed the observability matrix of Model 1, performed the SVD $\mathcal{O} = U \Sigma V^{\top}$, and considered the rows of $V^{\top}$ associated with nonzero singular values. Each row of $V^{\top}$ is a functionally observable mode, and the entries of a row indicate the expression of different genes. We performed cell type and functional enrichment to determine the biological processes associated with each observable mode. This revealed cell types such as fibroblasts, myofibroblasts, and myoblasts, all known to be involved in myogenic reprogramming (supplemental information fig. 5). Similarly, there is a strong preference for myogenic and cell cycle genes, both involved in reprogramming, to be heavily weighted in the functionally observable modes. Further enrichment analysis highlights cellular activities like the defense response to viruses (GO:0051607), aligning with the expected response due to Lentiviral reprogramming, and regulation of the cell cycle (GO:0051726) and smooth muscle cell proliferation (GO:0048661), consistent with the cell division and differentiation that occurred in these data (supplemental information fig. 6). The enrichment of functionally observable modes corresponding to biological states and processes consistent with the experiment suggests that, despite the low reprogramming signal and substantial noise, biomarkers identified using DSS are well-suited for monitoring cellular reprogramming.
Chromatin Informed Biomarkers.
Integrating biological insights or domain knowledge that is not captured in the system model or state space into the observability-guided biomarker selection framework helps align the sensor selection problem with practical biological considerations. For instance, monitoring multiple biomarkers within a transcription factory – a group of genes that is colocalized in the nucleus where the genes are often coregulated – may provide redundant information [46, 8]. Modifying the sensor selection problem to limit the number of biomarkers per transcription factory or satisfy other requirements is achieved by modifying the constraints of eq. (6).
To select biomarkers in the context of transcription factories, we use Hi-C data, which provides information on genome structure, to constrain eq. (4). With Hi-C data from the study by [33], we generated gene-by-gene Hi-C matrices representing observed contact frequencies between genes (supplementary information fig. 7). From the gene-centric Hi-C matrix, we performed hierarchical clustering and optimized the silhouette score to construct gene clusters, where each cluster represents a group of genes that are likely proximal to one another (supplementary information fig. 8). Then, we constrained eq. (4) to prevent the simultaneous selection of multiple genes found within the same cluster:

$$\sum_{i \in \mathcal{C}_k} s_i(t) \le 1 \quad \text{for each cluster } \mathcal{C}_k$$
To maximize observability under these constraints, we applied a greedy heuristic by first solving the unconstrained maximization, then selecting the top ranked sensors that meet the constraints. For $E$, these constraints from Hi-C data can directly be incorporated into the linear program used to maximize eq. (10).
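The cluster constraint admits a simple greedy heuristic of the kind described above: rank candidate sensors by their unconstrained observability scores, then take the top-ranked genes subject to at most one gene per cluster. The scores and cluster labels below are illustrative inputs, not values from the Hi-C analysis.

```python
import numpy as np

def constrained_greedy(weights, clusters, k):
    """Pick the top-k sensors by observability weight, allowing at most
    one sensor per Hi-C cluster (greedy heuristic)."""
    order = np.argsort(weights)[::-1]   # candidates, best score first
    chosen, used = [], set()
    for i in order:
        if clusters[i] not in used:
            chosen.append(int(i))
            used.add(clusters[i])
        if len(chosen) == k:
            break
    return chosen

weights = np.array([0.9, 0.8, 0.7, 0.4, 0.2, 0.1])  # per-gene scores
clusters = np.array([0, 0, 1, 1, 2, 2])             # Hi-C cluster labels
print(constrained_greedy(weights, clusters, k=3))   # genes 0, 2, 4
```

Gene 1 is skipped despite its high score because it shares a cluster with gene 0, mirroring the redundancy argument for transcription factories.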
Although this constrained optimization yields a lower value of the objective function, the selected biomarkers have two practical advantages. First, they are distributed across the genome in a pattern that mirrors natural gene placement across chromosomes (fig. 3B). Second, their performance is comparable to that of biomarkers selected from the unconstrained dataset. Integrating chromatin-informed constraints into the biomarker selection process ensures that observability is maximized in a manner consistent with prior biological knowledge.
Beyond the Genome.
To demonstrate the utility of observability-guided biomarkers beyond genomics, we applied DSS to in vivo single-cell endomicroscopic signals collected by [57]. We constructed a LTI model for each experimental phase — feeding, fasting, and refeeding — and identified each neuron's contribution to observability (fig. 3C). Our results revealed substantial shifts in monitoring utility across neurons. Notably, the neuron with the lowest initial output energy exhibited the highest contribution to observability during refeeding (fig. 3D). A similar pattern was confirmed with additional observability measures. The varying contributions to observability—where some neurons consistently act as good sensors, others are transiently effective only during feeding states, and some gain significant observability contributions after fasting—underscore the importance of dynamically selecting sensors.
At a larger scale, we applied DSS to assess the observability of electroencephalogram (EEG) signals [50]. Brain activity, sampled from 64 EEG leads at 160 Hz across a cohort of over 100 patients, provides high-quality time series data well suited for modeling and observability analysis [42, 18]. We evaluated the observability contribution of each of the 64 EEG leads as participants transitioned through various states, including open-eye and closed-eye conditions and four tasks involving hand and foot movements (fig. 3E). Notably, sensor performance varied significantly, particularly between the open-eye and closed-eye conditions (fig. 3F, supplementary information fig. 9). These findings highlight the applicability of DSS and observability-guided biomarkers to a range of critical applications and data modalities.
3. Discussion
In this work, we extended the tools of observability theory to identify biomarkers, accounting for the practicalities and constraints of experimental biological data. Our key findings are twofold. First, the contribution of a biomarker to observability depends on the biological state; this establishes a connection between the state-dependent utility of biomarkers and the concepts of local and time-dependent observability. Second, we developed DSS as a mechanism to optimize observability as the dynamics of the underlying system change. DSS provides a computational approach for sensor selection in time-varying systems, a concept long recognized in observability theory but only recently made feasible for genomic biomarkers by advances in measurement technologies.
Our application to real biomedical data both demonstrates the versatility of this approach and highlights practicalities of observability that are often overlooked in theoretical discussions conducted in the absence of real data. In particular, this work relaxes the need for mechanistic mathematical modeling of biological systems by leveraging data-guided modeling, enabling the detection of biomarkers through observability analysis in any biomedical time series. It also underscores the importance of ensuring that theoretical criteria can be validated numerically on real or synthetic data.
This work also raises several research directions worthy of future pursuit. First, while measured data can determine the unmeasured states of an observable system, there are many algorithms for performing such an estimation. Here, we used the most basic approach, least-squares estimation; this leaves open several avenues for the design and construction of observers tailored to the unique noise, sparsity, and destructive nature of transcriptomics assays. The development of such observers is important for realizing the utility of observability-guided biomarkers with emerging sequencing technologies. Second, the state space representation of a cell is crucial for determining observability criteria [5]. In this work, we adopted the raw data as the state space, but this may not be optimal. Future research could explore enhanced representations of a cell that integrate genomic structure and function by incorporating additional data modalities. Third, theoretical investigations into bifurcation observability, particularly in the context of Smale's two-cell system, are closely related to the bifurcation control problem and warrant further attention [7]. These areas, and others, must be developed further to leverage rapidly emerging experimental and computational biotechnologies.
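The basic least-squares estimation mentioned above can be made concrete: stack the observability matrix over a finite window and solve for the full initial state from partial measurements. A minimal sketch, assuming a discrete-time LTI model (A, C):

```python
import numpy as np

def estimate_initial_state(A, C, Y):
    """Least-squares reconstruction of the full initial state x0 from a
    sequence of partial measurements y_k = C x_k, with x_{k+1} = A x_k.
    Stacks the observability matrix O = [C; CA; ...; CA^{T-1}] and
    solves O x0 ~= vec(Y); when the system is observable over T steps,
    the recovery is exact for noiseless data. Y is p-by-T, one
    measurement per column."""
    T = Y.shape[1]
    blocks, Ak = [], np.eye(A.shape[0])
    for _ in range(T):
        blocks.append(C @ Ak)              # block k is C A^k
        Ak = Ak @ A
    O = np.vstack(blocks)                  # (T*p) x n observability matrix
    y = Y.T.reshape(-1)                    # stack y_0, ..., y_{T-1}
    x0, *_ = np.linalg.lstsq(O, y, rcond=None)
    return x0
```

Once x0 is recovered, all unmeasured states follow by propagating the model forward; more sophisticated observers would replace the plain pseudoinverse with noise-aware filtering.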
Supplementary Material
Acknowledgments.
We thank the members of the Rajapakse Lab for helpful and inspiring discussions. This work is supported by the Air Force Office of Scientific Research (AFOSR) under award numbers FA9550-22-1-0215 (IR) and FA9550-23-1-0400 (AB), NSF DMS-2103026 (AB), and NIGMS GM150581 (JP).
References
- [1].Anguelova Milena. Nonlinear Observability and Identifiability: General Theory and a Case Study of a Kinetic Model for S. cerevisiae. Chalmers Tekniska Hogskola (Sweden), 2004.
- [2].Anguelova Milena. Observability and identifiability of nonlinear systems with applications in biology. Chalmers Tekniska Hogskola (Sweden), 2007.
- [3].Barata João Carlos Alves and Hussein Mahir Saleh. The Moore–Penrose pseudoinverse: A tutorial review of the theory. Brazilian Journal of Physics, 42:146–165, 2012.
- [4].Bloch Anthony M. Nonholonomic mechanics. Springer, 2015.
- [5].Bunne Charlotte, Roohani Yusuf, Rosen Yanay, Gupta Ankit, Zhang Xikun, Roed Marcel, Alexandrov Theo, AlQuraishi Mohammed, Brennan Patricia, Burkhardt Daniel B, et al. How to build the virtual cell with artificial intelligence: Priorities and opportunities. Cell, 187(25):7045–7063, 2024.
- [6].Califf Robert M. Biomarker definitions and their applications. Experimental Biology and Medicine, 243(3):213–221, 2018.
- [7].Chen Guanrong, Moiola Jorge L, and Wang Hua O. Bifurcation control: theories, methods, and applications. International Journal of Bifurcation and Chaos, 10(03):511–548, 2000.
- [8].Chen Haiming, Chen Jie, Muir Lindsey A, Ronquist Scott, Meixner Walter, Ljungman Mats, Ried Thomas, Smale Stephen, and Rajapakse Indika. Functional organization of the human 4D nucleome. Proceedings of the National Academy of Sciences, 112(26):8002–8007, 2015.
- [9].Chen Wanyin, Dong Jihu, Haiech Jacques, Kilhoffer Marie-Claude, and Zeniou Maria. Cancer stem cell quiescence and plasticity as major challenges in cancer therapy. Stem Cells International, 2016(1):1740936, 2016.
- [10].Chen Wanze, Guillaume-Gentil Orane, Rainer Pernille Yde, Gäbelein Christoph G, Saelens Wouter, Gardeux Vincent, Klaeger Amanda, Dainese Riccardo, Zachara Magda, Zambelli Tomaso, et al. Live-seq enables temporal transcriptomic recording of single cells. Nature, 608(7924):733–740, 2022.
- [11].Chua Leon O. Local activity is the origin of complexity. International Journal of Bifurcation and Chaos, 15(11):3435–3456, 2005.
- [12].Cortesi Fabrizio L, Summers Tyler H, and Lygeros John. Submodularity of energy related controllability metrics. In 53rd IEEE Conference on Decision and Control, pages 2883–2888. IEEE, 2014.
- [13].Darzynkiewicz Z, Traganos F, Crissman H, and Steinkamp J. Cell heterogeneity during the cell cycle. Journal of Cellular Physiology, 113(3):465–474, 1982.
- [14].Del Vecchio Domitilla and Murray Richard M. Biomolecular feedback systems. Princeton University Press, Princeton, NJ, 2015.
- [15].Diop Sette and Fliess Michel. Nonlinear observability, identifiability, and persistent trajectories. In Proceedings of the 30th IEEE Conference on Decision and Control, pages 714–719. IEEE, 1991.
- [16].Gavish Matan and Donoho David L. The optimal hard threshold for singular values is 4/√3. IEEE Transactions on Information Theory, 60(8):5040–5053, 2014.
- [17].Grant Gavin D, Kedziora Katarzyna M, Limas Juanita C, Cook Jeanette Gowen, and Purvis Jeremy E. Accurate delineation of cell cycle phase transitions in living cells with PIP-FUCCI. Cell Cycle, 17(21–22):2496–2516, 2018.
- [18].Gupta Gaurav, Pequito Sérgio, and Bogdan Paul. Learning latent fractional dynamics with unknown unknowns. In 2019 American Control Conference (ACC), pages 217–222. IEEE, 2019.
- [19].Hartwell Lee, Mankoff David, Paulovich Amanda, Ramsey Scott, and Swisher Elizabeth. Cancer biomarkers: a systems approach. Nature Biotechnology, 24(8):905–908, 2006.
- [20].Hasnain Aqib, Balakrishnan Shara, Joshy Dennis M, Smith Jen, Haase Steven B, and Yeung Enoch. Learning perturbation-inducible cell states from observability analysis of transcriptome dynamics. Nature Communications, 14(1):3148, 2023.
- [21].Hassard Brian D, Kazarinoff Nicholas D, and Wan Yieh-Hei. Theory and applications of Hopf bifurcation, volume 41. CUP Archive, 1981.
- [22].Hermann Robert and Krener Arthur. Nonlinear controllability and observability. IEEE Transactions on Automatic Control, 22(5):728–740, 1977.
- [23].Himpe Christian. emgr—the empirical Gramian framework. Algorithms, 11(7):91, 2018.
- [24].Kalman Rudolf Emil. Mathematical description of linear dynamical systems. Journal of the Society for Industrial and Applied Mathematics, Series A: Control, 1(2):152–192, 1963.
- [25].Kazma Mohamad H and Taha Ahmad F. Observability for nonlinear systems: Connecting variational dynamics, Lyapunov exponents, and empirical Gramians. arXiv preprint arXiv:2402.14711, 2024.
- [26].Konstantinides Konstantinos and Yao Kung. Statistical analysis of effective singular values in matrix rank determination. IEEE Transactions on Acoustics, Speech, and Signal Processing, 36(5):757–763, 1988.
- [27].Kutz J Nathan, Brunton Steven L, Brunton Bingni W, and Proctor Joshua L. Dynamic mode decomposition: data-driven modeling of complex systems. SIAM, 2016.
- [28].Lall Sanjay, Marsden Jerrold E, and Glavaški Sonja. Empirical model reduction of controlled nonlinear systems. IFAC Proceedings Volumes, 32(2):2598–2603, 1999.
- [29].Lee Sau Har, Reed-Newman Tamika, Anant Shrikant, and Ramasamy Thamil Selvee. Regulatory role of quiescence in the biological function of cancer stem cells. Stem Cell Reviews and Reports, 16:1185–1207, 2020.
- [30].Li Ling and Bhatia Ravi. Stem cell quiescence. Clinical Cancer Research, 17(15):4936–4941, 2011.
- [31].Lin Ching-Tai. Structural controllability. IEEE Transactions on Automatic Control, 19(3):201–208, 1974.
- [32].Lindell Emma, Zhong Lei, and Zhang Xiaonan. Quiescent cancer cells—a potential therapeutic target to overcome tumor resistance and relapse. International Journal of Molecular Sciences, 24(4):3762, 2023.
- [33].Liu Sijia, Chen Haiming, Ronquist Scott, Seaman Laura, Ceglia Nicholas, Meixner Walter, Chen Pin-Yu, Higgins Gerald, Baldi Pierre, Smale Steve, et al. Genome architecture mediates transcriptional control of human myogenic reprogramming. iScience, 6:232–246, 2018.
- [34].Liu Yang-Yu, Slotine Jean-Jacques, and Barabási Albert-László. Controllability of complex networks. Nature, 473(7346):167–173, 2011.
- [35].Liu Yang-Yu, Slotine Jean-Jacques, and Barabási Albert-László. Observability of complex systems. Proceedings of the National Academy of Sciences, 110(7):2460–2465, 2013.
- [36].Marsden JE, McCracken M, and Smale S. A mathematical model of two cells via Turing's equation. The Hopf Bifurcation and Its Applications, pages 354–367, 1976.
- [37].Marsden Jerrold E and McCracken Marjorie. The Hopf bifurcation and its applications, volume 19. Springer Science & Business Media, 2012.
- [38].Montanari Arthur N, Duan Chao, Aguirre Luis A, and Motter Adilson E. Functional observability and target state estimation in large-scale networks. Proceedings of the National Academy of Sciences, 119(1):e2113750119, 2022.
- [39].Moore Bruce. Principal component analysis in linear systems: Controllability, observability, and model reduction. IEEE Transactions on Automatic Control, 26(1):17–32, 1981.
- [40].Novak Bela and Tyson John J. Modeling the control of DNA replication in fission yeast. Proceedings of the National Academy of Sciences, 94(17):9147–9152, 1997.
- [41].Pasqualetti Fabio, Zampieri Sandro, and Bullo Francesco. Controllability metrics, limitations and algorithms for complex networks. IEEE Transactions on Control of Network Systems, 1(1):40–52, 2014.
- [42].Pequito Sérgio, Bogdan Paul, and Pappas George J. Minimum number of probes for brain dynamics observability. In 2015 54th IEEE Conference on Decision and Control (CDC), pages 306–311. IEEE, 2015.
- [43].Poincaré Henri. Les méthodes nouvelles de la mécanique céleste, volume 2. Gauthier-Villars et fils, 1893.
- [44].Ray Patrick, Le Manach Yannick, Riou Bruno, Houle Tim T, and Warner David S. Statistical evaluation of a biomarker. The Journal of the American Society of Anesthesiologists, 112(4):1023–1040, 2010.
- [45].Riba Andrea, Oravecz Attila, Durik Matej, Jiménez Sara, Alunni Violaine, Cerciat Marie, Jung Matthieu, Keime Céline, Keyes William M, and Molina Nacho. Cell cycle gene regulation dynamics revealed by RNA velocity and deep-learning. Nature Communications, 13(1):2865, 2022.
- [46].Rieder Dietmar, Trajanoski Zlatko, and McNally James G. Transcription factories. Frontiers in Genetics, 3:221, 2012.
- [47].Ronquist Scott, Patterson Geoff, Muir Lindsey A, Lindsly Stephen, Chen Haiming, Brown Markus, Wicha Max S, Bloch Anthony, Brockett Roger, and Rajapakse Indika. Algorithm for cellular reprogramming. Proceedings of the National Academy of Sciences, 114(45):11832–11837, 2017.
- [48].Roy Olivier and Vetterli Martin. The effective rank: A measure of effective dimensionality. In 2007 15th European Signal Processing Conference, pages 606–610. IEEE, 2007.
- [49].Sastry Shankar and Desoer C. The robustness of controllability and observability of linear time-varying systems. IEEE Transactions on Automatic Control, 27(4):933–939, 1982.
- [50].Schalk Gerwin, McFarland Dennis J, Hinterberger Thilo, Birbaumer Niels, and Wolpaw Jonathan R. BCI2000: a general-purpose brain-computer interface (BCI) system. IEEE Transactions on Biomedical Engineering, 51(6):1034–1043, 2004.
- [51].Sedoglavic Alexandre. A probabilistic algorithm to test local algebraic observability in polynomial time. In Proceedings of the 2001 International Symposium on Symbolic and Algebraic Computation, pages 309–317, 2001.
- [52].Silverman Leonard M and Meadows HE. Controllability and observability in time-variable linear systems. SIAM Journal on Control, 5(1):64–73, 1967.
- [53].Smale S. A mathematical model of two cells via Turing's equation. The Hopf Bifurcation and Its Applications, pages 354–367, 1976.
- [54].Sorenson Harold W. Least-squares estimation: from Gauss to Kalman. IEEE Spectrum, 7(7):63–68, 1970.
- [55].Spencer Sabrina L, Cappell Steven D, Tsai Feng-Chiao, Overton K Wesley, Wang Clifford L, and Meyer Tobias. The proliferation-quiescence decision is controlled by a bifurcation in CDK2 activity at mitotic exit. Cell, 155(2):369–383, 2013.
- [56].Spiller David G, Wood Christopher D, Rand David A, and White Michael RH. Measurement of single-cell dynamics. Nature, 465(7299):736–745, 2010.
- [57].Sweeney Patrick, Chen Can, Rajapakse Indika, and Cone Roger D. Network dynamics of hypothalamic feeding neurons. Proceedings of the National Academy of Sciences, 118(14), 2021.
- [58].Trinh Hieu and Fernando Tyrone. Functional observers for dynamical systems, volume 420. Springer Science & Business Media, 2011.
- [59].Turing Alan Mathison. The chemical basis of morphogenesis. Bulletin of Mathematical Biology, 52:153–197, 1952.
- [60].Udell Madeleine and Townsend Alex. Why are big data matrices approximately low rank? SIAM Journal on Mathematics of Data Science, 1(1):144–160, 2019.
- [61].Villaverde Alejandro F et al. Observability and structural identifiability of nonlinear biological systems. Complexity, 2019, 2019.
- [62].Villaverde Alejandro F, Tsiantis Nikolaos, and Banga Julio R. Full observability and estimation of unknown inputs, states and parameters of nonlinear biological models. Journal of the Royal Society Interface, 16(156):20190043, 2019.
- [63].Weilguny Lukas, De Maio Nicola, Munro Rory, Manser Charlotte, Birney Ewan, Loose Matthew, and Goldman Nick. Dynamic, adaptive sampling during nanopore sequencing using Bayesian experimental design. Nature Biotechnology, 41(7):1018–1025, 2023.
- [64].Weintraub Harold, Tapscott Stephen J, Davis Robert L, Thayer Mathew J, Adam Mohammed A, Lassar Andrew B, and Miller A Dusty. Activation of muscle-specific genes in pigment, nerve, fat, liver, and fibroblast cell lines by forced expression of MyoD. Proceedings of the National Academy of Sciences, 86(14):5434–5438, 1989.
- [65].Xie Zhuorui, Bailey Allison, Kuleshov Maxim V, Clarke Daniel JB, Evangelista John E, Jenkins Sherry L, Lachmann Alexander, Wojciechowicz Megan L, Kropiwnicki Eryk, Jagodnik Kathleen M, et al. Gene set knowledge discovery with Enrichr. Current Protocols, 1(3):e90, 2021.
- [66].Yamanaka Shinya. Elite and stochastic models for induced pluripotent stem cell generation. Nature, 460(7251):49–52, 2009.
- [67].Zhang Yuan, Fernando Tyrone, and Darouach Mohamed. Functional observability, structural functional observability and optimal sensor placement. IEEE Transactions on Automatic Control, 2024.
- [68].Zielke N and Edgar BA. FUCCI sensors: powerful new tools for analysis of cell proliferation. Wiley Interdisciplinary Reviews: Developmental Biology, 4(5):469–487, 2015.