Summary:
With recent advances in technologies to profile multi-omics data at the single-cell level, integrative multi-omics data analysis has become increasingly popular. It is increasingly common that information such as methylation changes, chromatin accessibility, and gene expression is jointly collected in a single-cell experiment. In biomedical studies, it is often of interest to study the associations between various data types and to examine how these associations might change according to other factors such as cell types and gene regulatory components. However, since each data type usually has a distinct marginal distribution, joint analysis of these changes of associations using multi-omics data is statistically challenging. In this paper, we propose a flexible copula-based framework to model covariate-dependent correlation structures independent of their marginals. In addition, the proposed approach can jointly combine a wide variety of univariate marginal distributions, either discrete or continuous, including the class of zero-inflated distributions. The performance of the proposed framework is demonstrated through a series of simulation studies. Finally, it is applied to a set of experimental data to investigate the dynamic relationship between single-cell RNA-sequencing, chromatin accessibility, and DNA methylation at different germ layers during mouse gastrulation.
Keywords: Dynamic association, Gaussian copula regression, Integrative multi-omics data analysis, Liquid association, Single-cell experiment, Zero-inflated model
1. Introduction
Recent advances in high-throughput technologies have enabled the profiling of different molecular layers in living cells. It is increasingly common to collect multi-omics data, such as genome, epigenome, transcriptome, and proteome, in single-cell studies. Often these data are of different modalities and contain complementary information. Analyzing them together may bring insights that cannot be revealed by analyzing each data type separately. The recent work of Lee et al. (2020) provides a comprehensive review of multi-omics studies at single-cell resolution, including the technologies and data analysis methods. In that article, the authors list three major directions of statistical analysis of single-cell multi-omics data: (1) correlation analysis of various data modalities, for example, between DNA methylation and RNA expression (Hu et al., 2016); (2) analyzing each -omics layer separately and then combining the corresponding results (for instance, Cao et al., 2018); and (3) integrating all data types to generate an overall map (Argelaguet et al., 2019).
This article aims to follow the first path listed above and to study various sources of associations in multi-omics data from single-cell experiments. Specifically, we intend to model how the association between two types of -omics data can change depending on other data modalities and cell types information. We named this covariate-dependent correlation “dynamic association” in this paper. The statistical challenge for studying dynamic association in this context is that different data types often have distinct marginal distributions. For instance, raw counts of single-cell RNA-sequencing data (scRNA-seq) are usually non-negative integers with zero-inflation. After being normalized, scRNA-seq is transformed into non-negative real numbers. DNA methylation and chromatin accessibility data are often measured as proportions. Due to the characteristics such as zero-inflation, skewness, and proportion/ratio of various data types, the conventional multivariate Gaussian distribution assumption is often not adequate in this setting.
In recent years, there has been increased interest in studying dynamic association (Wang et al., 2017; Kinzy et al., 2019; Ma et al., 2020; Yang and Ho, 2021). In his ground-breaking work, Li (2002) proposed a three-product-moment measure called the liquid association. Li et al. (2004) extended this notion to multivariate Y1 and Y2 and a third variable X, and derived the liquid association of linear combinations of Y1 and linear combinations of Y2 given X. In Ho et al. (2011) and Chen et al. (2011), the authors proposed model-based approaches to study covariate-dependent correlation structures. In both papers, the marginal distributions of Y1 and Y2 given X are both assumed to be Gaussian.
Our motivating data were generated from a single-cell multi-omics profiling study of mouse gastrulation (Argelaguet et al., 2019). At the formation stage of primary germ layers during gastrulation, it is unclear how gene expression, chromatin accessibility, and DNA methylation coordinate with each other. These data provide exciting opportunities to study transcription machinery in embryonic stem cells at various stages via the information contained in the genome, epigenome and transcriptome. In this paper, we set out to study how DNA methylation in genes encoding transcription factors (TFs) modulates the association patterns between scRNA-seq and chromatin accessibility in various germ layers during mouse development stages.
From the statistical perspective, this challenge involves modeling the joint distribution of (Y1, Y2) with different marginal distributions, where both the marginal means and the correlation between Y1 and Y2 are functions of a set of covariates X = x. It is of particular interest to study the dynamic association between Y1 and Y2 and how it changes given different values of X. In the motivating example, Y1 and Y2 may represent scRNA-seq and chromatin accessibility, respectively, and the set of covariates x could include methylation level, gastrulation stage, and germ layer type.
In this article, we propose a general framework for studying the dynamic association between Y1 and Y2 given x where the marginal distributions are not limited to the Gaussian distribution and can be adapted to various univariate distributions. We construct copula-based models to study the joint distribution of (Y1, Y2) and develop a flexible approach to incorporate covariate-dependent correlation structures in the proposed framework. This approach can be adapted to many practical situations where the marginal distributions can be either continuous or discrete, including zero-inflated distributions. An adaptive Markov chain Monte Carlo algorithm is proposed to fit the model and estimate the parameters.
Tracing their origin to Sklar’s Theorem (Sklar, 1973), copula-based models offer a flexible way of constructing multivariate distribution functions by combining univariate marginal distributions via copulas (Nelsen, 2006). While fitting copula models has historically been implemented largely with frequentist approaches, with the advances in Bayesian statistics, a substantial body of literature in recent years has described model fitting from the Bayesian paradigm (e.g. Pitt et al., 2006; Smith and Khaled, 2012). The majority of works on copula regression have focused on modeling the marginal means as a function of covariates. In order to study dynamic association using single-cell multi-omics data, we develop a novel framework to incorporate covariate-dependent correlation structures in the analysis.
The rest of this article is organized as follows: Section 2 presents the framework of the copula-based model. It also contains examples for specific marginal distributions that are frequently considered in single-cell data analysis. Section 3 presents the details of estimation and model comparison procedures. Section 4 demonstrates the performance of the proposed model through simulation studies. We examine the coverage probability and mean squared error of the parameter estimates and compare the proposed model with a simpler model without dynamic association. Section 5 presents the results of the analysis of the single-cell mouse gastrulation data set. Finally, Section 6 concludes the article with a discussion.
2. Model Framework
2.1. Joint distribution
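Let Y1, … , Yn be n independent bivariate random vectors with $Y_i = (Y_{i1}, Y_{i2})^\top$ for i = 1, … , n. For j = 1, 2, we assume that the marginal cumulative distribution function (CDF) of Yij is given by $F_j(y_{ij}; \theta_j, x_i)$, where θj represents a set of parameters associated with Fj, and $x_i$ a set of covariates for the ith subject. We construct the joint CDF of Yi via a Gaussian copula with covariate-dependent correlation as follows. Let $Z_i = (Z_{i1}, Z_{i2})^\top$ be such that

$$Z_i \mid x_i \sim N_2(0, R_i), \qquad R_i = \begin{pmatrix} 1 & \rho_i \\ \rho_i & 1 \end{pmatrix}, \tag{1}$$

with

$$\log\left(\frac{1 + \rho_i}{1 - \rho_i}\right) = x_i^\top \tau, \tag{2}$$

where $\tau$ is the vector of regression coefficients for the correlation. This formulation incorporates dynamic association between Yi1 and Yi2, i.e. association that depends on covariates, through the correlation between Zi1 and Zi2. We denote the joint CDF of Zi by Φτ to reflect its dependence on the parameter τ. For both discrete and continuous marginals, the general form of the joint CDF of Yi is given by

$$F(y_i; \theta_1, \theta_2, \tau, x_i) = \Phi_\tau\left\{\Phi^{-1}\left(F_1(y_{i1}; \theta_1, x_i)\right),\ \Phi^{-1}\left(F_2(y_{i2}; \theta_2, x_i)\right)\right\}, \tag{3}$$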
where Φ−1 represents the inverse CDF of N(0,1) (Sklar, 1973). Note that the marginal distribution Fj can be either continuous or discrete. Further, apart from regressing the association on xi given in (2), each of the two marginal means can also be covariate-dependent, as in Masarotto and Varin (2012).
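As a minimal numerical sketch of Expressions (1) to (3), the Python snippet below inverts the link in (2) and evaluates the Gaussian copula CDF at given marginal CDF values. The function names and the example inputs are illustrative assumptions and are not taken from the authors' software.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def rho_from_covariates(x, tau):
    """Invert log[(1 + rho) / (1 - rho)] = x' tau, which gives rho = tanh(x' tau / 2)."""
    return np.tanh(0.5 * np.dot(x, tau))

def copula_joint_cdf(u1, u2, rho):
    """Gaussian copula CDF at the marginal CDF values u1 = F1(y1) and u2 = F2(y2)."""
    z = norm.ppf([u1, u2])                      # latent Gaussian scores
    cov = np.array([[1.0, rho], [rho, 1.0]])    # correlation matrix R_i in (1)
    return multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf(z)

# Hypothetical covariate vector (intercept, x) and coefficient vector tau.
rho = rho_from_covariates([1.0, 0.5], [0.2, 1.0])
joint = copula_joint_cdf(0.3, 0.6, rho)
```

Because the link maps the real line onto (−1, 1), any value of the linear predictor yields a valid correlation.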
2.2. Specific cases for marginal distributions
We now present two marginal distributions and their associated marginal mean regression models used in Sections 4 and 5, for the zero-inflated gamma distribution and the beta distribution, respectively. The discussion on the zero-inflated negative binomial distribution is provided in Appendix A in the Supporting Information. All of these distributions appear frequently as parametric assumptions in analyses of data arising from single-cell experiments.
Zero-inflated gamma distribution
If the scRNA-seq data are normalized (Lytal et al., 2020), the positive integer-valued counts are transformed to positive reals while genes with zero counts remain zero. In this case, a zero-inflated gamma distribution can be useful because its support is over the non-negative reals and it could easily accommodate the skewness exhibited in real data. In this distribution, Yi degenerates at 0 with probability π, and follows a gamma(μiϕ, ϕ) distribution otherwise, with marginal density as follows:
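$$f(y_i; \mu_i, \phi, \pi) = \pi\, I(y_i = 0) + (1 - \pi)\, \frac{\phi^{\mu_i \phi}}{\Gamma(\mu_i \phi)}\, y_i^{\mu_i \phi - 1} e^{-\phi y_i}\, I(y_i > 0). \tag{4}$$

A regression model (Ding et al., 2015) can be constructed by regressing the mean μi onto covariates via the log-link function $\log(\mu_i) = x_i^\top \beta$. Optionally, it is sometimes of interest to consider covariates for the zero-inflation parameter π as well, for instance using a logit link via $\operatorname{logit}(\pi_i) = x_i^\top \eta$. The vector of all parameters in the marginal distribution is $\theta = (\beta^\top, \eta^\top, \phi)^\top$ when π is associated with covariates xi, or $\theta = (\beta^\top, \pi, \phi)^\top$ otherwise.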
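The zero-inflated gamma margin can be sketched numerically as follows, assuming the gamma(μϕ, ϕ) notation denotes shape μϕ and rate ϕ, so that the gamma component has mean μ. The helper names `zig_cdf` and `log_link_mean` and all numeric values are hypothetical.

```python
import numpy as np
from scipy.stats import gamma

def log_link_mean(x, coef):
    """Mean of the gamma component under the log link: mu = exp(x' beta)."""
    return np.exp(np.dot(x, coef))

def zig_cdf(y, mu, phi, pi):
    """CDF of a zero-inflated gamma: point mass pi at 0, gamma(shape=mu*phi, rate=phi) otherwise."""
    y = np.asarray(y, dtype=float)
    # SciPy parameterizes the gamma by shape `a` and scale = 1 / rate.
    continuous_part = gamma.cdf(np.maximum(y, 0.0), a=mu * phi, scale=1.0 / phi)
    return np.where(y < 0, 0.0, pi + (1.0 - pi) * continuous_part)

mu = log_link_mean([1.0, 0.2], [0.5, 1.0])   # hypothetical covariates and coefficients
value = zig_cdf(1.5, mu, phi=10.0, pi=0.3)
```

The jump of size π at zero is what makes the margin non-continuous there, which matters for how the latent Gaussian variable is handled during estimation.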
Beta distribution
The second type of data is proportion-valued within the interval (0, 1), which often arises from experiments such as measuring the DNA methylation levels (e.g. Weinhold et al., 2016; Liu et al., 2021) or chromatin accessibility (e.g. Smallwood et al., 2014; Clark et al., 2018). In this case, a beta distribution of the form
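$$f(y_i; \mu_i, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu_i \phi)\, \Gamma((1 - \mu_i)\phi)}\, y_i^{\mu_i \phi - 1} (1 - y_i)^{(1 - \mu_i)\phi - 1}, \qquad 0 < y_i < 1, \tag{5}$$

is useful in beta regression (Cribari-Neto and Zeileis, 2010). Under this parameterization, $\mu_i \in (0, 1)$ is the mean of Yi, and ϕ > 0 is the dispersion. The mean μi could be covariate-dependent via $\operatorname{logit}(\mu_i) = x_i^\top \beta$. The set of all parameters is $\theta = (\beta^\top, \phi)^\top$.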
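For illustration, the mean and dispersion parameterization of the beta margin maps onto the usual two shape parameters as a = μϕ and b = (1 − μ)ϕ. The sketch below uses SciPy; the helper names and numeric values are assumptions made for this example.

```python
import numpy as np
from scipy.stats import beta

def logit_link_mean(x, coef):
    """Mean under the logit link: mu = 1 / (1 + exp(-x' beta))."""
    return 1.0 / (1.0 + np.exp(-np.dot(x, coef)))

def beta_mean_dispersion(mu, phi):
    """Frozen beta distribution with mean mu and dispersion phi (shapes mu*phi and (1-mu)*phi)."""
    return beta(a=mu * phi, b=(1.0 - mu) * phi)

dist = beta_mean_dispersion(mu=0.3, phi=10.0)
mean, variance = dist.mean(), dist.var()   # variance equals mu * (1 - mu) / (1 + phi)
```

Larger values of ϕ concentrate the distribution around μ, since the variance is μ(1 − μ)/(1 + ϕ).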
Finally, we present a simple simulated data set to illustrate the framework described by Expressions (1) to (3). For i = 1, … , 1000, let the covariates xi ~ uniform(−1, 1). For the covariate vector $(1, x_i)^\top$, set $\tau = (0, 1)^\top$ in (2) so that log[(1 + ρi)/(1 − ρi)] = xi. A total of 1000 pairs of (Yi1, Yi2) are simulated, where Yi1 given xi follows a gamma distribution according to (4) with a covariate-dependent mean, π = 0, and ϕ1 = 10, while Yi2 given xi follows a beta distribution according to (5) with a covariate-dependent mean and ϕ2 = 10. In Figure 1, the interval (−1, 1) is divided into eight sub-intervals of equal length. Slices of {(yi1, yi2)} are plotted for xi falling within each interval. The superimposed contour on each plot represents the joint copula density evaluated at the midpoint of each interval for the given parameters. The Pearson correlation of the (yi1, yi2) pairs in each subset is calculated. As x increases, in addition to the change of marginal means, there is a gradual increase in the correlation between Y1 and Y2.
Figure 1.

The profile plot of {(yi1, yi2)} for xi’s in eight intervals in increasing order. Conditional on xi, yi1 follows a gamma distribution, and yi2 follows a beta distribution. The correlation ρi depends on xi following log[(1 + ρi)/(1 − ρi)] = xi. In addition, the marginal means increase with the magnitude of xi too. The superimposed contours represent the copula density of (Y1 , Y2) evaluated at the midpoint of each interval.
3. Estimation and Model Selection
In this section, we describe the parameter estimation procedure from the Bayesian perspective via the Markov chain Monte Carlo (MCMC) sampling scheme. Let Y1, … , Yn be independent random variables described by (3), and $y_1, \ldots, y_n$ an observed sample. Under the framework in the previous section, the set of all parameters is {θ1, θ2, τ}. For j = 1, 2, θj represents all parameters related to the jth marginal distribution, and τ represents the regression parameter on the correlation in (2). Following Pitt et al. (2006) and Smith and Khaled (2012), we link each Yi by a Gaussian-distributed latent variable Zi. In the general form, the likelihood function is given by

$$L(\theta_1, \theta_2, \tau) = \prod_{i=1}^{n} f_Y(y_i; \theta_1, \theta_2, \tau, x_i), \tag{6}$$
where each fY(yi; θ1, θ2, τ, xi) can be expressed in terms of the latent Zi and the marginal distributions F1 and F2. When the marginal distribution Fj is continuous, there is a one-to-one correspondence between Yij and Zij, and hence zij is completely known once yij is observed. On the other hand, when Fj is discrete, Zij is unknown, and needs to be generated during sampling. Further, calculating the joint fY (yi; θ1, θ2, τ, xi) may be computationally intense, e.g. having to evaluate n double integrals of bivariate Gaussian density functions. Instead, we adopt a one-margin-at-a-time approach. Under this approach, each MCMC iteration goes through updating (θ1, z11, … , zn1), (θ2, z12, … , zn2), and τ.
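Consider first the case that the marginal distribution Fj is continuous. Conditional on the observed $y_{ij}$ and θj, there is a one-to-one correspondence between zij and yij through $z_{ij} = \Phi^{-1}\{F_j(y_{ij}; \theta_j, x_i)\}$, and hence (z1j, … , znj) is completely known. Equation (4) in Song (2000) provides the joint pdf of Yi when both margins are continuous with pdf f1 and f2, given by

$$f_Y(y_i; \theta_1, \theta_2, \tau, x_i) = |R_i|^{-1/2} \exp\left\{-\tfrac{1}{2}\, z_i^\top \left(R_i^{-1} - I\right) z_i\right\} \prod_{j=1}^{2} f_j(y_{ij}; \theta_j, x_i), \tag{7}$$

where $z_i = (z_{i1}, z_{i2})^\top$, and I is the identity matrix. This is readily verified based on the transformation $y_{ij} = F_j^{-1}\{\Phi(z_{ij}); \theta_j, x_i\}$ for j = 1, 2 from $Z_i \sim N_2(0, R_i)$. Based on this observation, the full conditional of θj is given by

$$f(\theta_j \mid \cdot) \propto f(\theta_j) \prod_{i=1}^{n} \exp\left\{-\tfrac{1}{2}\, z_i^\top \left(R_i^{-1} - I\right) z_i\right\} f_j(y_{ij}; \theta_j, x_i), \tag{8}$$

where f (θj) is the prior distribution of θj. Note that, even conditional on the observed $y_{ij}$, each $z_{ij}$ involves θj through the nonlinear transformation $z_{ij} = \Phi^{-1}\{F_j(y_{ij}; \theta_j, x_i)\}$.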
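The log of the joint density in the continuous case can be sketched as below, writing the quadratic form $z^\top (R^{-1} - I) z$ out explicitly for the 2 × 2 correlation matrix. The function name is hypothetical. As a sanity check, plugging in standard normal margins should recover the ordinary bivariate normal log-density.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def copula_logpdf(z1, z2, rho, logf1, logf2):
    """log of |R|^{-1/2} exp{-z'(R^{-1} - I)z / 2} f1 f2 for a 2x2 correlation matrix R."""
    det = 1.0 - rho**2                                            # |R|
    quad = (rho**2 * (z1**2 + z2**2) - 2.0 * rho * z1 * z2) / det  # z'(R^{-1} - I)z
    return -0.5 * np.log(det) - 0.5 * quad + logf1 + logf2

# Sanity check with standard normal margins, where y = z and f_j is the N(0, 1) density.
z1, z2, rho = 0.4, -1.1, 0.6
check = copula_logpdf(z1, z2, rho, norm.logpdf(z1), norm.logpdf(z2))
reference = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]]).logpdf([z1, z2])
```

For non-Gaussian margins, `logf1` and `logf2` would be the marginal log-densities evaluated at the observed responses, and `z1`, `z2` the corresponding Gaussian scores.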
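On the other hand, when Fj is discrete, there is no longer a one-to-one correspondence between zij and yij. Instead, conditional on yij, zi(−j) and θj, Zij follows $N(\rho_i z_{i(-j)},\, 1 - \rho_i^2)$ truncated between $\Phi^{-1}\{F_j(y_{ij}^-; \theta_j, x_i)\}$ and $\Phi^{-1}\{F_j(y_{ij}; \theta_j, x_i)\}$, where $F_j(y_{ij}^-; \theta_j, x_i)$ is the left-limit of Fj (yij; θj, xi). The full conditional of θj is given by

$$f(\theta_j \mid \cdot) \propto f(\theta_j) \prod_{i=1}^{n} \left\{ \Phi\left(\frac{\Phi^{-1}\{F_j(y_{ij}; \theta_j, x_i)\} - \rho_i z_{i(-j)}}{\sqrt{1 - \rho_i^2}}\right) - \Phi\left(\frac{\Phi^{-1}\{F_j(y_{ij}^-; \theta_j, x_i)\} - \rho_i z_{i(-j)}}{\sqrt{1 - \rho_i^2}}\right) \right\}. \tag{9}$$

Each likelihood of yij in the curly brackets is the probability of $Z_{ij}$, given $z_{i(-j)}$, falling between $\Phi^{-1}\{F_j(y_{ij}^-; \theta_j, x_i)\}$ and $\Phi^{-1}\{F_j(y_{ij}; \theta_j, x_i)\}$.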
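The discrete-margin update can be sketched as follows, assuming the latent variable is drawn from a Gaussian with mean ρ·z(−j) and variance 1 − ρ², truncated to the interval implied by the left limit and the value of the marginal CDF. The function names are hypothetical, and `scipy.stats.truncnorm` expects its bounds on the standardized scale.

```python
import numpy as np
from scipy.stats import norm, truncnorm

def latent_bounds(F_left, F_right):
    """Latent-scale interval implied by the left limit and the value of the marginal CDF."""
    return norm.ppf(F_left), norm.ppf(F_right)

def interval_prob(F_left, F_right, z_other, rho):
    """Probability that Z_j, given the other latent score, falls in the latent interval."""
    lower, upper = latent_bounds(F_left, F_right)
    loc, scale = rho * z_other, np.sqrt(1.0 - rho**2)
    return norm.cdf((upper - loc) / scale) - norm.cdf((lower - loc) / scale)

def sample_latent(F_left, F_right, z_other, rho, rng):
    """Draw Z_j from its truncated Gaussian full conditional."""
    lower, upper = latent_bounds(F_left, F_right)
    loc, scale = rho * z_other, np.sqrt(1.0 - rho**2)
    a, b = (lower - loc) / scale, (upper - loc) / scale   # standardized truncation bounds
    return truncnorm.rvs(a, b, loc=loc, scale=scale, random_state=rng)

rng = np.random.default_rng(1)
draw = sample_latent(0.2, 0.5, z_other=0.8, rho=0.4, rng=rng)
```

When ρ = 0 the interval probability reduces to the marginal probability mass, which provides a quick check of the implementation.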
Let the support of Fj be denoted as for continuous and for discrete distributions, respectively. The posterior sampling algorithm is described as follows.
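1. Initiate θ1, θ2, τ, and $z_{ij}$ for i = 1, … , n and j = 1, 2. Set t = 1.
2. For j = 1, 2:
   (a) Sample $\theta_j^{(t)}$ from its full conditional, given by (8) if Fj is continuous or by (9) if Fj is discrete.
   (b) Update $z_{ij}^{(t)}$ for i = 1, … , n. If Fj is continuous, set $z_{ij}^{(t)} = \Phi^{-1}\{F_j(y_{ij}; \theta_j^{(t)}, x_i)\}$. If Fj is discrete, sample $z_{ij}^{(t)}$ from the truncated Gaussian distribution described above.
3. Sample τ(t) from

   $$f(\tau \mid z_1, \ldots, z_n) \propto f(\tau) \prod_{i=1}^{n} \varphi_2(z_i; R_i), \tag{10}$$

   where $\varphi_2(\cdot\,; R_i)$ is the density function of N2(0, Ri) given τ, and f(τ) the prior density of τ.
4. Set t = t + 1 and return to Step 2.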
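The target density for the τ update can be sketched on the log scale as below. The sketch assumes independent N(0, 100²) priors on the components of τ, one of the weakly informative choices mentioned later in this section. The function name and the toy inputs are hypothetical.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def log_full_conditional_tau(tau, X, Z, prior_sd=100.0):
    """Unnormalized log density of tau given latent scores Z (n x 2) and covariates X (n x d)."""
    tau = np.asarray(tau, dtype=float)
    rho = np.tanh(0.5 * (X @ tau))              # inverse of the link in (2)
    det = 1.0 - rho**2
    quad = (Z[:, 0]**2 - 2.0 * rho * Z[:, 0] * Z[:, 1] + Z[:, 1]**2) / det
    loglik = np.sum(-np.log(2.0 * np.pi) - 0.5 * np.log(det) - 0.5 * quad)
    logprior = np.sum(norm.logpdf(tau, loc=0.0, scale=prior_sd))   # assumed prior
    return loglik + logprior

# Toy inputs: two observations, intercept plus one covariate.
X = np.array([[1.0, 0.2], [1.0, -0.5]])
Z = np.array([[0.3, -0.1], [1.2, 0.7]])
value = log_full_conditional_tau([0.1, 0.8], X, Z)
```

A function of this form can be passed directly to a random-walk Metropolis sampler as the log target.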
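Steps 2(a) and 3 require a Metropolis scheme. Here we adopt the adaptive random-walk Metropolis sampling algorithm proposed in Haario et al. (2001). Suppose $\theta \in \mathbb{R}^d$. By the adaptive Metropolis sampling algorithm, we sample $\theta^* \sim N_d(\theta^{(t-1)}, C_t)$, where

$$C_t = \begin{cases} C_0, & t \le t_0, \\ s_d\, \mathrm{Cov}\left(\theta^{(0)}, \ldots, \theta^{(t-1)}\right) + s_d\, \epsilon\, I_d, & t > t_0, \end{cases} \tag{11}$$

for some pre-specified covariance C0 over the initial t0 iterations. To prevent a degenerate Ct, we set ϵ = 0.0001 in practice. The scaling parameter sd is set to be $s_d = (2.4)^2/d$. Then, with $f(\cdot \mid \cdot)$ denoting the full conditional being sampled, the acceptance ratio is given by

$$r = \frac{f(\theta^* \mid \cdot)}{f(\theta^{(t-1)} \mid \cdot)}, \tag{12}$$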
and θ* is accepted with probability min{1, r}. If the parameter space is bounded, appropriate transformations need to be considered before sampling from the multivariate proposal distribution.
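A generic adaptive random-walk Metropolis sampler along these lines can be sketched as follows. This is an illustrative implementation for an arbitrary log target, not the authors' code. For simplicity it recomputes the empirical covariance at every iteration, whereas a recursive update would be more efficient for long chains. The default `t0` is an arbitrary choice.

```python
import numpy as np

def adaptive_metropolis(log_target, theta0, n_iter, C0, t0=500, eps=1e-4, rng=None):
    """Adaptive random-walk Metropolis; returns an (n_iter + 1, d) array of draws."""
    rng = np.random.default_rng() if rng is None else rng
    theta0 = np.asarray(theta0, dtype=float)
    d = theta0.size
    s_d = 2.4**2 / d                           # scaling parameter
    chain = np.empty((n_iter + 1, d))
    chain[0] = theta0
    logp = log_target(chain[0])
    for t in range(1, n_iter + 1):
        if t <= t0:
            C = C0                             # fixed proposal covariance at the start
        else:
            emp = np.atleast_2d(np.cov(chain[:t].T))
            C = s_d * emp + s_d * eps * np.eye(d)
        proposal = rng.multivariate_normal(chain[t - 1], C)
        logp_prop = log_target(proposal)
        # Symmetric proposal, so the acceptance ratio is the ratio of target densities.
        if np.log(rng.uniform()) < logp_prop - logp:
            chain[t], logp = proposal, logp_prop
        else:
            chain[t] = chain[t - 1]
    return chain

# Toy target: a bivariate standard normal.
chain = adaptive_metropolis(lambda th: -0.5 * np.sum(th**2), np.zeros(2), 3000,
                            0.5 * np.eye(2), rng=np.random.default_rng(0))
```

Working on the log scale avoids numerical underflow when the target is a product over many observations.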
In the simulation studies and real data analysis, we used weakly informative prior distributions for θj and τ, for instance, $N_d(0, 100^2 I)$ for τ, or gamma(0.001, 0.001) for the dispersion parameter ϕ > 0. Unlike strictly uninformative priors such as the Jeffreys prior, these priors are proper but, with their broad support, still carry little prior information about the parameters.
4. Simulation Studies
4.1. Random variate generation
In this section, we describe the mechanism used to generate zero-inflated data in the simulation studies. For i = 1, 2, … , n, given covariates xi and τ, first generate (zi1, zi2) with correlation ρi according to (1) and (2). Let $u_{ij} = \Phi(z_{ij})$, for j = 1, 2, where $\Phi$ represents the CDF of N(0, 1). Finally, generate the response

$$y_{ij} = F_j^{-1}(u_{ij}; \theta_j, x_i), \tag{13}$$

where Fj denotes the jth marginal CDF.
To simulate from a zero-inflated distribution, suppose yij = 0 with probability πj, and comes from Fj with probability (1 − πj). Then, given $u_{ij}$,

$$y_{ij} = \begin{cases} 0, & u_{ij} \le \pi_j, \\ F_j^{-1}\left(\dfrac{u_{ij} - \pi_j}{1 - \pi_j}; \theta_j, x_i\right), & u_{ij} > \pi_j. \end{cases} \tag{14}$$
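The generation mechanism can be sketched end to end for a zero-inflated gamma margin paired with a beta margin. All coefficient values below are hypothetical placeholders chosen only to make the example run; they are not the values used in the simulation studies.

```python
import numpy as np
from scipy.stats import norm, gamma, beta

def simulate_pairs(x, tau, b1, eta, b2, phi1, phi2, rng):
    """Simulate (y1, y2): zero-inflated gamma and beta margins, Gaussian copula dependence."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])            # intercept plus one covariate
    rho = np.tanh(0.5 * (X @ tau))                  # covariate-dependent correlation
    z1 = rng.standard_normal(n)
    z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    u1, u2 = norm.cdf(z1), norm.cdf(z2)
    mu1 = np.exp(X @ b1)                            # log link for the gamma mean
    pi = 1.0 / (1.0 + np.exp(-(X @ eta)))           # logit link for zero inflation
    mu2 = 1.0 / (1.0 + np.exp(-(X @ b2)))           # logit link for the beta mean
    y1 = np.zeros(n)                                # y1 = 0 whenever u1 <= pi
    pos = u1 > pi
    y1[pos] = gamma.ppf((u1[pos] - pi[pos]) / (1.0 - pi[pos]),
                        a=mu1[pos] * phi1, scale=1.0 / phi1)
    y2 = beta.ppf(u2, a=mu2 * phi2, b=(1.0 - mu2) * phi2)
    return y1, y2

rng = np.random.default_rng(0)
x = rng.beta(2, 8, size=5000)
y1, y2 = simulate_pairs(x, tau=np.array([0.0, 2.0]), b1=np.array([0.5, 1.0]),
                        eta=np.array([-1.0, 0.0]), b2=np.array([0.0, 1.0]),
                        phi1=10.0, phi2=10.0, rng=rng)
```

Rescaling the uniform score by (u − π)/(1 − π) before applying the inverse CDF keeps the positive part of the response on the correct quantile scale.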
4.2. Simulation studies and results
This subsection presents two simulation scenarios. The two simulations share the similarity that the marginal distributions are zero-inflated gamma and beta in both cases. In the first simulation, only one covariate is considered. In the second simulation study, data are simulated according to the mouse gastrulation data described in Section 5. In both simulation scenarios, we compare the performance of our proposed method to that of existing approaches in terms of statistical power. An additional simulation scenario where the marginal distribution is discrete as zero-inflated negative binomial is provided in Appendix B.3 in the Supporting Information.
Scenario 1: Zero-inflated gamma and beta distribution
For i = 1, 2, … , n, let (yi1, yi2) be the bivariate response, and $(1, x_i)^\top$ the covariate vector, where xi ~ beta(2, 8) is the covariate. Latent variables $(z_{i1}, z_{i2})$ are generated with $\log[(1 + \rho_i)/(1 - \rho_i)] = \tau_0 + \tau_1 x_i$. Marginally, $y_{i1}$ is generated from a zero-inflated gamma distribution given by (4) with $\log(\mu_{i1}) = \beta_{01} + \beta_{11} x_i$ and ϕ1 = 10. We also regress the zero-inflation parameter πi on xi according to $\operatorname{logit}(\pi_i) = \eta_0 + \eta_1 x_i$. At the other margin, $y_{i2}$ is generated from a beta distribution given by (5) with $\operatorname{logit}(\mu_{i2}) = \beta_{02} + \beta_{12} x_i$ and ϕ2 = 10. The true parameter values are chosen so that the simulated responses are similar to the observed responses in the real data analysis in Section 5.
For sample sizes n ∈ {100, 200, 500, 1000}, a total of 1000 random samples were generated using this setting. The proposed model was fitted to each sample, and posterior 95% equal-tail credible intervals (CI) were obtained from the MCMC output, where the total number of iterations was 10000 with the first 1000 as burn-ins. We then calculated the proportion of CIs covering the true value of each parameter over these 1000 replicates as the coverage probability. Results are shown in Table 1. As the sample size increases, the coverage probabilities approach the nominal level and the MSEs gradually decrease towards 0.
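The two summary measures can be computed from stored posterior draws as sketched below. The sketch assumes the posterior mean is used as the point estimate for the MSE, and the array layout (one row per simulated replicate) is an assumption for illustration.

```python
import numpy as np

def coverage_and_mse(draws, truth, level=0.95):
    """draws: (n_replicates, n_posterior_draws) for one parameter, burn-in already removed."""
    draws = np.asarray(draws, dtype=float)
    alpha = 1.0 - level
    lower = np.quantile(draws, alpha / 2.0, axis=1)        # equal-tail interval limits
    upper = np.quantile(draws, 1.0 - alpha / 2.0, axis=1)
    estimate = draws.mean(axis=1)                          # posterior mean per replicate
    covered = (lower <= truth) & (truth <= upper)
    return covered.mean(), np.mean((estimate - truth) ** 2)

# Toy input: five replicates whose draws are centered on the true value 2.0.
grid = np.linspace(-1.0, 1.0, 101)
cp, mse = coverage_and_mse(np.tile(2.0 + grid, (5, 1)), truth=2.0)
```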
Table 1.
Empirical coverage probability (95% CP) of posterior 95% credible intervals and mean squared error (MSE) of parameters in simulation scenario 1 based on 1000 random samples. For each parameter and sample size, the number on top represents the 95% CP, and the number on the bottom in parentheses represents the MSE.
| Parameter | n = 100 | n = 200 | n = 500 | n = 1000 | Parameter | n = 100 | n = 200 | n = 500 | n = 1000 |
|---|---|---|---|---|---|---|---|---|---|
| β01 | 0.932 | 0.935 | 0.942 | 0.948 | β11 | 0.934 | 0.941 | 0.951 | 0.952 |
| | (0.196) | (0.156) | (0.075) | (0.041) | | (0.092) | (0.061) | (0.016) | (0.001) |
| η0 | 0.931 | 0.938 | 0.949 | 0.952 | η1 | 0.930 | 0.938 | 0.950 | 0.950 |
| | (0.198) | (0.150) | (0.068) | (0.046) | | (0.012) | (0.008) | (0.037) | (0.000) |
| β02 | 0.931 | 0.941 | 0.945 | 0.952 | β12 | 0.937 | 0.948 | 0.951 | 0.949 |
| | (0.215) | (0.162) | (0.109) | (0.054) | | (0.019) | (0.011) | (0.004) | (0.000) |
| τ0 | 0.932 | 0.939 | 0.952 | 0.948 | τ1 | 0.933 | 0.936 | 0.948 | 0.951 |
| | (0.141) | (0.105) | (0.067) | (0.023) | | (0.013) | (0.009) | (0.005) | (0.000) |
| ϕ1 | 0.933 | 0.937 | 0.949 | 0.953 | ϕ2 | 0.932 | 0.938 | 0.951 | 0.949 |
| | (0.998) | (0.403) | (0.176) | (0.063) | | (0.960) | (0.342) | (0.218) | (0.110) |
Further, we compare the proposed method and several existing methods in terms of statistical power in detecting dynamic association. Fixing the sample size at n = 500, pairs of (yi1, yi2) were simulated with $\log[(1 + \rho_i)/(1 - \rho_i)] = \tau_0 + \tau_1 x_i$ for τ1 = 0, 0.3, 0.6, … , 2.7, 3. For i = 1, 2, … , n, the covariate vector is given by $(1, x_i)^\top$ with xi ~ beta(2, 8), according to the data analysis results in Section 5. Note that when τ1 = 0, ρi is not dependent on the covariate xi and hence there is no dynamic association. On the other hand, the covariate xi affects ρi when τ1 ≠ 0. Therefore, testing for dynamic association in this example is the same as testing H0 : τ1 = 0 against H1 : τ1 ≠ 0. For each true value of τ1, 10000 random samples were simulated. For the proposed method, a 95% credible interval of τ1 was calculated from the MCMC output. The power was the proportion of CIs excluding 0 out of the 10000 replicates.
The proposed method was compared to the liquid association (LA) in Li (2002) and a standardized version of LA (sLA), and the full and simple versions of the conditional normal model (CNM (full), CNM (simple)) in Ho et al. (2011). Under the assumptions that Y1, Y2 are standardized with mean 0 and variance 1, X follows a standard normal distribution, and the marginal means of Y1 and Y2 are free of X, Li (2002) used $E(Y_1 Y_2 X)$ to measure the dynamic association between Y1 and Y2 according to X. In this simulation, the sample mean $n^{-1} \sum_{i=1}^{n} y_{i1} y_{i2} x_i$ was used as a statistic for the liquid association, and p-values were obtained using a permutation test. A standardized version of the LA statistic (sLA) was also considered in the comparison.
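The three-product-moment statistic and a permutation p-value can be sketched as follows. The text does not specify the permutation scheme; permuting the covariate while holding the response pairs fixed is an assumption made here, as are the function names and the toy data.

```python
import numpy as np

def standardize(v):
    """Center and scale to mean 0 and variance 1."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()

def la_statistic(y1, y2, x):
    """Sample mean of the product of the three standardized variables."""
    return np.mean(standardize(y1) * standardize(y2) * standardize(x))

def la_permutation_pvalue(y1, y2, x, n_perm=999, rng=None):
    """Two-sided permutation p-value; x is permuted (an assumed scheme)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    observed = la_statistic(y1, y2, x)
    perm = np.array([la_statistic(y1, y2, rng.permutation(x)) for _ in range(n_perm)])
    return (1 + np.sum(np.abs(perm) >= abs(observed))) / (n_perm + 1)

# Toy data in which the y1-y2 relationship strengthens with x.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y1 = rng.standard_normal(500)
y2 = x * y1 + 0.5 * rng.standard_normal(500)
p_value = la_permutation_pvalue(y1, y2, x, n_perm=199, rng=rng)
```

Adding one to both the numerator and the denominator keeps the permutation p-value strictly positive.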
In terms of the two versions of the conditional normal model in Ho et al. (2011), a key assumption of the model is that conditional on X, Y1 and Y2 follow a bivariate normal distribution. The full model includes covariate-dependent mean, variances and correlation while the CNM (simple) only considers covariate-dependent correlation. P-values for sLA, CNM (full) and CNM (simple) were obtained from the functions provided in the LiquidAssociation R package.
The comparison of statistical power for the methods described above is presented in Figure 2. At τ1 = 0, our proposed flexible copula model (FC) and CNM (full) are the only two methods that maintain the 5% nominal significance level. Strikingly, the original liquid association measure (LA) has a Type I error rate of almost 100%. CNM (simple) and sLA also have fairly high Type I error rates of around 30%. This is partly because these models do not account for covariate-dependent means. On the other hand, as the magnitude of τ1 increases, the power to detect the signal for FC and CNM (full) both increase toward 1. However, the conditional normal model fails to keep up with the proposed model, mainly because of its assumption that Y1 and Y2 given X are normally distributed.
Figure 2.

Comparison of statistical power in testing H0 : τ1 = 0 against H1 : τ1 ≠ 0 for the proposed flexible copula model (FC), full and simple version of conditional normal model (CNM (full), CNM (simple)), unstandardized liquid association (LA) and standardized liquid association (sLA). For each true value of τ1, the power was calculated based on 10000 random samples.
Scenario 2: Data simulated according to real data analysis
In this scenario, we perform power analyses to compare the proposed copula model versus the full conditional normal model in Ho et al. (2011) according to the experimental data analysis presented in Section 5 (Equations (15) to (17)). Specifically, for i = 1, 2, … , n = 627, let xi0 = 1, xi1 ~ beta(1.5, 7.5), (wi, xi2, xi3, xi4, xi5) ~ categorical(0.2 · $1_5$), i.e. a categorical distribution of five categories with equal probability, and xij = xi1 · xi(j−4) for j = 6, 7, 8, 9. The covariate vector for the ith observation is given by $x_i = (x_{i0}, x_{i1}, \ldots, x_{i9})^\top$. Note that when simulating from the categorical distribution, wi represents the baseline category and is hence not included in the covariates. In terms of the multi-omics profiling data in Section 5, xi1 represents the proportion of methylation at the promoter region of a particular gene for the ith cell, (wi, xi2, xi3, xi4, xi5) the germ layer of the cell, and xi6, … , xi9 the interaction between the methylation and the germ layer. Similar to Scenario 1, yi1 was generated from a zero-inflated gamma distribution according to (4) and yi2 was generated from a beta distribution according to (5). We regressed both marginal means as well as the zero-inflation parameter π in (4). Most parameters were set to be the posterior estimates from the results in Section 5 related to the genes Mrpl58 and Sox17. Importantly, in terms of the dynamic association parameter τ in (2), the posterior estimate for τ1 was −11.33 (see Table 2 for details). In this simulation, the true value of τ1 is varied from −15 to 0 with increments of 3. For each true value of τ1, we simulated 10000 random samples, and tested H0 : τ1 = 0 against H1 : τ1 ≠ 0 using the proposed method and the conditional normal model in Ho et al. (2011). The power of the two methods was calculated as the proportion of H0 rejected out of the 10000 replicates.
Table 2.
Top twenty genes with the largest dynamic association estimates between scRNA-seq and chromatin accessibility associated with the DNA methylation at the promoter region of Sox17 at each germ layer. Point estimates (posterior mean) represent the estimated dynamic association. Corresponding 95% credible intervals are shown in parentheses.
| Rank | Epiblast (4.5-5.5 days): Gene | Estimate (95% CI) | Epiblast (6.5-7.5 days): Gene | Estimate (95% CI) | Endoderm: Gene | Estimate (95% CI) | Mesoderm: Gene | Estimate (95% CI) | Ectoderm: Gene | Estimate (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Mrpl58 | −11.33 (−15.04,−1.16) | Rps23 | 12.73 (5.87,17.92) | Fh1 | −22.41 (−32.54,−0.03) | Mrpl54 | −14.93 (−20.87,−4.08) | Adat1 | 29.79 (2.5,41.49) |
| 2 | Otulin | −10.9 (−13.96,−1.2) | Abl1 | 11.53 (5.13,17.01) | Dcaf11 | 21.03 (4.76,29.77) | Pnisr | 13.42 (1.5,21.68) | Ptk7 | −28.41 (−39.35,−0.01) |
| 3 | Pusll | −10.47 (−14.32,−2.19) | Cit | −11.38 (−15.16,−2.07) | Dhx15 | −20.41 (−31,−0.91) | Kmt2b | −13.33 (−20.85,−4.33) | Btf3l4 | −25.21 (−38.67,−4.53) |
| 4 | Ergic3 | −10.34 (−15.74,−0.12) | Ebna1bp2 | 9.15 (2.34,13.39) | Fasn | 20.25 (2.14,28.95) | Atxn10 | −13.33 (−19.62,−3.15) | Ppib | 23.76 (4.59,33.91) |
| 5 | Thoc5 | −9.78 (−13.57,−1.03) | Zgpat | 9.14 (2.75,13.29) | Taco1 | 19.37 (0.48,31.89) | Rer1 | 12.57 (1.24,19.52) | Dnajc17 | 22.74 (4.35,34.64) |
| 6 | Vmp1 | −9.32 (−13.84,−1.53) | Atmin | −9.07 (−13.62,−1.45) | Cycs | 19.2 (0.95,30.35) | Eif2b2 | 12.12 (3.26,18.43) | Rcc1 | 21.74 (0.87,37) |
| 7 | Tom1l1 | 9.17 (0.4,13.38) | Chd1l | −8.85 (−12.8,−2.18) | Gtf2a1 | −18.51 (−24.96,−3.56) | Tm9sf1 | −11.98 (−17.18,−2.44) | Ppp3cb | 21.54 (1.3,32.85) |
| 8 | Snrpd2 | 9.09 (0.62,13.29) | Ephb4 | −8.46 (−13.08,−1.03) | Mipep | −18.21 (−25.32,−3.4) | Zc3h4 | −11.92 (−19.43,−0.44) | Chd1l | −21.47 (−34.37,−3.84) |
| 9 | Rtl8b | 9.08 (3.04,12.8) | Pi4k2b | 8.25 (1.56,12.48) | Zfp740 | −17.9 (−27.44,−1.26) | Mthfs | 11.81 (5.98,17.58) | Bop1 | 20.69 (0.1,35.51) |
| 10 | Prr3 | −8.56 (−14.18,−0.12) | Cbx3 | 8.24 (2.85,11.95) | Ube2f | 17.85 (3.45,24.07) | Gpatch11 | 11.8 (1.65,18.55) | Smg1 | −20.67 (−32.88,−2.18) |
| 11 | Mrpl20 | 8.36 (1.4,11.83) | Tcp11l1 | −8.17 (−12.57,−0.94) | Chaf1a | −17.59 (−26.49,−1.98) | Tia1 | −11.79 (−18.22,−1.2) | Pmpca | 19.61 (0.34,30.26) |
| 12 | Calu | −7.99 (−13.21,−1.26) | Abhd4 | 8.1 (1.35,13.81) | Tusc2 | 17.47 (1.87,29.05) | Mtx1 | −11.71 (−18.38,−0.49) | Kxd1 | −19.58 (−28.52,−0.51) |
| 13 | Nxn | 7.9 (0.75,11.51) | Mrpl54 | 8.07 (0.16,12.22) | Pafah1b2 | −17.24 (−26.9,−1.42) | Nup205 | −11.46 (−17.72,−2.83) | Hmmr | 19.43 (2.55,28.1) |
| 14 | Oga | 7.9 (0.06,11.76) | Gsta4 | −7.94 (−12.62,−0.13) | Chchd3 | 16.18 (4.94,25.27) | Pgm2 | −11.34 (−17.7,−3.36) | Gcn1 | −19.24 (−30.05,−4.88) |
| 15 | Ccdc186 | 7.83 (1.26,11.52) | Ufm1 | −7.91 (−12.74,−0.13) | Kat5 | 15.6 (4.03,22.69) | Far1 | 10.97 (0.06,17.09) | Usp25 | −19.12 (−33.32,−0.77) |
| 16 | Tspan4 | −7.79 (−12.89,−0.73) | Tmed7 | 7.82 (1.59,12.68) | Ctsd | 15.1 (0.97,25.92) | Ldha | −10.75 (−16.72,−0.93) | Dhx36 | −18.94 (−28.81,−0.07) |
| 17 | Clcn5 | −7.78 (−10.87,−0.07) | Ap1s1 | 7.72 (1.06,12.33) | Fcf1 | −15.08 (−21.91,−1.55) | Srebf2 | −10.75 (−16.27,−3.16) | Tmem192 | 18.76 (2.45,27.33) |
| 18 | Mcmbp | −7.52 (−11.4,−1.16) | Echdc2 | −7.65 (−11.8,−1.92) | Klf9 | 14.64 (2.06,22.23) | Mtif2 | 10.62 (1.5,17.44) | Pgs1 | 18.51 (2.44,31.89) |
| 19 | Pim1 | −7.33 (−10.65,−2.27) | Zfp511 | 7.63 (0.57,11.98) | Lin28b | 14.57 (2.03,22.06) | Cenph | −10.61 (−15.54,−3.65) | Ciao1 | 18.32 (1.15,32.6) |
| 20 | Ikbip | −7.21 (−11.29,−1.92) | Ndufa12 | 7.6 (0.77,12.21) | Noct | −14.11 (−21.64,−4.61) | Phf10 | −10.46 (−16.22,−1.4) | Tns3 | −18.04 (−30.65,−0.48) |
The result of the comparison is presented in Figure A.2 in Appendix B.1. Both methods maintain the 5% nominal Type I error rate. As the magnitude of τ1 increases, the proposed method outperforms the CNM (full) in detecting dynamic association. We note here that, unlike in Scenario 1, we did not compare the proposed method to the liquid association of Li (2002) in Scenario 2, since that approach cannot accommodate multiple covariates.
5. Multi-omics Profiling of Mouse Gastrulation
Argelaguet et al. (2019) studied the multi-omics profiling of mouse gastrulation at single-cell resolution. In this study, n = 627 single cells were taken from different germ layers of embryonic cells between 4.5 and 7.5 days. During early stages of development (4.5-5.5 days), samples solely came from the epiblast. At 6.5-7.5 days, samples were taken from epiblast, endoderm, mesoderm, and ectoderm. For each cell, parallel chromatin accessibility, DNA methylation, and gene expression profiles were sequenced under the single-cell scNMT-seq (single-cell nucleosome, methylation, and transcription sequencing) protocol of Clark et al. (2018). On average, 37.9 cells per embryo were sequenced for scNMT-seq; 54.6 cells per embryo were sequenced for scRNA-seq. Following Argelaguet et al. (2019) and Mohammed et al. (2017), the cells from the same germ layer at the same developmental stage are assumed to be homogeneous. All sequencing was performed on a NextSeq 500 instrument with a raw sequencing depth of 1 million paired-end reads per cell for scRNA-seq.
Raw scRNA-seq data were normalized via the deconvolution strategy described in Lun et al. (2016). Methylation calling and separation of endogenous methylation and chromatin accessibility were performed using Nucleosome Occupancy and Methylation sequencing (NOMe-seq; Kelly et al., 2012; Pott, 2017). Following Smallwood et al. (2014), the CpG methylation or GpC accessibility for each site in each cell is measured as binary (0 or 1). Then the CpG methylation or GpC accessibility rate at the promoter (or gene-body) region was computed as the average of the methylated CpG (or accessible GpC) sites over all observed sites within the region. Hence DNA methylation and chromatin accessibility are both recorded as rates (proportions).
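The rate computation can be sketched for a small array of binary site calls. The sketch assumes unobserved sites are coded as NaN, which is a choice made for this illustration and not a description of the original pipeline.

```python
import numpy as np

def site_rate(calls):
    """Average of binary calls over observed sites; NaN marks an unobserved site (assumed coding)."""
    calls = np.asarray(calls, dtype=float)
    observed = ~np.isnan(calls)
    n_observed = observed.sum(axis=-1)
    n_positive = np.where(observed, calls, 0.0).sum(axis=-1)
    # Regions with no observed site get NaN instead of triggering a division by zero.
    return np.where(n_observed > 0, n_positive / np.maximum(n_observed, 1), np.nan)

# Rows are regions (for example, promoters); columns are CpG or GpC sites.
rates = site_rate([[1, 1, 0, 0], [0, np.nan, 0, 1]])
```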
Among the 18,345 genes sequenced, we filtered out genes with more than 80% zeros and with small variances in scRNA-seq and chromatin accessibility (less than the medians of all genes) and kept the remaining 1,260 genes in the following analysis. For each of the 1,260 genes and the ith sample, the scRNA-seq and the gene-body chromatin accessibility measurements were paired as Yi1 and Yi2, respectively. We refer to this set of 1,260 genes as the retained gene set. In Argelaguet et al. (2019), the authors identified 15 genes that are transcription factors at different germ layers whose methylation at the promoter regions may affect how scRNA-seq interacts with chromatin accessibility during mouse gastrulation. We refer to these 15 genes as the transcription factor set.
The study by Duren et al. (2017) reported that the association between transcription activities and cis-regulatory elements (REs) can be modulated by trans-acting transcription factors (TFs). The interplay between cis-REs, TFs, and the expression levels of target genes could vary in different cells at various developmental stages. To study these intricate transcription regulatory mechanisms, for each pair of genes, one from the 1,260 retained genes and one from the 15 transcription factors (1,260 × 15 = 18,900 gene pairs in total), we fit the proposed copula model by considering the methylation at the promoter region, different germ layers, and their interactions. For i = 1, … , n, we assume that Yi1 follows a zero-inflated gamma distribution with marginal mean regression performed via (4), and that Yi2 follows a beta distribution with marginal mean regression performed via (5). The marginal mean of the gamma distribution, μi1, is covariate-dependent and is modeled as:
| (15) |
where the variable methylation refers to the percentage of methylation at the promoter region, and I(·) is an indicator specifying the germ layer for the ith observation. Note that in this case, epiblast (4.5-5.5 days) is the baseline. Further, we allow the zero-inflation parameter π to vary across different germ layers; i.e.
| (16) |
The dispersion parameter for Yi1 is given by ϕ1. The marginal mean regression for Yi2 is similar to (15), but with a logit link for the marginal mean μi2. The dispersion at this margin is given by ϕ2. Regression on ρi is performed according to
| (17) |
Note that under this setting, τ1 represents the dynamic association between RNA-seq and chromatin accessibility of gene 1 due to the methylation of gene 2 for epiblast (4.5-5.5 days). The coefficients τ6 to τ9 represent the change in the dynamic association from the baseline to their respective germ layer. To promote convergence, posterior draws are based on Markov chains of 20,000 iterations with a burn-in of 5,000 in each of the 18,900 models (gene pairs).
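For orientation, the verbal description of the correlation regression suggests a structure of the following kind. This is a sketch under stated assumptions rather than a reproduction of Equation (17): the roles of τ1 and of τ6 to τ9 are taken from the text, whereas the intercept τ0, the main effects τ2 to τ5, the ordering ℓ2, …, ℓ5 of the four non-baseline germ layers, and the factor 1/2 in Fisher's transformation are assumed here.

```latex
\frac{1}{2}\log\frac{1+\rho_i}{1-\rho_i}
  = \tau_0 + \tau_1\,\mathrm{methylation}_i
  + \sum_{k=2}^{5} \tau_k\, I(\mathrm{layer}_i = \ell_k)
  + \sum_{k=6}^{9} \tau_k\,\mathrm{methylation}_i \times I(\mathrm{layer}_i = \ell_{k-4})
```

Under this form, the slope of the transformed correlation with respect to methylation is τ1 in the baseline epiblast (4.5-5.5 days) and τ1 + τk+4 in germ layer ℓk, which is the sense in which τ6 to τ9 measure a change from the baseline.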
We examine the results from several different aspects. From a broad perspective, we are interested in how the two groups of genes are connected to each other in terms of dynamic association. For instance, τ1 quantifies the dynamic association for epiblast (4.5-5.5 days) according to Equation (17). Based on the posterior draws of τ1, for each of the 15 transcription factors whose methylation may lead to significant dynamic association between RNA-seq and chromatin accessibility of the 1,260 genes, we kept the top 5% of genes with the largest posterior estimates of τ1 in absolute value, provided the corresponding credible intervals exclude 0. A network plot in Figure 3 summarizes the results. In the plot, the 15 transcription factors are represented as diamonds. Each of these 15 genes is linked to a small number of RNA-seq/chromatin genes (in circles), forming small “local” clusters. Eight of these fifteen local clusters were isolated from the others. The other seven formed larger clusters through shared genes.
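The selection rule behind the network plot can be sketched as follows for a single transcription factor. The sketch assumes that the point estimate is the posterior mean, that the credible interval is a 95% equal-tail interval, and that the top 5% are chosen first and the interval check is applied afterwards; the function names and toy draws are hypothetical.

```python
import math
from statistics import mean, quantiles

def summarize(draws, level=0.95):
    """Posterior mean and equal-tail credible interval from MCMC draws."""
    cuts = quantiles(draws, n=1000, method="inclusive")  # 999 cut points
    lower = cuts[round((1 - level) / 2 * 1000) - 1]
    upper = cuts[round((1 + level) / 2 * 1000) - 1]
    return mean(draws), lower, upper

def select_edges(draws_by_gene, top_frac=0.05, level=0.95):
    """Keep the top fraction of genes by |posterior mean| whose interval excludes 0."""
    stats = {g: summarize(d, level) for g, d in draws_by_gene.items()}
    n_keep = max(1, math.ceil(top_frac * len(stats)))
    ranked = sorted(stats, key=lambda g: abs(stats[g][0]), reverse=True)[:n_keep]
    return [(g, stats[g][0]) for g in ranked
            if stats[g][1] > 0 or stats[g][2] < 0]

# Toy posterior draws of tau_1 for 40 hypothetical target genes and one TF.
offsets = [(k - 50) / 100 for k in range(101)]
centers = {f"g{j}": 0.0 for j in range(40)}
centers["g0"], centers["g1"] = 2.0, -1.5
toy_draws = {g: [c + o for o in offsets] for g, c in centers.items()}

edges = select_edges(toy_draws)
print([g for g, _ in edges])  # ['g0', 'g1']
```

The sign of the retained estimate would then determine the line style of the edge in a plot such as Figure 3.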
Figure 3.

Network plot of the 1,260 genes (RNA-seq/chromatin accessibility, in circles) and the 15 transcription factors (in diamonds) based on the magnitude of posterior draws of τ1 for epiblast (4.5-5.5 days). Links are established if the absolute value of the posterior estimate of τ1 is among the 5% largest, provided the credible interval excludes 0. Edges with positive estimates are represented as dashed lines, and edges with negative estimates as solid lines.
Biologically, Sox17 is a transcription factor essential for vertebrate endoderm development (Kanai-Azuma et al., 2002; Hudson et al., 1997); hence we choose Sox17 as an example. We rank the gene pairs with significant dynamic associations (95% CI excluding 0) by the magnitude of point estimates at each germ layer, and identify the gene pairs with the largest dynamic association. Table 2 presents the top twenty genes with the largest dynamic association estimates between scRNA-seq and chromatin accessibility due to the DNA methylation at the promoter region of Sox17 at each germ layer.
Lastly, for each specific pair of genes, we wish to examine how the dynamic association changes according to the methylation level across different germ layers. We demonstrate this perspective by considering the gene pair Sox17 and Mrpl58 from Figure 3. According to (17) and the posterior mean of τ, Figure A.2 in Appendix C.1 presents an illustration of the estimated ρ as a function of the promoter methylation of Sox17 at different germ layers. The methylation of Sox17 influences ρ in a similar manner in Epiblast (4.5-5.5 days) and Ectoderm. When Sox17 is more methylated in these cell types, the correlation between RNA-seq and chromatin accessibility in Mrpl58 tends to decrease. This decreasing pattern is also present in Mesoderm, but to a lesser degree. On the other hand, in Epiblast (6.5-7.5 days) and Endoderm, when Sox17 is more methylated, the correlation between RNA-seq and chromatin accessibility in Mrpl58 tends to increase, as shown in Figure A.2.
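A curve of this kind can be computed by passing the linear predictor through the inverse of Fisher's transformation, which is the hyperbolic tangent when the transformation is defined with the factor 1/2. The sketch below is illustrative only: the coefficient values are invented, not posterior estimates from the analysis, and the dictionary layout and function name are assumptions.

```python
import math

BASELINE = "Epiblast (4.5-5.5 days)"

def rho(methylation, layer, tau):
    """Correlation implied by a linear predictor with a germ-layer interaction."""
    eta = tau["intercept"] + tau["methylation"] * methylation
    if layer != BASELINE:
        eta += tau["layer"][layer] + tau["interaction"][layer] * methylation
    return math.tanh(eta)  # inverse Fisher transformation, always in (-1, 1)

# Hypothetical coefficients chosen only to mimic the qualitative pattern.
tau_hypothetical = {
    "intercept": 0.3,
    "methylation": -4.0,
    "layer": {"Endoderm": -0.2},
    "interaction": {"Endoderm": 7.0},
}

levels = [0.04, 0.09, 0.14]  # illustrative promoter methylation rates
baseline_curve = [rho(m, BASELINE, tau_hypothetical) for m in levels]
endoderm_curve = [rho(m, "Endoderm", tau_hypothetical) for m in levels]
print([round(r, 3) for r in baseline_curve])
print([round(r, 3) for r in endoderm_curve])
```

With these made-up values the baseline correlation falls from positive to negative as methylation rises, while the endoderm correlation rises, which is the kind of opposing profile described above.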
As pluripotent epiblast cells differentiate into the three primary germ layers, lineage-specific transcription factors, such as Sox17, activate gene regulatory networks necessary for lineage specification. The opposite profiles of ρ for Mrpl58 in epiblast and endoderm as Sox17 methylation changes (Figure A.2) might imply that Sox17 acts to repress Mrpl58 during endoderm formation. Among the genes whose estimates are negative in Table 2, which is suggestive of transcriptional activation by Sox17, are Dhx15, Mipep, and Noct, which all have cis-regulatory modules bound by Sox17 during Xenopus gastrulation and may be part of the Sox17 gene regulatory network (Mukherjee et al., 2020).
To further demonstrate this dynamic relationship, Figure 4 provides the contour plots of the posterior predictive density of (Y1, Y2). Each row represents the estimated density in each germ layer at the 10th, 50th, and 90th sample percentiles of the promoter methylation of Sox17, which correspond to 0.04, 0.09, and 0.14, respectively. The points superimposed on each plot represent the observed data in each germ layer for the promoter methylation falling within a 20% equal-tail neighborhood of the corresponding sample percentiles. When the methylation level is small, the association between scRNA-seq and chromatin accessibility of Mrpl58 is mildly positive. As the methylation level increases, the association gradually decreases and becomes negative. For instance, at the baseline (epiblast, 4.5-5.5 days), the association between scRNA-seq and chromatin accessibility of Mrpl58 is positive when the methylation of Sox17 is low. As the methylation level of Sox17 increases, the association pattern becomes negative, and the strength of the negative association increases with the methylation level. Network plots similar to Figure 3 for other germ layers are provided in Appendix C.2. Convergence and posterior diagnostics results for this model are provided in Appendix C.3. In addition, results of a ZINB model using the raw scRNA-seq counts are provided in Appendix C.4 in the Supporting Information.
Figure 4.

Contour plots of posterior predictive joint density of scRNA-seq (zero-inflated gamma distribution) and chromatin accessibility (beta distribution) of Mrpl58. The rows correspond to the promoter methylation of Sox17 at its 10th, 50th, and 90th sample percentile, respectively. Each column represents a specific germ layer. Points on each plot represent the observed data in each germ layer within a 20% equal-tail neighborhood of the sample percentiles of the promoter methylation of Sox17.
6. Discussion
We proposed a general framework for analyzing the dynamic association between two random variables given a set of covariates. In order to account for various marginal distributions observed in single-cell multi-omics data, we constructed copula-based models and incorporated flexible covariate-dependent correlation structures in the proposed approaches. Simultaneously, the marginal means can be covariate-dependent as well. An MCMC sampling algorithm is introduced to estimate the model parameters. The usefulness of the proposed framework is demonstrated through a series of simulation studies and an analysis of a set of single-cell multi-omics data for mouse gastrulation.
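To make the separation between margins and dependence concrete, the following sketch evaluates the density of a bivariate Gaussian copula, which is the standard closed form for that copula family. It is not the authors' implementation, and the function name is an assumption. For two continuous margins with distribution functions F1, F2 and densities f1, f2, the joint density would be the copula density at (F1(y1), F2(y2)) multiplied by f1(y1) f2(y2); the point mass at zero of a zero-inflated margin needs separate treatment that is not shown here.

```python
from math import exp, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def gaussian_copula_density(u1, u2, rho):
    """Density c(u1, u2; rho) of the bivariate Gaussian copula.

    Requires 0 < u1, u2 < 1 and -1 < rho < 1.
    """
    z1 = _STD_NORMAL.inv_cdf(u1)
    z2 = _STD_NORMAL.inv_cdf(u2)
    one_minus_r2 = 1.0 - rho * rho
    exponent = -(rho * rho * (z1 * z1 + z2 * z2) - 2.0 * rho * z1 * z2) / (2.0 * one_minus_r2)
    return exp(exponent) / sqrt(one_minus_r2)

# With rho = 0 the copula density is identically 1 (independence).
independent = gaussian_copula_density(0.3, 0.7, 0.0)

# With positive rho, concordant corners carry more mass than discordant ones.
concordant = gaussian_copula_density(0.9, 0.9, 0.5)
discordant = gaussian_copula_density(0.9, 0.1, 0.5)
print(independent, concordant > discordant)
```

Because the copula density depends on the data only through the probability-scale values u1 and u2, ρ can be made covariate-dependent without altering either marginal model.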
When analyzing the co-expression pattern across different genes or modalities, most existing approaches assume normality of the marginal distributions and directly use the Pearson correlation coefficient for inference. However, despite its convenience, this assumption may not be appropriate for single-cell multi-omics analysis, because the data are often zero-inflated discrete counts or proportions. In this paper, we consider several specific choices of marginal distributions because of their popularity in analyzing multi-omics data. Furthermore, our proposed framework could accommodate a wide range of marginal distributions, continuous or discrete. This flexibility lends itself to situations where the normality assumption and the Pearson correlation may be inadequate.
In our proposed model, we used Fisher’s transformation for ρ to ensure −1 ⩽ ρ ⩽ 1. Other sigmoid functions can also be easily adapted into our model framework. Our model-based framework can also be readily applied to study co-regulators in transcriptional activities as reported by Zhou et al. (2007) by considering multiple co-regulators as covariates.
In this paper, we followed the Bayesian approach for several reasons. First, model fitting using the Bayesian method via an MCMC scheme is fairly straightforward. Second, we can obtain the full posterior probability distribution of the parameters, which makes inference readily achievable. By contrast, frequentist approaches such as Ho et al. (2011) often rely on resampling schemes such as the jackknife to calculate the standard errors of parameters when modeling the dynamic association.
The Bayesian framework in this paper also offers a method for model comparison using the log-pseudo marginal likelihood (LPML, Gelfand and Dey, 1994), which we have explored in Appendix B.2 through a simulation study that compares the dynamic association model to a simpler model without dynamic association. In addition, the LPML is useful to compare different candidate marginal distributions, e.g. to test whether the distributional assumptions are appropriate for the data.
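As a rough illustration of how the LPML may be computed from MCMC output, the sketch below uses the common harmonic-mean estimator of the conditional predictive ordinate (CPO) for each observation and sums the log CPOs. Whether the analysis used exactly this estimator is an assumption, and the function names and input layout are hypothetical.

```python
from math import exp, log

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + log(sum(exp(v - m) for v in values))

def lpml(loglik):
    """LPML from a matrix loglik[s][i]: log-likelihood of observation i under draw s.

    CPO_i is estimated by the harmonic mean of the likelihood values over
    draws, so log CPO_i = log(S) - logsumexp_s(-loglik[s][i]).
    """
    n_draws = len(loglik)
    n_obs = len(loglik[0])
    total = 0.0
    for i in range(n_obs):
        neg = [-loglik[s][i] for s in range(n_draws)]
        total += log(n_draws) - log_sum_exp(neg)
    return total

# One observation with likelihood 0.5 and 0.25 under two draws:
# the harmonic mean is 1/3, so the LPML is log(1/3).
value = lpml([[log(0.5)], [log(0.25)]])
print(round(value, 4))  # -1.0986
```

When two candidate models are fitted to the same data, the one with the larger LPML is preferred under this criterion.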
Alternatively, the Kolmogorov-Smirnov test can be used to test the goodness-of-fit of a specific marginal null distribution. In the experimental data analysis in Section 5, for each of the 1,260 genes, we performed a Kolmogorov-Smirnov test on the non-zero part of the scRNA-seq data with the gamma distribution as the null distribution. Shape and scale parameters of the gamma distribution were estimated by maximum likelihood under a gamma likelihood. After multiple comparison correction (Benjamini and Hochberg, 1995), the gamma null distribution was not rejected for any gene at a 5% false discovery rate.
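The multiple comparison step can be sketched with the Benjamini-Hochberg step-up rule applied to a vector of per-gene p-values. The p-values below are invented for illustration and do not come from the analysis; the function name is an assumption, and the Kolmogorov-Smirnov tests that would produce the p-values are not shown.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans, True where the null is rejected at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k with p_(k) <= k * fdr / m.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * fdr / m:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected

# Hypothetical goodness-of-fit p-values for four genes.
mixed = benjamini_hochberg([0.001, 0.04, 0.03, 0.5])
print(mixed)  # [True, False, False, False]

# When all p-values are large, no gamma null is rejected.
none_rejected = benjamini_hochberg([0.2, 0.6, 0.9])
print(any(none_rejected))  # False
```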
To correct for multiple testing, the approaches described in Muller et al. (2004) and Wang and Dunson (2010) can be implemented in conjunction with our analytical framework. Briefly, the posterior densities can be used to calculate Bayes factors and the expected false discovery rate to determine the optimal threshold on the Bayes factors for type I error control under multiple hypotheses. In terms of model fitting, we set the tuning parameters of the adaptive MCMC algorithm according to the suggestions given by Haario et al. (2001) and Haario et al. (2005). We considered uninformative priors on all parameters in this paper. Nevertheless, when fitting the marginal means using a (generalized) linear model, informative priors such as the g-prior (Zellner, 1986) on the regression coefficients can be considered. In the experimental data analysis, we considered DNA methylation statuses for genes encoding transcription factors as the covariate “methylation” in Equations (15) and (17). Databases such as ELMER (Yao et al., 2015), FunGenES (Schulz et al., 2009), and Mouse TF Atlas (Zhou et al., 2017) could potentially provide useful prior information for transcription factors and their target genes in mouse embryonic stem cells.
In the simulation studies as well as the real data analysis, convergence of the Markov chains was verified using the Gelman-Rubin convergence diagnostic based on four parallel chains. In terms of computational performance, the proposed model was implemented solely in the statistical software R. It takes about two minutes on a personal laptop (Intel Core i5-7200U CPU) to complete a Markov chain of 10,000 iterations in Simulation Scenario 1 of Section 4.
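For reference, the basic form of the Gelman-Rubin potential scale reduction factor for a single scalar parameter can be sketched as below. This is written in Python for illustration and is not the authors' R code; it omits refinements such as chain splitting and degrees-of-freedom corrections, and the function name and toy chains are assumptions.

```python
from math import sqrt
from statistics import mean, variance

def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean(variance(c) for c in chains)   # W: average within-chain variance
    between_over_n = variance(chain_means)       # B / n: variance of the chain means
    pooled = (n - 1) / n * within + between_over_n
    return sqrt(pooled / within)

# Two toy chains stuck in different regions give a value far above 1.
stuck = gelman_rubin([[0, 1, 2, 3], [10, 11, 12, 13]])

# Chains that agree give a value at or below 1 (below 1 here because n is tiny).
agreeing = gelman_rubin([[1, 2, 3, 4]] * 4)
print(round(stuck, 3), round(agreeing, 3))  # 5.545 0.866
```

Values close to 1 across all parameters are conventionally taken as evidence that the parallel chains have mixed.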
Acknowledgements
This research is supported by NIH grant 1R21CA264353-01. All authors declare no conflict of interest.
Footnotes
Supporting Information
Web Appendices, Tables, and Figures referenced in Sections 2, 4, and 5 are available with this paper at the Biometrics website on Wiley Online Library. The R code for implementing the proposed method is available on GitHub at https://github.com/ZichenMa-USC/FlexibleCopulaModel.
Data Availability Statement
The data that support the findings in this paper are openly available at Gene Expression Omnibus, accession number GSE121708 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE121708, Argelaguet et al. (2019)).
References
- Argelaguet R, Clark S, Mohammed H, et al. (2019). Multi-omics profiling of mouse gastrulation at single-cell resolution. Nature 576, 487–491.
- Benjamini Y and Hochberg Y (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B 57, 289–300.
- Cao J et al. (2018). Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361, 1380–1385.
- Chen J, Xie J, and Li H (2011). A penalized likelihood approach for bivariate conditional normal models for dynamic co-expression analysis. Biometrics 67, 299–308.
- Clark S, Argelaguet R, Kapourani C-A, et al. (2018). scNMT-seq enables joint profiling of chromatin accessibility DNA methylation and transcription in single cells. Nature Communications 9, 781.
- Cribari-Neto F and Zeileis A (2010). Beta regression in R. Journal of Statistical Software 34, 1–24.
- Ding B, Zheng L, Zhu Y, Li N, Jia H, Ai R, Wildberg A, and Wang W (2015). Normalization and noise reduction for single cell RNA-seq experiments. Bioinformatics 31, 2225–2227.
- Duren Z, Chen X, R. J, et al. (2017). Modeling gene regulation from paired expression and chromatin accessibility data. Proceedings of the National Academy of Sciences 114, E4914–E4923.
- Gelfand A and Dey D (1994). Bayesian model choice: asymptotics and exact calculations. Journal of the Royal Statistical Society, Series B 56, 501–514.
- Haario H, Saksman E, and Tamminen J (2001). An Adaptive Metropolis Algorithm. Bernoulli 7, 223–242.
- Haario H, Saksman E, and Tamminen J (2005). Component-wise Adaptation for High Dimensional MCMC. Computational Statistics 20, 265–273.
- Ho Y-Y, Parmigiani G, Louis TA, and Cope LM (2011). Modeling liquid association. Biometrics 67, 133–141.
- Hu Y et al. (2016). Simultaneous profiling of transcriptome and DNA methylome from a single cell. Genome Biology 17, 88.
- Hudson C, Clements D, Friday R, Stott D, and Woodland H (1997). Xsox17α and -β mediate endoderm formation in Xenopus. Cell 91, 397–405.
- Kanai-Azuma M, Kanai Y, Gad J, Tajima Y, Taya C, Kurohmaru M, Sanai Y, Yonekawa H, Yazaki K, Tam P, and Hayashi Y (2002). Depletion of definitive gut endoderm in Sox17-null mutant mice. Development 129, 2367–2379.
- Kelly T, Liu Y, Lay F, Liang G, Berman B, and Jones P (2012). Genome-wide mapping of nucleosome positioning and DNA methylation within individual DNA molecules. Genome Research 22, 2497–2506.
- Kinzy T, Starr T, G.C. T, and Ho Y-Y (2019). Meta-analytic framework for modeling gene coexpression dynamics. Statistical Applications in Genetics and Molecular Biology 18, 1–12.
- Lee J, Hyeon DY, and Hwang D (2020). Single-cell multiomics: technologies and data analysis methods. Experimental & Molecular Medicine 52, 1428–1442.
- Li K-C (2002). Genome-wide coexpression dynamics: theory and application. Proceedings of the National Academy of Sciences 99, 16875–16880.
- Li K-C, Liu C-T, Sun W, Yuan S, and Yu T (2004). A system for enhancing genome wide coexpression dynamics study. Proceedings of the National Academy of Sciences 101, 15561–15566.
- Liu H, Zhou J, Tian W, et al. (2021). DNA methylation atlas of the mouse brain at single-cell resolution. Nature 598, 120–128.
- Lun A, Bach K, and Marioni J (2016). Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biology 17, 75.
- Lytal N, Ran D, and An L (2020). Normalization Methods on Single-Cell RNA-seq Data: An Empirical Survey. Frontiers in Genetics 11, 41.
- Ma Z, Hanson T, and Ho Y-Y (2020). Flexible bivariate correlated count data regression. Statistics in Medicine 39, 3476–3490.
- Masarotto G and Varin C (2012). Gaussian copula marginal regression. Electronic Journal of Statistics 6, 1517–1549.
- Mohammed H, Hernando-Herraez I, Savino A, et al. (2017). Single-cell landscape of transcriptional heterogeneity and cell fate decisions during mouse early gastrulation. Cell Reports 20, 1215–1228.
- Mukherjee S, Chaturvedi P, Rankin S, Fish M, Wlizla M, Paraiso K, MacDonald M, Chen X, Weirauch M, Blitz I, Cho K, and Zorn A (2020). Sox17 and β-catenin co-occupy Wnt-responsive enhancers to govern the endoderm gene regulatory network. Elife 9, e58029.
- Muller P, Parmigiani G, Robert C, and Rousseau J (2004). Optimal sample size for multiple testing. Journal of the American Statistical Association 99, 990–1001.
- Nelsen RB (2006). An Introduction to Copulas, chapter 4, pages 109–155. Springer.
- Pitt M, Chan D, and Kohn R (2006). Efficient Bayesian Inference for Gaussian Copula Regression Models. Biometrika 93, 537–554.
- Pott S (2017). Simultaneous measurement of chromatin accessibility, DNA methylation, and nucleosome phasing in single cells. Elife 6, e23203.
- Schulz H, Kolde R, Adler P, et al. (2009). The FunGenES database: A genomics resource for mouse embryonic stem cell differentiation. PLoS ONE 4, e6804.
- Sklar A (1973). Random Variables, Joint Distribution Functions, and Copulas. Kybernetika 9, 449–460.
- Smallwood S, Lee H, Angermueller C, et al. (2014). Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity. Nature Methods 11, 817–820.
- Smith M and Khaled M (2012). Estimation of Copula Models With Discrete Margins via Bayesian Data Augmentation. Journal of the American Statistical Association 107, 290–303.
- Song PX-K (2000). Multivariate Dispersion Models Generated from Gaussian Copula. Scandinavian Journal of Statistics 27, 305–320.
- Wang L and Dunson D (2010). Semiparametric Bayes multiple testing: applications to tumor data. Biometrics 66, 493–501.
- Wang L, Liu S, Ding Y, Yuan S-S, Ho Y-Y, and Tseng G (2017). Meta-analytic framework for liquid association. Bioinformatics 33, 2140–2147.
- Weinhold L, Wahl S, Pechlivanis S, Hoffmann P, and Schmid M (2016). A statistical model for the analysis of beta values in DNA methylation studies. BMC Bioinformatics 17, 480.
- Yang Z and Ho Y-Y (2021). Modeling dynamic correlation in zero-inflated bivariate count data with applications to single-cell RNA sequencing data. Biometrics 2021.
- Yao L, Laird P, Farnham P, and Berman B (2015). Inferring regulatory element landscapes and transcription factor networks from cancer methylomes. Genome Biology 16.
- Zellner A (1986). On assessing prior distributions and Bayesian regression analysis with g prior distributions. In Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti. Studies in Bayesian Econometrics and Statistics, pages 233–243. New York: Elsevier.
- Zhou Q, Chipperfield H, Melton D, and Wong W (2007). A gene regulatory network in mouse embryonic stem cells. Proceedings of the National Academy of Sciences 104, 16438–16443.
- Zhou Q, Liu M, Xia X, et al. (2017). A mouse tissue transcription factor atlas. Nature Communications 8, 15089.
