Abstract
RNA sequencing (RNA-seq) is becoming increasingly popular for quantifying gene expression levels. Since RNA-seq measurements are relative in nature, between-sample normalization of counts is an essential step in differential expression (DE) analysis. The normalization step of existing DE detection algorithms is ad hoc and performed once and for all prior to DE detection, which may be suboptimal since ideally normalization should be based on non-DE genes only and thus coupled with DE detection. We propose a unified statistical model for joint normalization and DE detection of log-transformed RNA-seq data. Sample-specific normalization factors are modeled as unknown parameters in the gene-wise linear models and jointly estimated with the regression coefficients. By imposing a sparsity-inducing L1 penalty (or a mixed L1/L2 penalty for multiple treatment conditions) on the regression coefficients, we formulate the problem as a penalized least-squares regression problem and apply the augmented Lagrangian method to solve it. Simulation studies show that the proposed model and algorithms perform better than, or comparably to, existing methods in terms of detection power and false-positive rate. The performance gain increases with larger sample size or higher signal-to-noise ratio, and is most significant when a large proportion of genes are differentially expressed in an asymmetric manner.
Index Terms—RNA-seq, differential expression analysis, normalization, linear regression, L1-norm regularization, augmented Lagrangian method
1. INTRODUCTION
Ultra-high-throughput sequencing of transcriptomes (RNA-seq) is a widely used method for quantifying gene expression levels due to its low cost, high accuracy and wide dynamic range for detection [1]. Modern sequencing platforms can generate hundreds of millions of sequencing reads from each biological sample in a single day. RNA-seq also facilitates the detection of novel transcripts [2] and the quantification of transcripts at the isoform level [3], [4]. For these reasons, RNA-seq has become the method of choice for assaying transcriptomes [5].
One major limitation of RNA-seq is that it only provides relative measurements of transcript abundances, due to differences in library size (i.e., sequencing depth) between samples [6]. Normalization of RNA-seq read counts is therefore required in gene differential expression analysis to correct for such variation between samples. A popular form of between-sample normalization scales the raw read counts in each sample by a sample-specific factor related to library size [6], [7]. Examples include CPM/RPM (counts/reads per million) [8], quantile normalization [9], [10], upper-quartile normalization [11], trimmed mean of M-values (TMM) [8] and DESeq normalization [12]. Commonly used gene expression measures such as TPM (transcripts per million) [13] and RPKM/FPKM (reads/fragments per kilobase of exon per million mapped reads) [1], [2] additionally correct for differences in gene length within a sample [14] (so-called within-sample normalization). In particular, the CPM, TPM and FPKM values for the i-th gene from the j-th sample are respectively defined as
CPMij = (cij / Nj) × 10^6,  TPMij = (cij / ℓi) / (∑k ckj / ℓk) × 10^6,  FPKMij = cij / (ℓi Nj) × 10^9,    (1)
where cij is the observed read count for gene i from the j-th sample, Nj = ∑i cij is the sequencing depth of the j-th sample, and ℓi is the length of gene i. In this work we focus on between-sample normalization.
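As a concrete companion to (1), the following minimal numpy sketch computes the three measures for a toy counts matrix; the function and variable names are ours, not the paper's.

```python
import numpy as np

# Sketch of the measures in (1): counts matrix c is genes x samples,
# l holds gene lengths in bases. Names and toy values are illustrative.
def cpm(c):
    # counts per million: scale each sample by its depth N_j = sum_i c_ij
    return c / c.sum(axis=0) * 1e6

def tpm(c, l):
    # transcripts per million: length-normalize first, then rescale to 1e6
    r = c / l[:, None]
    return r / r.sum(axis=0) * 1e6

def fpkm(c, l):
    # reads per kilobase of exon per million mapped reads
    return c / (l[:, None] / 1e3) / c.sum(axis=0) * 1e6

c = np.array([[100., 200.], [300., 400.]])
l = np.array([1000., 2000.])
print(cpm(c))
print(tpm(c, l))
```

By construction each sample's CPM and TPM columns sum to one million, which is one way to sanity-check an implementation.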
In traditional count-based RNA-seq analysis methods, the read counts for each gene are assumed to follow a Poisson [15] or negative binomial (NB) distribution. One issue with count-based methods is that their procedures are complicated and contain many ad hoc heuristics. Moreover, the Poisson and NB distributions are mathematically less tractable than the normal distribution [16], [17], which makes count-based methods difficult to generalize to new data. In addition, commonly used statistical methods for microarray data analysis, e.g., quality weighting of RNA samples, addition of random noise to generate technical replicates, and gene set tests [16], have been designed for normally distributed data, and it is unclear whether they can be adapted to count data. The presence of outliers is another issue that is addressed only in a very ad hoc manner by existing methods. To handle these issues, the authors of [16] take the logarithm of the raw read counts and apply normal-distribution-based statistical methods to the transformed data. The logarithmic transformation compresses the dynamic range of the RNA-seq counts, so that outlier counts are largely transformed into “normal” data. As a result, sophisticated procedures to detect and discard outliers [18], [19], [20] are not required.
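The range-compression effect of the log transform described above can be seen numerically; the data and the pseudocount 0.5 below are arbitrary illustrative choices, not values from [16].

```python
import numpy as np

# One extreme count dominates on the raw scale but is pulled close to the
# bulk of the data after a log2 transform (0.5 is an arbitrary pseudocount).
counts = np.array([10., 12., 11., 9., 5000.])
y = np.log2(counts + 0.5)
print(counts.max() / np.median(counts))  # raw scale: outlier is ~450x the median
print(y.max() / np.median(y))            # log scale: only a few-fold above it
```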
In this paper, as in [16], [17], we work with log-transformed gene expression values and propose a unified statistical model for differential gene expression. Unlike [16], [17], we model sample-specific scaling factors for between-sample normalization as unknown parameters and incorporate them into the gene-wise linear models. By imposing a sparsity-inducing penalty (an L1 penalty for a single treatment factor and a mixed L1/L2 penalty for multiple treatment factors) on the regression coefficients and carefully choosing the tuning parameter, the model achieves joint accurate detection of DE genes and between-sample normalization. To fit the model, we first eliminate the sample-specific parameters analytically to obtain a penalized linear regression problem, and then solve it with the alternating direction method of multipliers (ADMM) algorithm, which is known for its fast convergence to modest accuracy [21]. Regarding the choice of tuning parameter, we theoretically derive the smallest tuning parameter αmax that leads to an all-zero solution, and thereby find a proper tuning parameter within [0, αmax].
Note that our work is preceded by [22], which addresses the differential expression problem in a similar way. The difference is that the model of [22] considers only categorical or qualitative predictor/explanatory variables (treatment conditions); for example, label “0” is assigned to samples from the control group and label “1” to samples from the treatment group. In our model, by contrast, the predictor/explanatory variables can take arbitrary numeric values, so our model generalizes [22] from the discrete to the continuous predictor-variable case. The algorithm in [22] does not apply to the numeric-variable model at hand, because (i) applicability: it requires that multiple samples be present in each group, but in the continuous-predictor model the concept of “group” no longer exists, or more precisely, each group contains only one sample; and (ii) algorithmic complexity: it requires a p-dimensional exhaustive search, where p is the number of treatment conditions, which is computationally very expensive when p > 1 (see Section 4).
The remainder of the paper is organized as follows. In Section 2, we formulate the problem in the context of a single treatment factor. In Section 3, we reformulate the problem as a penalized simple regression problem and derive an efficient ADMM algorithm to solve it, together with the estimation of the noise variance and the tuning parameter. In Section 4, we extend the simple regression model to a multiple linear regression model. Comparisons with existing methods are presented in Section 5, followed by discussion in Section 6.
2. DATA MODEL AND PROBLEM FORMULATION
Throughout the paper, the subscript and superscript are used to index the row and column vectors of a matrix, respectively. For example, the i-th row vector and the j-th column vector of a matrix A are denoted by ai and aj, respectively. Note that this does not conform to the common convention in which the subscript indexes columns and the superscript indexes rows.
2.1. Data model
Suppose there are a total of m genes measured in n samples. Let yij, i = 1, 2, …, m, j=1, 2, …, n, be the log-transformed gene expression measurements (a small positive number is usually added before taking logarithm) for the i-th gene from the j-th sample. The following statistical model is assumed
yij = βi0 + βi xj + dj + εij,  i = 1, …, m,  j = 1, …, n,    (2)
where βi0 is the y-intercept for gene i; xj, j = 1, 2, …, n, is the predictor variable representing the treatment condition (e.g., drug dosage) for sample j; βi is the slope or regression coefficient, representing the log-fold-change of the expression level of gene i per unit change of xj; dj is the scaling factor (e.g., log(sequencing depth) or log(library size)) for sample j for between-sample normalization [6]; and εij models the measurement noise. We assume that the error terms εij are uncorrelated with the predictor variable and uncorrelated with each other (across both genes i and samples j).
In (2), we consider a single treatment condition. Extension to models with multiple treatment conditions will be discussed in Section 4.
Our main interest is to detect differentially expressed (DE) genes, i.e., whether βi is equal to zero. If βi ≠ 0 gene i is differentially expressed across the n samples; otherwise it is not.
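To make the data model concrete, the following sketch simulates from (2); the DE fraction, effect size and noise level are arbitrary choices of ours, not the simulation settings of Section 5.

```python
import numpy as np

# Simulate y_ij = beta_i0 + beta_i * x_j + d_j + eps_ij from model (2).
rng = np.random.default_rng(0)
m, n = 100, 15
beta0 = rng.normal(5.0, 1.0, size=m)      # gene-specific intercepts
beta = np.zeros(m)
beta[:10] = 2.0                            # only the first 10 genes are DE
x = rng.normal(size=n)                     # treatment condition per sample
d = rng.normal(0.0, 0.5, size=n)           # sample-specific scaling factors
eps = rng.normal(0.0, 0.1, size=(m, n))    # measurement noise
y = beta0[:, None] + np.outer(beta, x) + d[None, :] + eps
print(y.shape)  # one row per gene, one column per sample
```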
Remark 2.1. Since βi0 and dj in (2) model a gene-specific factor (e.g., gene length) and a sample-specific factor, respectively, model (2) can work with any log-transformed gene expression measure of the form
yij = log( cij / (ℓi qj) ),    (3)
where cij is the raw count, ℓi is the length of gene i and qj is the normalization factor of the j-th sample, since ℓi and qj can be absorbed into βi0 and dj, respectively. Note that gene expression measures of the form cij/(ℓi qj) include the raw counts (ℓi = qj = 1), measures based on between-sample normalization only (ℓi = 1) [6], and FPKM and TPM, which are shown in (1) and involve both between- and within-sample normalization.
2.2. Penalized likelihood
The likelihood function based on the measured data is given by
L({βi0}, {βi}, {dj}) = ∏i ∏j (2πσi²)^(−1/2) exp( −(yij − βi0 − βi xj − dj)² / (2σi²) ),    (4)
where σi² is the noise variance of gene i. Assuming that the σi²'s are known, maximizing (4) is equivalent to minimizing the negative log-likelihood:
ℓ({βi0}, {βi}, {dj}) = ∑i ∑j (yij − βi0 − βi xj − dj)² / (2σi²) + c,    (5)
where c is an irrelevant constant.
In practice, we solve for the σi²'s using an ad hoc approach, which is described in Section 3.4.
We introduce an L1 penalty on the βi's,
p(β) = α ∑i |βi|.    (6)
It is well known that the L1 penalty favors sparse solutions (it forces some coefficients to be exactly zero) [23]. This is reasonable since in practice many genes are not differentially expressed.
The objective function to be minimized is
f(β0, β, d) = ∑i ∑j (yij − βi0 − βi xj − dj)² / (2σi²) + α ∑i |βi|.    (7)
3. ALGORITHM DEVELOPMENT
3.1. Formulation of (7) as Penalized Simple Linear Regression Model
It can be proved that the optimization problem in (7) is jointly convex in (β0, β, d). Therefore, any stationary point of (7) is a global minimizer.
The derivative of f(β0, β, d) with respect to dj, j = 1, 2, …, n, is
| (8) |
Setting (8) to zero gives
| (9) |
Model (2) is non-identifiable: we can add any constant to all the dj's and subtract the same constant from all the βi0's while obtaining the same fit. To resolve this issue, we fix d1 = 0. Therefore
| (10) |
where
| (11) |
| (12) |
Here, the superscript (w) indicates that the mean is a weighted mean instead of an unweighted one.
On the other hand, from
| (13) |
we have
| (14) |
where
| (15) |
| (16) |
From (10) we have
| (17) |
where
| (18) |
Substituting (17) into (14) yields
| (19) |
Without loss of generality, we make the following two assumptions:
Assumption 3.1.
∑j xj = 0  and  ∑j xj² = 1.    (20)
These assumptions are reasonable since in the model (2) the center and scaling factor of xj’s can be absorbed into βi0 and βi, respectively.
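Assumption 3.1 is easy to enforce in practice by centering and rescaling the covariate vector; a small sketch (the helper name and toy values are ours):

```python
import numpy as np

# Rescale an arbitrary covariate so that sum_j x_j = 0 and sum_j x_j^2 = 1,
# as required by Assumption 3.1; the shift and scale are absorbed into
# beta_i0 and beta_i as noted above.
def standardize(x):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                    # center: sum x_j = 0
    return x / np.sqrt((x ** 2).sum())  # scale: sum x_j^2 = 1

x = standardize([1.0, 2.0, 5.0, 10.0])
print(x.sum(), (x ** 2).sum())
```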
Then (19) simplifies to
| (21) |
The sum of (10) and (21) yields
| (22) |
Substituting (22) into (7), the latter simplifies to
| (23) |
where
| (24) |
It can be shown by straightforward calculation that
| (25) |
| (26) |
3.2. Model Fitting by ADMM
We propose to use the alternating direction method of multipliers (ADMM) [21] to solve (23). Although ADMM can be very slow to converge to high accuracy, it is often the case that ADMM converges to modest accuracy very fast (within a few tens of iterations) [21].
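To illustrate the flavor of the three-step iteration (primal update, soft-thresholding, dual update), here is a textbook ADMM solver for a generic lasso problem min_b ½‖Ab − y‖² + α‖b‖₁. This is a sketch of the general technique, not the paper's Algorithm 1 (whose updates act on the weighted quantities derived above); all names and parameter values are ours.

```python
import numpy as np

def soft(v, t):
    # soft-thresholding: the proximal operator of the L1 norm
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, y, alpha, rho=1.0, iters=200):
    # Generic ADMM for min 0.5*||A b - y||^2 + alpha*||b||_1
    n = A.shape[1]
    b, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    inv = np.linalg.inv(A.T @ A + rho * np.eye(n))
    Aty = A.T @ y
    for _ in range(iters):
        b = inv @ (Aty + rho * (z - u))  # quadratic subproblem
        z = soft(b + u, alpha / rho)     # sparsity-inducing step
        u = u + b - z                    # dual (scaled multiplier) update
    return z

rng = np.random.default_rng(2)
A = rng.normal(size=(50, 10))
b_true = np.zeros(10)
b_true[:3] = [3.0, -2.0, 1.5]
y = A @ b_true + 0.01 * rng.normal(size=50)
print(admm_lasso(A, y, alpha=1.0).round(2))
```

After a couple hundred iterations the thresholded variable z recovers the support of the three nonzero coefficients while shrinking the rest to (essentially) zero, which is the behavior the L1 penalty is relied on for here.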
To apply the ADMM, the problem (23) is reformulated as
| (27a) |
subject to
| (27b) |
The augmented Lagrangian of (27) is (28) at the bottom of the page.
Step 1: Update βi, i =1, 2, …, m:
The derivative of (28) with respect to βi is
| (29) |
where ∂|βi| is the subgradient of |βi| with respect to βi, defined as ∂|βi| = sign(βi) for βi ≠ 0 and ∂|βi| ∈ [−1, 1] for βi = 0.
Setting (29) equal to zero gives (30) at the bottom of the page, where T is the soft-thresholding operator T(x, τ) = sign(x) max(|x| − τ, 0).
Step 2: Update δ0:
The derivative of (28) with respect to δ0 is
| (31) |
Setting (31) equal to zero gives
| (32) |
where the second equality is due to (25).
Step 3: Update λ:
| (33) |
The model fitting algorithm is described in Algorithm 1.
3.3. Estimation of Tuning Parameter α
Eq. (23) can be expressed in matrix form as
| (34) |
where
| (35) |
with
| (36) |
and
| (37) |
After expansion, (34) becomes
| (38) |
where we have exploited the assumption xTx = 1 in (20).
It can be shown that β = 0 is the minimizer of f(β) when
| (39) |
where mi denotes the i-th column of M in (37).
| (28) |
| (30) |

Note that
| (40) |
where the last equality holds due to (25).
Substituting (40) into (39) yields
| (41) |
Our strategy is to first sort the m values in ascending order, and then set α as the P-th percentile (0 < P < 100) of the ordered values. We set P = 5 in Section 5.
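The percentile rule can be sketched as follows, with random placeholders standing in for the m per-gene values from (41):

```python
import numpy as np

# Choose alpha as the 5th percentile of the per-gene critical values
# (random stand-ins here; in the paper these come from (41)).
rng = np.random.default_rng(3)
crit = np.abs(rng.normal(size=1000))  # placeholder for the m ordered values
alpha = np.percentile(crit, 5)        # P = 5, the setting used in Section 5
print(alpha)
print((crit < alpha).mean())          # about 5% of values fall below alpha
```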
3.4. Maximum Likelihood Estimation of σi²
To solve for the σi²'s, consider the negative log-likelihood function of (4) with the σi²'s treated as unknown parameters as well:
| (42) |
Taking the partial derivatives of ℓ(.) with respect to dj and βi0 and setting the results to zero, we arrive at (10) and (21) respectively. The sum of (10) and (21) gives (22).
Taking the partial derivative of ℓ(.) with respect to βi and setting the result to zero, we have
| (43) |
Substituting (22) into (43) yields
| (44) |
where is defined in (12).
Taking the partial derivative of ℓ(.) with respect to σi² and setting the result to zero gives
| (45) |
Substituting (22) into (45) yields
| (46) |
where , and are defined in (15), (11) and (18), respectively.
Given initial estimates for βi and σi², we can iterate equations (44), (46) and (12) to gradually refine the estimates, as shown in Algorithm 2.

To obtain a robust estimate of the variance, we further take a weighted average of the gene-wise estimate and the estimated mean variance across all genes. That is,
| (47) |
where
| (48) |
and the weight w is calculated using the following formula, which is derived based on an empirical Bayes approach [24]:
| (49) |
This kind of variance estimation approach is widely used in differential gene expression analysis with small sample sizes [25], [26]. The estimated variances can then be used in Algorithm 1 to solve for the βi's.
Remark 3.1. In the special case where all genes share a common variance σ², there is no need to estimate σ², since the unknown σ² in (7) can be absorbed into the tuning parameter α.
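The shrinkage step in (47) can be sketched as follows; the weight w is a fixed placeholder here, whereas the paper derives it from the empirical Bayes formula in (49).

```python
import numpy as np

# Shrink gene-wise variance estimates toward the mean variance across genes,
# as in (47). The weight w = 0.3 is a hypothetical placeholder; the paper
# computes it from an empirical Bayes argument.
rng = np.random.default_rng(4)
s2 = 0.01 * rng.chisquare(df=4, size=1000) / 4  # gene-wise variance estimates
s2_bar = s2.mean()                               # mean variance across genes
w = 0.3
s2_shrunk = w * s2_bar + (1 - w) * s2
print(s2.std(), s2_shrunk.std())  # shrinkage reduces the spread
```

The affine combination leaves the center of the estimates roughly unchanged while pulling extreme values toward the common variance, which stabilizes the estimates when the sample size is small.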
4. EXTENSION TO MULTIPLE LINEAR REGRESSION MODEL AND ALGORITHM DEVELOPMENT
In the multiple linear regression model, each response or outcome is modeled by p > 1 predictors:
yij = βi0 + βiT xj + dj + εij,    (50)
where
βi = [βi1, βi2, …, βip]T    (51)
is a vector of regression coefficients representing log-fold-change of expression levels of gene i between treatment conditions, and
xj = [xj1, xj2, …, xjp]T    (52)
is a vector of predictors representing the treatment conditions (drug dosage, blood pressure, age, BMI, etc.) for sample j, and βi0, dj and εij are the y-intercept, the scaling factor for sample j, and the measurement noise, respectively. We assume that the error terms εij are uncorrelated with all the predictor variables and uncorrelated with each other.
The likelihood function based on the observed data is given by
| (53) |
Assuming that the σi²'s are known, maximization of (53) leads to minimizing the negative log-likelihood:
| (54) |
The objective function to be minimized is
| (55) |
Below we introduce two types of penalty function p(βi).
Type I penalty:
p(βi) = α |βip|.    (56)
Gene i is differentially expressed if βip ≠ 0 and not otherwise. This penalty is for applications where one covariate (e.g., treatment) is of main interest while we want to adjust for the possible effects of other confounding covariates (e.g., age, gender, etc.).
Type II penalty:
p(βi) = α ‖βi‖.    (57)
Gene i is differentially expressed if βi ≠ 0 and not otherwise. This penalty is for applications where all covariates are of interest and we want to identify the genes for which at least one covariate has an effect.
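The two penalties can be written down directly; in this small sketch B stacks the coefficient vectors βi as rows, the covariate of interest is taken to be the last one, and α and the toy coefficients are arbitrary choices of ours.

```python
import numpy as np

def type1_penalty(B, alpha):
    # Type I: penalize only the coefficient of the covariate of interest
    # (here assumed to be the last column of the m x p matrix B)
    return alpha * np.abs(B[:, -1]).sum()

def type2_penalty(B, alpha):
    # Type II: group (L2-norm) penalty on each whole coefficient vector,
    # which zeroes out entire rows of B at once
    return alpha * np.linalg.norm(B, axis=1).sum()

B = np.array([[0.0, 0.0, 2.0],   # DE under both penalties
              [1.0, 0.0, 0.0]])  # DE under Type II only (beta_ip = 0)
print(type1_penalty(B, 1.0), type2_penalty(B, 1.0))
```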
It can be proved that the optimization problem (55) with penalty (56) or (57) is jointly convex in (β0, {βi} ,d).
Assume that
| (58) |
and set d1 = 0. Using arguments similar to those in Section 3.1 to eliminate β0 and d, we simplify (55) to
| (59) |
where is the same as that in (24), and
| (60) |
4.1. Regression with type I penalty: Model fitting by ADMM
To apply the ADMM, we reformulate the Type I penalized regression problem as
| (61a) |
subject to
| (61b) |
The augmented Lagrangian of (61) is (62) at the bottom of the page.
Step 1: Update βi, i = 1, 2, …, m:
Taking the partial derivative of (62) with respect to βi and setting the result to zero gives
| (63) |
where
| (64) |
∂|βip| is the subgradient of |βip| with respect to βip, and
| (65) |
Given matrix partition in the following form:
where Q11 is the submatrix of Q with the last row and last column deleted, from (63) we have
| (66) |
| (67) |
From (66) it follows
| (68) |
Substituting (68) into (67) yields
| (69) |
Step 2: Update δ0:
Taking the derivative of (62) with respect to δ0 and setting the result to zero gives
| (70) |
where we have exploited (25).
Step 3: Update λ:
| (71) |
The model fitting algorithm is described in Algorithm 3.
4.2. Regression with type II penalty: Model fitting by ADMM
The Type II penalized regression problem is reformulated as
| (72a) |
subject to
| (72b) |
The augmented Lagrangian of (72) is (73) at the bottom of the page.
Step 1: Update βi, i = 1, 2, …, m:
The terms of (73) relevant to the derivative with respect to βi are collected in (74) at the bottom of the page, where c is an irrelevant constant that does not depend on βi, and vi is defined in (65).
| (62) |
It can be shown that βi = 0 when ‖vi‖ ≤ α; otherwise, denoting the eigendecomposition of XTX as XTX = UDUT, the minimization of (74) is equivalent to
| (75a) |
where
| (75b) |
| (75c) |
As in [27], we use a coordinate descent procedure to optimize (75). For each coordinate s, given the current estimates of the remaining coordinates, the s-th coordinate can be estimated by solving
| (76) |
where
| (77) |
We solve (76) via a one-dimensional search. Note that the solution to (76) falls between 0 and the ordinary least-squares estimate. We can use the optimize function in R, or the fminbnd function in MATLAB, both of which perform a one-dimensional search based on golden section search and successive parabolic interpolation.
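A minimal golden-section search, the core of the one-dimensional routines mentioned above, can be sketched as follows; this is a textbook version without the parabolic-interpolation acceleration that optimize and fminbnd add, and the test function is an arbitrary example of ours.

```python
import math

def golden_section(f, a, b, tol=1e-8):
    # Shrink [a, b] by the inverse golden ratio until it is narrower than tol.
    g = (math.sqrt(5.0) - 1.0) / 2.0  # ~0.618
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):   # minimum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:             # minimum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2.0

# Minimize a smooth convex function on [0, 5]; the minimizer is t = 2.
print(golden_section(lambda t: (t - 2.0) ** 2 + 1.0, 0.0, 5.0))
```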
After updating the βi's, the updates of δ0 and λ turn out to be the same as those in Section 4.1. The model fitting algorithm is described in Algorithm 4.
4.3. Estimation of Tuning Parameter α
Eq. (59) can be expressed in matrix form as
| (78) |
where M and X are respectively defined in (37) and (64), and
| (79) |
and p(B) is the penalty function.
The derivative of f(B) with respect to B is
| (80) |
4.3.1. Type I Penalty
When p(B) is the Type I penalty in (56), its derivative with respect to B is
| (81) |
Denote
Setting (80) equal to zero gives
| (82) |
| (83) |
Since MTΣM is rank deficient2, the solution to (82) is not unique. We apply the pseudoinverse of MTΣM to obtain the minimum-norm solution to (82):
| (84) |
Substituting (84) into (83) yields
| (85) |
2. Simple analysis shows that the rank of MTΣM is m-1.
| (73) |
| (74) |
Note that to arrive at (85), we have exploited the fact that (MTΣM)(MTΣM)†MTΣ = MTΣ, which follows from MTΣM = MTΣ according to the definition of M in (37) and the definition of the pseudoinverse of a matrix.
Since the coefficient matrix of βp in (85) is positive semidefinite, (85) implies that we obtain the zero solution when
| (86) |
where the next-to-last equality is due to (40).
4.3.2. Type II Penalty
The derivative of the Type II penalty in (57) with respect to B is
| (87) |
where ∂‖βi‖ = βi/‖βi‖ if βi ≠ 0, and is otherwise any vector of norm at most one [27], [28].
Setting (80) equal to zero yields
| (88) |
for i = 1, 2, …, m, where mi is the i-th column of M in (37). The minimizer of f(B) is the zero matrix when
| (89) |
Note that
| (90) |
where the next-to-last equality is due to (25). Substituting (90) into (89) yields
| (91) |
4.4. Maximum Likelihood Estimation of σi²
To solve for the σi²'s, consider the negative log-likelihood function with the σi²'s treated as unknown parameters as well:
| (92) |
Taking the partial derivatives of ℓ(.)with respect to dj and βi0 and setting the result to zero, we arrive at
| (93) |
| (94) |
where to derive the second equality we have exploited assumption (58).
The sum of (93) and (94) gives
| (95) |
Taking the partial derivative of ℓ(.) with respect to βi and setting the result to zero, we have
| (96) |
Substituting (95) into (96) yields
| (97) |
where β(w) is defined in (60).
Taking the partial derivative of ℓ(.) with respect to σi² and setting the result to zero gives
| (98) |
Substituting (95) into (98) yields
| (99) |
where , and are defined in (15), (11) and (18), respectively.
Given initial estimates, the estimates for βi and σi² can then be iteratively updated using equations (97), (99) and (60) until convergence.
After the σi²'s are estimated, they can be shrunk toward the common noise variance to obtain robust estimates, as done in Section 3.4.
The iterative procedure is summarized in Algorithm 5.
5. EXPERIMENTS
We evaluate the performance of the proposed algorithm (referred to as ELMSeq, short for extended linear model for RNA-seq data analysis). To save space, we only verify the proposed algorithm for the simple regression model (2). We use the 5th percentile to set the tuning parameter α (see Section 3.3).
We compare our method with the state-of-the-art methods for detecting differential gene expression from RNA-seq data: edgeR-robust [20], [29], DESeq2 [18], and limma-voom [16], [17].
5.1. Simulations on Synthetic Data
We simulate RNA-seq data with a total of m = 1000 genes and n = 15 samples. The data generation is described in Table 1.
Table 1:
Synthetic data generation process and parameters
| ℓi ~ 2^unif(5,10) | gene length of gene i |
| | other log scaling factors of gene i |
| βi = 0 | log-fold change for non-DE genes |
| | log-fold change for up-regulated DE genes |
| | log-fold change for down-regulated DE genes |
| | condition data of sample j |
| Nj ~ unif(2, 3) × 10^6 | library size of sample j |
| | other log scaling factors of sample j |
| | expected RNA-seq read counts of gene i from sample j |
| | read counts |
| yij = log cij | log-transformed gene expression |
We first examine whether the proposed algorithm can accurately estimate the log-fold changes (slopes) βi. For ease of illustration, we set the true slopes of the DE genes to βi = ±2. We start with 300 DE and 700 non-DE genes. Among the DE genes, 50% are up-regulated and the remaining 50% are down-regulated. The slopes fitted by ELMSeq are plotted in Figure 1(a). We see that the estimated slopes are centered around the true ones: the estimated βi of the DE genes are centered around ±2, while those of the non-DE genes are close to zero. In Figure 1(b) and Figure 1(c), we increase the percentage of up-regulated DE genes to 70% and 90%, respectively. Our method still accurately retrieves all nonzero βi while shrinking all other βi to zero.
Figure 1:
Estimated βi in the simple linear regression model from simulated RNASeq data, where the number of genes is m=1000 and number of samples is n= 15. The number of DE genes varies from 300 to 700, and the percentage of up-regulated DE genes varies from 50% to 90%. Along the horizontal axis, from left to right: up-regulated genes (βi= 2), down-regulated genes (βi= −2) and non-DE genes (βi=0).
In Figure 1(d–f), we increase the number of DE genes to 500, among which 50%, 70% or 90% are up-regulated while the others are down-regulated. Our method still achieves accurate estimates. In Figure 1(g–h), we further increase the number of DE genes to 700, among which 50% or 70% are up-regulated, and our method still achieves accurate estimates. Only when we simulate 700 DE genes of which 90% are up-regulated does our method fail to distinguish between DE and non-DE genes, since the estimated regression coefficients of the latter are no longer zero [Figure 1(i)]. A theoretical explanation of Figure 1(i) is provided in the supplementary material.
Using a different gene expression measure such as CPM, RPKM or TPM values computed with formulas in (1) yields essentially the same result.
Using Algorithm 1, we estimate the regression coefficient for each gene i. We decide that there is a linear relationship between the predictor variable xj and the expression data yij if the estimated coefficient is nonzero; the larger its magnitude, the stronger the relationship. We then sort the genes in descending order of the magnitudes of their estimated coefficients and vary the threshold to construct the receiver operating characteristic (ROC) curve and to calculate the area under the ROC curve (AUC).
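The ranking-based AUC described above can be computed with the standard rank-sum (Mann-Whitney) identity; the scores and labels below are toy values, and the helper is ours rather than code from the paper.

```python
import numpy as np

def auc(scores, labels):
    # AUC via the Mann-Whitney rank-sum identity: rank all genes by score,
    # then compare the rank sum of true DE genes (labels == 1) to its minimum.
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n1, n0 = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

scores = np.abs(np.array([2.1, -1.9, 0.05, 0.1, 2.2, 0.02]))  # |beta_i| estimates
labels = np.array([1, 1, 0, 0, 1, 0])                          # true DE status
print(auc(scores, labels))  # every DE gene outranks every non-DE gene -> 1.0
```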
The AUCs for DE gene detection using all four methods are summarized in Table 2. We see that ELMSeq performs better than or comparably to the other three methods, regardless of how many genes are differentially expressed and whether they are expressed in a symmetric manner or not. In challenging cases where a large proportion of genes are differentially expressed in an asymmetric manner (e.g., 50% DE genes of which 90% are up-regulated, or 70% DE genes of which 70% are up-regulated), the performance gain of ELMSeq over competing methods is more significant.
Table 2:
AUC comparison of edgeR-robust, DESeq2, limma voom and ELMSeq in log-normally distributed data. Number of samples: n= 15, log-fold change for DE genes: , and noise level: σi= 0.1. The table shows the percent of DE genes (DE %), percent of up-regulated genes among the DE genes (Up %), as well as the mean AUCs for all four methods measured using 10 simulated replicates. The standard errors of the mean AUCs are given in parentheses.
| DE (%) | Up (%) | edgeR | DESeq2 | voom | ELMSeq |
|---|---|---|---|---|---|
| 10 | 50 | 0.9903 (0.0016) | 0.6068 (0.0807) | 0.991 (0.0018) | 0.9914 (0.0017) |
| 10 | 70 | 0.9935 (0.0021) | 0.4527 (0.0638) | 0.9941 (0.0021) | 0.9943 (0.0021) |
| 10 | 90 | 0.9869 (0.0028) | 0.6878 (0.0637) | 0.9875 (0.0024) | 0.9897 (0.0022) |
| 30 | 50 | 0.9898 (0.001) | 0.5508 (0.0883) | 0.99 (0.001) | 0.99 (0.001) |
| 30 | 70 | 0.9891 (0.0014) | 0.7946 (0.064) | 0.9897 (0.0014) | 0.991 (0.0011) |
| 30 | 90 | 0.9788 (0.0023) | 0.6114 (0.0805) | 0.9796 (0.0022) | 0.9795 (0.0014) |
| 50 | 50 | 0.9917 (8e-04) | 0.429 (0.0797) | 0.9916 (8e-04) | 0.9917 (8e-04) |
| 50 | 70 | 0.9748 (0.0026) | 0.4923 (0.081) | 0.9754 (0.0026) | 0.9826 (0.0015) |
| 50 | 90 | 0.8717 (0.0133) | 0.4697 (0.0667) | 0.8801 (0.0119) | 0.9662 (0.002) |
| 70 | 50 | 0.9907 (9e-04) | 0.5572 (0.1027) | 0.9915 (8e-04) | 0.9923 (7e-04) |
| 70 | 70 | 0.8564 (0.018) | 0.5307 (0.0588) | 0.8696 (0.0148) | 0.9591 (0.0034) |
| 70 | 90 | 0.3375 (0.0108) | 0.4808 (0.0192) | 0.3204 (0.0154) | 0.4718 (0.0124) |
In Table 3, we decrease the log-fold change of the DE genes while keeping all other data generation parameters (including the noise level) the same as those in Table 2. We see that all methods suffer a degradation in AUC performance; but again, ELMSeq consistently performs better than or comparably to all other methods.
Table 3:
AUC comparison of edgeR-robust, DESeq2, limma-voom and ELMSeq on log-normally distributed data. The data generation parameters are the same as those in Table 2 except that the log-fold changes for the DE genes are decreased.
| DE (%) | Up (%) | edgeR | DESeq2 | voom | ELMSeq |
|---|---|---|---|---|---|
| 10 | 50 | 0.8055 (0.0089) | 0.5241 (0.0142) | 0.8224 (0.0095) | 0.8232 (0.0095) |
| 10 | 70 | 0.8086 (0.009) | 0.4846 (0.0126) | 0.8212 (0.0095) | 0.8234 (0.0101) |
| 10 | 90 | 0.7867 (0.0084) | 0.5078 (0.0084) | 0.7955 (0.0104) | 0.8024 (0.0106) |
| 30 | 50 | 0.8087 (0.005) | 0.497 (0.0119) | 0.8158 (0.0054) | 0.8157 (0.0054) |
| 30 | 70 | 0.7848 (0.0052) | 0.5471 (0.0211) | 0.7949 (0.0052) | 0.8013 (0.0055) |
| 30 | 90 | 0.7398 (0.0059) | 0.5329 (0.0181) | 0.7505 (0.0059) | 0.773 (0.0054) |
| 50 | 50 | 0.8143 (0.0061) | 0.4931 (0.0137) | 0.8265 (0.0049) | 0.8268 (0.0051) |
| 50 | 70 | 0.7611 (0.0054) | 0.5061 (0.0155) | 0.7704 (0.0054) | 0.7752 (0.0056) |
| 50 | 90 | 0.6451 (0.006) | 0.5017 (0.0102) | 0.6503 (0.0059) | 0.6793 (0.0025) |
| 70 | 50 | 0.8149 (0.0022) | 0.5231 (0.0273) | 0.8261 (0.003) | 0.8267 (0.0028) |
| 70 | 70 | 0.7271 (0.0074) | 0.5093 (0.01) | 0.7354 (0.0086) | 0.7388 (0.0083) |
| 70 | 90 | 0.5449 (0.0066) | 0.5158 (0.0089) | 0.5505 (0.0081) | 0.5434 (0.0069) |
Note that when more samples are available, the performance gain of ELMSeq over competing methods becomes even more significant. The results for sample sizes n = 5, 8, 25, 50, 100 are provided in the supplementary materials (Tables S1–S5 for genes with high expression profiles and Tables S6–S10 for genes with low expression profiles).
We also performed simulations with the multiple linear regression model in Section 4; the preliminary results are similar to those obtained for the simple regression model. Note that unlike the simple regression model and the Type I penalized multiple linear regression model, the Type II penalized model does not permit defining up- and down-regulated genes, as multiple regression coefficients are tested simultaneously.
5.2. An application to a real RNA-Seq dataset
We further evaluate our algorithm on a prostate adenocarcinoma (PRAD) RNA-seq dataset published as part of The Cancer Genome Atlas (TCGA) project [30]. The RNA-seq data of 20531 genes from 187 samples were downloaded from the TCGA data portal (https://tcga-data.nci.nih.gov/tcga). We aim to identify genes that are associated with pre-operative prostate-specific antigen (PSA), an important risk factor for prostate cancer. The gene expression data were preprocessed by the TCGA consortium. Tissue samples from 333 PRAD patients were sequenced on Illumina sequencing instruments. The raw sequencing reads were processed and analyzed using the SeqWare Pipeline 0.7.0 and the MapspliceRSEM workflow 0.7 developed by the University of North Carolina, with reads aligned to the human reference genome using MapSplice [31]. The gene expression distributions of all samples were normalized to have the same 75th percentile expression value (1,000).
Using Algorithm 1, we obtain the estimated between-sample normalization factors and the regression coefficient for each gene i. We then substitute these into model (2) and, for each gene i, compute the p-value for testing the null hypothesis that the slope of the regression line is zero, i.e., βi = 0. We declare a gene differentially expressed if the p-value associated with its linear regression model is less than 0.05/m, where the threshold 0.05/m follows from the Bonferroni correction for multiple significance tests, targeting a family-wise error rate of 0.05. The relations between the sets of differentially expressed genes selected by edgeR, DESeq2, limma-voom and ELMSeq are depicted in Fig. 2.
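The Bonferroni cutoff used above is straightforward to apply; in this sketch the p-values are synthetic placeholders, not the TCGA results.

```python
import numpy as np

# Bonferroni rule: with m tests and target family-wise error rate 0.05,
# declare gene i differentially expressed when p_i < 0.05 / m.
m = 20531                       # number of genes in the PRAD dataset
threshold = 0.05 / m            # ~2.4e-6
rng = np.random.default_rng(5)
pvals = rng.uniform(size=m)     # null p-values; very few (if any) pass
pvals[:3] = 1e-8                # three hypothetical strongly associated genes
print(threshold)
print(int((pvals < threshold).sum()))
```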
Figure 2:
Venn diagram showing the relation between the set of differentially expressed genes detected by edgeR, DESeq2, limma-voom and ELMSeq.
Nine genes are uniquely detected by ELMSeq: RIC3, ALDH1A2, BCL11A, CDH3, DIRAS3, EPHA5, CEACAM1, PRSS16, and AJAP1. For most of these genes, evidence of an association with prostate cancer has been reported in the literature. For example, the genes ALDH1A2 [32] and CEACAM1 [33] are reported to be tumor suppressors in prostate cancer: underexpression of these genes promotes prostate cancer cell proliferation.
Twelve genes are detected by all four methods: KANK4, RHOU, TPT1, SH2D3A, EEF1A1P9, ZCWPW1, ZNF454, RACGAP1, PTPLA, POC1A, AURKA and TIMM17A. The genes detected by exactly three methods are: CDK1, FAM111B, MLF1IP, PRC1, DTL and RAD54B (edgeR, DESeq2 and limma-voom); SH3RF2, ATCAY and PCP4 (edgeR, DESeq2 and ELMSeq); FERMT1, FOXA3 and LRAT (edgeR, limma-voom and ELMSeq); and IPO9 (DESeq2, limma-voom and ELMSeq). Again, associations with prostate cancer have been reported in the literature for most of these genes. For example, silencing the gene RHOU decreases the invasion, proliferation and motility of prostate cancer cells [34].
6. DISCUSSION
A unified statistical model is proposed for joint between-sample normalization and DE detection of RNA-seq data. The sample-specific normalization factors are modeled as unknown parameters and estimated jointly with DE detection. As a result, the model is robust against normalization errors and is independent of the units (i.e., counts, CPM/RPM, RPKM/FPKM or TPM) in which gene expression levels are summarized.
For the model with a single treatment condition, we introduce the L1 penalty into the linear regression model. The L1 penalty favors sparse solutions (it forces some coefficients to be exactly zero), which is desirable since many genes are not differentially expressed. From a Bayesian point of view, the lasso penalty corresponds to a Laplace (double-exponential, centered at zero) prior over the regression coefficients. By contrast, existing methods do not exploit this sparsity. We also extend the simple linear regression model to a multiple linear regression model to accommodate multiple treatment conditions, with two types of penalty functions. In the first case, only one covariate is of interest while all other covariates are treated as confounding factors, and we test whether that specific covariate is associated with differential expression. In the second case, all covariates are of interest (there are no confounding covariates) and we test whether any covariate affects the differential expression of a gene.
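Within augmented-Lagrangian (ADMM-style) solvers, the L1 penalty typically enters through its proximal operator, the soft-thresholding function, which is what drives coefficients to exactly zero. The sketch below shows this operator in isolation; it is a generic illustration of how the lasso induces sparsity, not the paper's full joint estimation algorithm.

```python
def soft_threshold(x, lam):
    """Proximal operator of lam * |x|: shrink x toward zero and
    set it exactly to zero when |x| <= lam. Applied coordinate-wise
    to regression coefficients, this is what makes the L1-penalized
    solution sparse (non-DE genes get coefficient exactly zero)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0
```

Coefficients whose unpenalized estimates are small relative to the penalty level are zeroed out entirely, matching the prior belief that most genes are not differentially expressed.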
Simulation studies show that the proposed methods consistently perform better than or comparably to existing methods in terms of AUC. The performance gain increases with a larger sample size or a higher signal-to-noise ratio, and is more significant when a large proportion of genes are differentially expressed in an asymmetric manner.
The R code for the algorithms described in the paper is available for download at http://www-personal.umich.edu/~jianghui/lr-ADMM/.
ACKNOWLEDGMENTS
The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper.
Biographies

Kefei Liu received his B.Sc. in Mathematics from Wuhan University in 2006 and his Ph.D. in Electronic Engineering from City University of Hong Kong in 2013. He is currently a Postdoctoral Research Fellow at the Center for Computational Biology and Bioinformatics, Indiana University School of Medicine. Before joining IU, he worked as a Postdoctoral Research Associate at The Biodesign Institute of Arizona State University and the Department of Computational Medicine and Bioinformatics of the University of Michigan. His current research interests include machine learning, optimization, tensor decompositions and their applications in biomedical data analysis.

Jieping Ye received the Ph.D. degree in computer science from the University of Minnesota, Twin Cities, MN, USA, in 2005.
He is an Associate Professor in the Department of Computational Medicine and Bioinformatics and the Department of Electrical Engineering and Computer Science at the University of Michigan, Ann Arbor, MI, USA. His research interests include machine learning, data mining, and biomedical informatics. Dr. Ye has served as Senior Program Committee Member/Area Chair/Program Committee Vice Chair of many conferences including NIPS, ICML, KDD, IJCAI, ICDM, SDM, ACML, and PAKDD. He served as a PC Co-Chair of SDM 2015. He serves as an Associate Editor for IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING and IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, and as an Action Editor for Data Mining and Knowledge Discovery. He won the NSF CAREER Award in 2010. His papers have been selected as the outstanding student paper at ICML in 2004, the KDD best research paper honorable mention in 2010, the KDD best research paper nomination in 2011 and 2012, the SDM best research paper runner-up in 2013, the KDD best research paper runner-up in 2013, and the KDD best student paper award in 2014.

Yang Yang is a Ph.D. candidate at Beihang University. He is currently a visiting student under the supervision of Distinguished Professor Philip S. Yu at the University of Illinois at Chicago. He received his bachelor's and master's degrees from Xidian University. His research interests are social network analysis, machine learning, and complex networks.

Li Shen holds a B.S. degree from Xi’an Jiao Tong University, an M.S. degree from Shanghai Jiao Tong University, and a Ph.D. degree from Dartmouth College, all in Computer Science. He is an Associate Professor of Radiology and Imaging Sciences at Indiana University School of Medicine. His research interests include medical image computing, bioinformatics, data mining, network science, systems biology, brain imaging genomics, and brain connectomics.

Hui Jiang is an Assistant Professor in the Department of Biostatistics at University of Michigan. He received his Ph.D. in Computational and Mathematical Engineering from Stanford University in 2009. He received his B.S. and M.S. in Computer Science from Peking University. Before joining the University of Michigan in 2011, he was a postdoctoral scholar in the Department of Statistics and Genome Technology Center at Stanford University. He is interested in developing statistical and computational methods for the analysis of large-scale biological data generated using modern high-throughput technologies.
Contributor Information
Kefei Liu, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202.
Jieping Ye, Department of Computational Medicine and Bioinformatics, University of Michigan, MI 48109.
Yang Yang, School of Computer Science and Engineering, Beihang University, Beijing 100191, China.
Li Shen, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202.
Hui Jiang, Department of Biostatistics, University of Michigan, MI 48109.
REFERENCES
- [1].Mortazavi A, Williams BA, McCue K, Schaeffer L, and Wold B, “Mapping and quantifying mammalian transcriptomes by RNA-Seq.” Nat Methods, vol. 5, no. 7, pp. 621–628, July 2008. [DOI] [PubMed] [Google Scholar]
- [2].Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, and Pachter L, “Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation,” Nat Biotechnol, vol. 28, no. 5, pp. 511–515, May 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [3].Jiang H and Wong WH, “Statistical inferences for isoform expression in RNA-Seq,” Bioinformatics, vol. 25, no. 8, pp. 1026–1032, April 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [4].Salzman J, Jiang H, and Wong WH, “Statistical modeling of RNA-Seq data,” Statistical Science, vol. 26, no. 1, pp. 62–83, 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [5].Wang Z, Gerstein M, and Snyder M, “RNA-Seq: a revolutionary tool for transcriptomics.” Nat Rev Genet, vol. 10, no. 1, pp. 57–63, January 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [6].Dillies M-A, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J, Guernec G, Jagla B, Jouneau L, Laloe D, Le Gall C, Schaeffer B, Le Crom S, Guedj M, Jaffrezic F, and the French StatOmique Consortium, “A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis,” Brief Bioinform, vol. 14, no. 6, pp. 671–683, November 2013. [DOI] [PubMed] [Google Scholar]
- [7].Rapaport F, Khanin R, Liang Y, Pirun M, Krek A, Zumbo P, Mason CE, Socci ND, and Betel D, “Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data,” Genome Biology, vol. 14, no. 9, p. R95, September 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [8].Robinson MD and Oshlack A, “A scaling normalization method for differential expression analysis of RNA-seq data,” Genome Biol, vol. 11, no. 3, p. R25, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [9].Bolstad BM, Irizarry RA, Astrand M, and Speed TP, “A comparison of normalization methods for high density oligonucleotide array data based on variance and bias.” Bioinformatics, vol. 19, no. 2, pp. 185–193, January 2003. [DOI] [PubMed] [Google Scholar]
- [10].Smyth GK, “Limma: linear models for microarray data,” in Bioinformatics and computational biology solutions using R and Bioconductor. Springer, 2005, pp. 397–420. [Google Scholar]
- [11].Bullard JH, Purdom E, Hansen KD, and Dudoit S, “Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments.” BMC Bioinformatics, vol. 11, p. 94, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [12].Anders S and Huber W, “Differential expression analysis for sequence count data,” Genome Biol, vol. 11, no. 10, p. R106, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [13].Li B, Ruotti V, Stewart RM, Thomson JA, and Dewey CN, “RNA-Seq gene expression estimation with read mapping uncertainty.” Bioinformatics, vol. 26, no. 4, pp. 493–500, February 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [14].Oshlack A, Wakefield MJ et al. , “Transcript length bias in RNA-seq data confounds systems biology,” Biol Direct, vol. 4, no. 1, p. 14, 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [15].Marioni JC, Mason CE, Mane SM, Stephens M, and Gilad Y, “RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays,” Genome research, vol. 18, no. 9, pp. 1509–1517, 2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [16].Law CW, Chen Y, Shi W, and Smyth GK, “Voom: precision weights unlock linear model analysis tools for RNA-seq read counts,” Genome Biol, vol. 15, no. 2, p. R29, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [17].Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, and Smyth GK, “limma powers differential expression analyses for RNA-sequencing and microarray studies,” Nucleic Acids Research, vol. 43, no. 7, p. e47, January 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [18].Love MI, Huber W, and Anders S, “Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2,” Genome Biology, vol. 15, no. 12, p. 550, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [19].Jiang H and Salzman J, “A penalized likelihood approach for robust estimation of isoform expression,” Statistics and Its Interface, vol. 8, no. 4, pp. 437–445, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [20].Zhou X, Lindsay H, and Robinson MD, “Robustly detecting differential expression in RNA sequencing data using observation weights,” Nucleic acids research, vol. 42, no. 11, pp. e91–e91, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [21].Boyd S, Parikh N, Chu E, Peleato B, and Eckstein J, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011. [Google Scholar]
- [22].Jiang H and Zhan T, “Unit-free and robust detection of differential expression from RNA-Seq data,” Statistics in Biosciences, vol. 9, no. 1, pp. 178–199, 2017. [Google Scholar]
- [23].Tibshirani R, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 267–288, 1996. [Google Scholar]
- [24].Ji H and Wong WH, “TileMap: create chromosomal map of tiling array hybridizations,” Bioinformatics, vol. 21, no. 18, pp. 3629–3636, 2005. [DOI] [PubMed] [Google Scholar]
- [25].Ji H and Liu XS, “Analyzing ‘omics data using hierarchical models,” Nature biotechnology, vol. 28, no. 4, pp. 337–340, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [26].Smyth GK, “Linear models and empirical Bayes methods for assessing differential expression in microarray experiments,” Statistical Applications in Genetics and Molecular Biology, vol. 3, no. 1, 2004. [DOI] [PubMed] [Google Scholar]
- [27].Friedman J, Hastie T, and Tibshirani R, “Regularization paths for generalized linear models via coordinate descent,” Journal of statistical software, vol. 33, no. 1, p. 1, 2010. [PMC free article] [PubMed] [Google Scholar]
- [28].Yuan M and Lin Y, “Model selection and estimation in regression with grouped variables,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 1, pp. 49–67, 2006. [Google Scholar]
- [29].Robinson MD, McCarthy DJ, and Smyth GK, “edgeR: a bioconductor package for differential expression analysis of digital gene expression data.” Bioinformatics, vol. 26, pp. 139–140, January 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [30].The Cancer Genome Atlas Research Network, “The molecular taxonomy of primary prostate cancer,” Cell, vol. 163, pp. 1011–1025, November 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [31].Wang K, Singh D, Zeng Z, Coleman SJ, Huang Y, Savich GL, He X, Mieczkowski P, Grimm SA, Perou CM, MacLeod JN, Chiang DY, Prins JF, and Liu J, “Mapsplice: accurate mapping of RNA-seq reads for splice junction discovery,” Nucleic acids research, vol. 38, p. e178, October 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [32].Kim H, Lapointe J, Kaygusuz G, Ong DE, Li C, van de Rijn M,Brooks JD, and Pollack JR, “The retinoic acid synthesis gene ALDH1a2 is a candidate tumor suppressor in prostate cancer,” Cancer research, vol. 65, no. 18, pp. 8118–8124, 2005. [DOI] [PubMed] [Google Scholar]
- [33].Busch C, Hanssen TA, Wagener C, and Öbrink B, “Down regulation of CEACAM1 in human prostate cancer: correlation with loss of cell polarity, increased proliferation rate, and gleason grade 3 to 4 transition,” Human pathology, vol. 33, no. 3, pp. 290–298, 2002. [DOI] [PubMed] [Google Scholar]
- [34].Alinezhad S, Väänänen R-M, Mattsson J, Li Y, Tallgrén T, Ochoa NT, Bjartell A, Åkerfelt M, Taimen P, Boström PJ et al. , “Validation of novel biomarkers for prostate cancer progression by the combination of bioinformatics, clinical and functional studies,” PloS one, vol. 11, no. 5, p. e0155901, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]