Stat Med. 2025 Jul 15;44(15-17):e70163. doi: 10.1002/sim.70163

Transfer Learning for Error‐Contaminated Poisson Regression Models

Jou‐Chin Wu, Li‐Pang Chen
PMCID: PMC12333914  PMID: 40662525

ABSTRACT

The Poisson regression model has been a popular approach to characterizing the relationship between a count response and covariates. With the rapid development of data collection, additional source information can be easily recorded. To efficiently use the source data to improve estimation under the original data, transfer learning is a natural strategy. However, challenging issues in the given datasets include measurement error and high dimensionality of the variables, which are not well explored in the context of transfer learning. In this paper, we propose a novel strategy to handle error‐prone count responses and to estimate the parameters in the measurement error models using the source data, and we then employ the transfer learning method to derive the corrected estimator. Moreover, to improve prediction and avoid model uncertainty, we further establish a model averaging strategy. Simulation studies and an analysis of breast cancer data verify the satisfactory performance of the proposed method and the validity of the measurement error correction.

Keywords: error‐prone count variables, model averaging, prediction, variable selection

1. Introduction

In biomedical studies, one research interest is to use gene expression or mRNA variables to characterize specific measures for a specific cancer. The motivating example is the breast cancer data collected by the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC). This dataset contains 12,132 mRNA variables and nine genes recorded by copy number alterations (CNA), which reflect the level of risk caused by genes for breast cancer. Treating the number of nonzero CNA values among the nine genes as a count response that characterizes the risk of breast cancer, the research goal is to model the relationship between the count response and the mRNAs by the Poisson regression model. Moreover, identifying mRNA variables that are informative for the count response may improve the accuracy of prediction.

In addition to the dataset collected by METABRIC, referred to as the target data and of primary interest, the rapid development of data collection allows researchers to easily gather datasets containing the same mRNA and CNA variables from various sources. We call these additional datasets the source data; more details can be found in Section 7. Intuitively, one may wish to incorporate the source data with the target data to build the Poisson regression; however, the resulting estimator may behave contrary to expectation, because some of the source datasets have structures different from the target data. Hence, to effectively adopt source data that are informative for the target data and to estimate the parameters of the Poisson regression model, transfer learning has become a useful approach in recent years; its idea is to identify informative source data and train them together with the target data to build the regression model. In the literature, several transfer learning methods have been proposed for high‐dimensional data. For example, Bastani [1] proposed a two‐step transfer learning algorithm for high‐dimensional generalized linear models (GLM) with single‐source data. Chen et al. [2] proposed the data‐enriched model for linear and logistic regression with a single source of data incorporated. In the presence of multiple source datasets, Li et al. [3] proposed Trans‐Lasso, a data‐driven procedure for high‐dimensional linear regression. Tian and Feng [4] applied transfer learning to handle high‐dimensional GLM.

While transfer learning methods have been widely discussed, the developments rest on the implicit and often unreliable assumption that variables are precisely measured. In practice, measurement error arises ubiquitously and inevitably. For example, the CNA values in the motivating example are possibly subject to measurement error due to experimental noise (e.g., Zhang and Zhang [5]), systematic biases (e.g., Chiang et al. [6]), or sample heterogeneity (e.g., Grasso et al. [7]). Using error‐prone variables in the Poisson regression model may incur substantial bias in the estimator, as explored in the existing literature (e.g., Carroll et al. [8]). Hence, measurement error correction is crucial for deriving a reliable estimator of the Poisson regression model. A large body of correction methods has been developed in the literature. For example, in the presence of measurement error in covariates, Kukush et al. [9] considered normally distributed error in covariates under the Poisson regression. Sørensen et al. [10] focused on the challenge of selecting relevant covariates that are subject to measurement error in high‐dimensional data. Yang et al. [11] proposed a quasi‐likelihood estimation method to deal with classical and/or Berkson measurement errors in covariates. Jiang et al. [12] considered Poisson regression models with noisy high‐dimensional covariates and proposed a penalized target function to correct estimation bias and conduct hypothesis testing. Chen [13] and Chen and Qiu [14] developed boosting algorithms to deal with measurement error and variable selection. In contrast, when the count responses are subject to measurement error, Zhang and Yi [15] introduced a Bayesian method to address measurement error and estimate the parameters of zero‐inflated Poisson regression models. However, existing methods were developed for a single target dataset and may not be valid when multiple source datasets are available. Moreover, transfer learning methods under measurement error have not yet been explored.

In the presence of high‐dimensional data, such as the mRNAs in the motivating example, the best combination of variables can be identified, and one may adopt the selected variables for further analysis, such as model fitting or prediction. However, the main concern with this approach is model uncertainty, since the true model is unknown, which may make prediction imprecise. To tackle this issue, model averaging is a widely used strategy, whose key idea is to integrate candidate models through linear combinations with suitable weights. In the literature, model averaging procedures have been widely discussed when variables are either precisely measured or subject to measurement error. To name a few, Zou et al. [16] proposed a model averaging method with optimal weights for Poisson regression. Chen and Yi [17] proposed a method to deal with truncated or censored data using the focused information criterion (FIC) when data are contaminated by error. Zhang et al. [18] proposed a data‐driven optimal weight for linear measurement error models. However, existing model averaging methods are restricted to a single given dataset, and model averaging for the transfer learning procedure has not been fully explored.

Motivated by the challenges in the breast cancer data, to derive a reliable estimator and precise prediction with the source data and measurement error correction taken into account, we propose a transfer learning method for error‐in‐response Poisson regression. Specifically, we first characterize the error‐prone count response and propose the insertion method to derive a modified log‐likelihood function. Next, we consider an iterative two‐stage transfer learning algorithm with informative source data selection, and perform variable selection and estimation. Moreover, we propose a valid method to estimate the unknown parameters in the noise terms of the measurement error model using the informative source data, and we extend the model averaging method to the transfer learning framework. Numerical studies show that the proposed method performs satisfactorily, with precise estimation and prediction.

The remainder of the paper is organized as follows. In Section 2, we introduce the data structure and the Poisson regression model, as well as measurement error models for count responses. In Section 3, we develop the corrected estimating function based on the Poisson regression model and then employ the transfer learning method to perform variable selection and estimation. In Section 4, we develop valid estimation methods for the unknown parameters in the measurement error models. In Section 5, we introduce the model averaging algorithm to improve prediction and address model uncertainty. In Section 6, we conduct simulation studies to assess the performance of the proposed method. We apply the proposed method to analyze the breast cancer data in Section 7. A general discussion is presented in Section 8.

2. Notation and Models

2.1. Data Structure and Regression Models

Let $X$ denote a $p$-dimensional random vector of covariates and let $Y$ be a count-valued random variable of primary interest. To characterize $Y$ and $X$, we follow the framework of GLM and link $Y$ and $X$ through a Poisson regression

$$P(Y \mid X) = \frac{\exp\{-\exp(X^\top\beta)\}\{\exp(X^\top\beta)\}^{Y}}{Y!} \tag{1}$$

for all $Y = 0, 1, 2, \ldots$, where $\beta$ is a $p$-dimensional vector of parameters of interest that is possibly sparse, in the sense that most components are zero and only a few are nonzero, reflecting that only a few components of $X$ are informative for $Y$.

To estimate $\beta$, we consider a dataset with sample size $n_0$, denoted $\mathcal{D}_0 \equiv \{(X_i^{(0)}, Y_i^{(0)}): i = 1, \ldots, n_0\}$, with $(X_i^{(0)}, Y_i^{(0)})$ independent and identically distributed as $(X, Y)$. In particular, we call $\mathcal{D}_0$ the target data. Let $\mathbb{Y}^{(0)} = (Y_1^{(0)}, Y_2^{(0)}, \ldots, Y_{n_0}^{(0)})^\top$ denote the column vector of the target responses with size $n_0$, and let $\mathbb{X}^{(0)} = (X_1^{(0)}, X_2^{(0)}, \ldots, X_{n_0}^{(0)})^\top$. In the framework of GLM, we usually estimate $\beta$ by optimizing the following log-likelihood function under $\mathcal{D}_0$ and (1):

$$\ell(\beta; \mathcal{D}_0) \propto \sum_{i=1}^{n_0}\big\{Y_i^{(0)} X_i^{(0)\top}\beta - \exp(X_i^{(0)\top}\beta)\big\}, \tag{2}$$

where "$\propto$" reflects the omission of the term $-\log(Y_i^{(0)}!)$, which is free of $\beta$. Throughout, the first argument of the likelihood function $\ell(\cdot\,;\cdot)$ indicates the parameter and the second argument indicates the dataset used to construct the likelihood. On the other hand, during data collection or the sampling scheme, we sometimes record auxiliary information sharing the same variables as the target data $\mathcal{D}_0$. We call such auxiliary information source data. Specifically, suppose that there are $K$ source datasets, and let $\mathcal{D}_k = \{(X_i^{(k)}, Y_i^{(k)}): i = 1, \ldots, n_k\}$ denote the $k$th source dataset with size $n_k$ for $k = 1, \ldots, K$. Let $\mathbb{Y}^{(k)} = (Y_1^{(k)}, Y_2^{(k)}, \ldots, Y_{n_k}^{(k)})^\top$ denote the corresponding column vector of responses and let $\mathbb{X}^{(k)} = (X_1^{(k)}, X_2^{(k)}, \ldots, X_{n_k}^{(k)})^\top$ for $k = 1, \ldots, K$.

While the existence of the source data $\mathcal{D}_k$ provides more information to improve the estimation of $\beta$, a crucial issue is that some of the source data are not identical in structure to the target data. Specifically, note that $\beta$ is the parameter of the Poisson regression for the target data, and let $w^{(k)}$ denote the parameter that characterizes $\mathbb{Y}^{(k)}$ and $\mathbb{X}^{(k)}$ through the Poisson regression model (1) for the $k$th source dataset $\mathcal{D}_k$, $k = 1, \ldots, K$. Moreover, define $\delta^{(k)} \equiv \beta - w^{(k)}$ as the $k$th contrast between $\beta$ and $w^{(k)}$. Following the discussion in Li et al. [3], small values of $\delta^{(k)}$ in the $L_0$- or $L_1$-norm, say $\|\delta^{(k)}\|_q$ for $q \in \{0, 1\}$, indicate similarity between the target data and the $k$th source data. We call the collection of source datasets with small $\|\delta^{(k)}\|_q$, which are "similar" to the target data, the informative source data associated with the target data, and we wish to integrate the informative source data with the target data to improve the estimation of $\beta$; the improvement of the estimators will be justified by the biases in the $L_2$-norm and by prediction criteria in the simulation studies. Finally, the detection of informative source data is deferred to Section 3.2.2.

2.2. Measurement Error Model

In real data applications, the responses $Y_i^{(0)}$ and $Y_i^{(k)}$ for $k = 1, \ldots, K$ in the target/source data are possibly subject to measurement error, and thus the responses $Y_i^{(k)}$ are often not observed. Instead, surrogate responses, denoted $Y_i^{*(k)}$, are observed.

To characterize the two count-valued random variables $Y_i^{(k)}$ and $Y_i^{*(k)}$, we follow a similar idea to Zhang and Yi [15] and consider the following measurement error model

$$Y_i^{*(k)} = Y_i^{(k)} + a Z_i^{(k)} - b W_i^{(k)} \quad \text{for } k = 0, 1, \ldots, K, \; i = 1, \ldots, n_k, \tag{3}$$

where $a, b \in \{0, 1\}$ are indicators reflecting the relationship between $Y_i^{*(k)}$ and $Y_i^{(k)}$, and $Z_i^{(k)}$ and $W_i^{(k)}$ are two random variables that are independent of each other. To ensure that $Y_i^{*(k)}$ takes nonnegative integer values, $Z_i^{(k)}$ and $W_i^{(k)}$ are required to follow distributions with count-valued outcomes, such as Poisson or binomial distributions. When $(a, b) = (1, 0)$, (3) yields an increased count, so that the surrogate response is greater than or equal to the unobserved one. In this case $Z_i^{(k)}$ is independent of $Y_i^{(k)}$, and it is reasonable to assume that $E(Z_i^{(k)}) = \lambda > 0$. One possible specification is the Poisson distribution $Z_i^{(k)} \sim \text{Pois}(\lambda)$, keeping in mind that our development is not restricted to the Poisson distribution. On the other hand, when $(a, b) = (0, 1)$, (3) leads to a decreased count relative to the unobserved response $Y_i^{(k)}$. To ensure that $Y_i^{*(k)}$ is nonnegative, we require that $W_i^{(k)}$ not exceed $Y_i^{(k)}$. Therefore, the binomial distribution $W_i^{(k)} \sim \text{Binomial}(Y_i^{(k)}, \pi)$, with $E(W_i^{(k)} \mid Y_i^{(k)}) = \pi Y_i^{(k)}$ and $\pi \in (0, 1)$ a probability of success, is one recommended choice. Finally, $(a, b) = (1, 1)$ incorporates both "add-in" and "leave-out" errors, allowing a mixture of addition and subtraction of counts induced by measurement error.

We comment on the two indicators $a$ and $b$. In applications, the choice of $a$ and $b$ can be suitably specified by background information or researchers' experience. For example, if one believes that the observed responses are always greater (or smaller) than the unobserved ones, then $a = 1$ and $b = 0$ (or $a = 0$ and $b = 1$) can be adopted. Otherwise, one can consider the general situation $a = 1$ and $b = 1$, which is demonstrated in the real data analysis in Section 7. In the following development, we discuss these three cases separately.

The other notable feature of the measurement error model (3) is that we implicitly assume that $Z_i^{(k)}$ shares the homogeneous expectation $\lambda$ and $W_i^{(k)}$ shares the common component $\pi$ in its expectation across the target ($k = 0$) and source ($k = 1, \ldots, K$) data. Under this setting, we can integrate the target and source data to estimate the two common parameters $\lambda$ and $\pi$; a detailed discussion is deferred to Section 4. In addition, the common parameters $\lambda$ and $\pi$ enable us to detect informative source data, with details provided in Section 3.2.2. In applications, a more general situation is that the two parameters depend on the target and source data, that is, $Z_i^{(k)} \sim \text{Pois}(\lambda^{(k)})$ and $W_i^{(k)} \sim \text{Binomial}(Y_i^{(k)}, \pi^{(k)})$ for $k = 0, 1, \ldots, K$ with $\lambda^{(k)} \neq \lambda^{(k')}$ and $\pi^{(k)} \neq \pi^{(k')}$ for some $k \neq k'$. Under this general setting, however, these parameters cannot be estimated by integrating the target and source data; one would require auxiliary information, such as external or internal validation data, for the target and source data.

Now and hereafter, we denote by $\mathcal{D}_k^*$ the observed target/source data, defined as $\mathcal{D}_k$ with $Y_i^{(k)}$ replaced by $Y_i^{*(k)}$ for $k = 0, 1, \ldots, K$.
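To make the error mechanism concrete, the following is a minimal R sketch (our own illustration, not code from the paper) that generates surrogate counts $Y_i^*$ from model (3) for a chosen pair $(a, b)$, with $Z_i \sim \text{Pois}(\lambda)$ and $W_i \sim \text{Binomial}(Y_i, \pi)$ as the recommended specifications:

```r
## Generate surrogate counts Y* = Y + a*Z - b*W per model (3).
## All names here are ours; lambda and pi follow Section 2.2.
simulate_surrogate <- function(y, a, b, lambda, pi) {
  n <- length(y)
  z <- if (a == 1) rpois(n, lambda) else rep(0, n)                 # add-in error Z
  w <- if (b == 1) rbinom(n, size = y, prob = pi) else rep(0, n)   # leave-out error W
  y + z - w                                                        # surrogate Y*
}

set.seed(1)
y <- rpois(50, lambda = exp(0.5))   # unobserved true counts
y_star <- simulate_surrogate(y, a = 1, b = 1, lambda = 2, pi = 0.3)
```

Since the binomial draw guarantees $W_i \le Y_i$, the surrogate remains a nonnegative integer, matching the restriction discussed above.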

3. Methodology

3.1. Correction of Measurement Error

In the presence of measurement error, naively using the error-prone variables may lead to biased estimators and hence an incorrectly estimated regression model in (2). Therefore, it is crucial to address measurement error.

Specifically, by M-estimation theory, it is known that $E\{\partial\ell(\beta; \mathcal{D}_0)/\partial\beta\} = 0_p$ admits the solution $\beta_0$, the true value of the parameter, where $0_p$ is a $p$-dimensional zero vector. However, this equality no longer holds if $\mathbb{Y}^{(0)}$ is replaced by $\mathbb{Y}^{*(0)} \equiv (Y_1^{*(0)}, \ldots, Y_{n_0}^{*(0)})^\top$. This motivates us to derive a new log-likelihood function $\ell^*(\beta; \mathcal{D}_0^*)$ based on $\mathbb{Y}^{*(0)}$, such that

$$E\big\{\ell^*(\beta; \mathcal{D}_0^*) \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big\} = \ell(\beta; \mathcal{D}_0) \tag{4}$$

holds. Taking the derivative of both sides of (4) with respect to $\beta$ gives

$$E\bigg\{\frac{\partial\ell^*(\beta; \mathcal{D}_0^*)}{\partial\beta} \,\bigg|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\bigg\} = \frac{\partial\ell(\beta; \mathcal{D}_0)}{\partial\beta},$$

provided that the expectation and the differentiation are interchangeable. It follows that $\beta_0$ satisfies $E\{\partial\ell^*(\beta_0; \mathcal{D}_0^*)/\partial\beta\} = E\{\partial\ell(\beta_0; \mathcal{D}_0)/\partial\beta\} = 0_p$, or equivalently,

$$\beta_0 = \operatorname*{arg\,max}_\beta E\{\ell^*(\beta; \mathcal{D}_0^*)\} = \operatorname*{arg\,max}_\beta E\{\ell(\beta; \mathcal{D}_0)\}. \tag{5}$$

Then the solution to $\partial\ell^*(\beta; \mathcal{D}_0^*)/\partial\beta = 0_p$ gives the maximum likelihood estimator (MLE) of $\beta$, denoted $\widetilde{\beta}$. By the theory of maximum likelihood estimation, $\widetilde{\beta}$ is a consistent estimator of $\beta_0$, suggesting that $\ell^*(\beta; \mathcal{D}_0^*)$ is the "corrected" log-likelihood function and provides the corrected estimator. This approach follows the idea of the insertion method in measurement error analysis (e.g., Chen and Yi [17]; Nakamura [19]; Carroll et al. [8], Chapter 7), which aims to find a "corrected" estimating function whose conditional expectation recovers the function under the unobserved random variable. Unlike existing methods that focus on correcting error-prone covariates, the current development corrects the error-prone count response.

We start our derivation from $\ell(\beta; \mathcal{D}_0^*)$, the log-likelihood (2) evaluated with the surrogate responses. Taking the conditional expectation gives

$$E\big\{\ell(\beta; \mathcal{D}_0^*) \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big\} \propto \sum_{i=1}^{n_0}\Big\{E\big(Y_i^{*(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) X_i^{(0)\top}\beta - \exp(X_i^{(0)\top}\beta)\Big\}, \tag{6}$$

where "$\propto$" reflects the omission of $-\log(Y_i^{*(0)}!)$, which is free of $\beta$. We observe that (6) contains the conditional expectation $E(Y_i^{*(0)} \mid \mathbb{Y}^{(0)}, \mathbb{X}^{(0)})$, which should be derived from (3). Suppose that this conditional expectation can be written as

$$E\big(Y_i^{*(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) = C_1 Y_i^{(0)} + C_2 \tag{7}$$

for some constants $C_1 > 0$ and $C_2$ that are related to $\lambda$ and/or $\pi$, since they are produced by the conditional expectation under the measurement error model (3). Then combining (6) and (7) gives

$$E\bigg[\frac{1}{C_1}\Big\{\ell(\beta; \mathcal{D}_0^*) - \sum_{i=1}^{n_0} C_2 X_i^{(0)\top}\beta - \sum_{i=1}^{n_0}(C_1 - 1)\exp(X_i^{(0)\top}\beta)\Big\} \,\bigg|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\bigg] = \ell(\beta; \mathcal{D}_0),$$

showing that

$$\ell^*(\beta; \mathcal{D}_0^*) \equiv \frac{1}{C_1}\Big\{\ell(\beta; \mathcal{D}_0^*) - \sum_{i=1}^{n_0} C_2 X_i^{(0)\top}\beta - \sum_{i=1}^{n_0}(C_1 - 1)\exp(X_i^{(0)\top}\beta)\Big\}. \tag{8}$$

It follows that (8) is the corrected likelihood function and satisfies (4) and (5). We also note that the values $C_1$ and $C_2$ depend implicitly on the choices of $a$ and $b$, discussed below. Moreover, regardless of the values of $a$ and $b$, the constant $C_1$ is always nonzero, so the corrected likelihood function (8) is well defined, and the conditional expectation of the error-prone response recovers the unobserved one as in (7), a phenomenon similar to the classical measurement error model (e.g., Carroll et al. [8]).

In what follows, we separately examine the three cases $(a, b) = (0, 1), (1, 0)$, and $(1, 1)$, and derive $E(Y_i^{*(0)} \mid \mathbb{Y}^{(0)}, \mathbb{X}^{(0)})$ and the function $\ell^*$ accordingly.

Case 1: leave‐out error only (a=0 and b=1)

In this case, the surrogate response $Y_i^{*(0)}$ in (3) reduces to

$$Y_i^{*(0)} = Y_i^{(0)} - W_i^{(0)}.$$

The conditional expectation of the observed response $Y_i^{*(0)}$ given the unobserved response and the covariates is

$$E\big(Y_i^{*(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) = E\big(Y_i^{(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) - E\big(W_i^{(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) = Y_i^{(0)} - \pi Y_i^{(0)} = (1 - \pi)Y_i^{(0)},$$

where $\pi$ represents the probability defined in Section 2.2. By (7), we have $C_1 = 1 - \pi$ and $C_2 = 0$. It follows that the corrected log-likelihood function (8) is

$$\ell^*(\beta; \mathcal{D}_0^*) = \frac{1}{1 - \pi}\Big\{\ell(\beta; \mathcal{D}_0^*) + \sum_{i=1}^{n_0}\pi\exp(X_i^{(0)\top}\beta)\Big\}. \tag{9}$$

Case 2: add‐in error only (a=1 and b=0)

For this case, model (3) reduces to

$$Y_i^{*(0)} = Y_i^{(0)} + Z_i^{(0)}.$$

Then the conditional expectation of $Y_i^{*(0)}$ given the unobserved response and the covariates is

$$E\big(Y_i^{*(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) = E\big(Y_i^{(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) + E\big(Z_i^{(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) = Y_i^{(0)} + \lambda,$$

yielding $C_1 = 1$ and $C_2 = \lambda$ in (7), with $\lambda$ representing the parameter in Section 2.2. Therefore, the corrected log-likelihood function in Case 2 is

$$\ell^*(\beta; \mathcal{D}_0^*) = \ell(\beta; \mathcal{D}_0^*) - \sum_{i=1}^{n_0}\lambda X_i^{(0)\top}\beta. \tag{10}$$

Case 3: both add‐in and leave‐out errors (a=1 and b=1)

Based on (3), the conditional expectation of $Y_i^{*(0)}$ is

$$E\big(Y_i^{*(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) = E\big(Y_i^{(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) + E\big(Z_i^{(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) - E\big(W_i^{(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) = (1 - \pi)Y_i^{(0)} + \lambda, \tag{11}$$

where $C_1 = 1 - \pi$ and $C_2 = \lambda$ due to (7). It follows that (8) becomes

$$\ell^*(\beta; \mathcal{D}_0^*) = \frac{1}{1 - \pi}\Big\{\ell(\beta; \mathcal{D}_0^*) - \sum_{i=1}^{n_0}\lambda X_i^{(0)\top}\beta + \sum_{i=1}^{n_0}\pi\exp(X_i^{(0)\top}\beta)\Big\}. \tag{12}$$

In the following development, we temporarily assume that the parameters $\pi$ and $\lambda$ are known and use (8) to derive the estimator of $\beta$.
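For concreteness, here is a short R sketch (our own, with assumed function and argument names) of the corrected log-likelihood (8); the constants follow Cases 1–3: $(a,b) = (0,1)$ gives $C_1 = 1 - \pi$, $C_2 = 0$; $(a,b) = (1,0)$ gives $C_1 = 1$, $C_2 = \lambda$; and $(a,b) = (1,1)$ gives $C_1 = 1 - \pi$, $C_2 = \lambda$:

```r
## Corrected log-likelihood (8), up to the additive constant dropped in (2).
## X: n x p design matrix; y_star: surrogate counts; C1, C2: per Cases 1-3.
corrected_loglik <- function(beta, X, y_star, C1, C2) {
  eta   <- as.vector(X %*% beta)
  naive <- sum(y_star * eta - exp(eta))   # l(beta; D0*) with surrogate responses
  (naive - sum(C2 * eta) - sum((C1 - 1) * exp(eta))) / C1
}
```

One can verify the insertion identity numerically: averaging `corrected_loglik` over repeated draws of $Y^*$ given fixed $(Y, X)$ approximates $\ell(\beta; \mathcal{D}_0)$, as required by (4).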

3.2. Transfer Learning With Measurement Error in Responses

3.2.1. Estimation via Transfer Learning

To estimate $\beta$ by (8) with the source data $\mathcal{D}_k^*$ incorporated, we adopt the transfer learning method. We first demonstrate the transfer learning procedure using all source data. Suppose that the variables in $\mathcal{D}_k^*$ share the same structure (1) for all $k = 1, \ldots, K$. We first aggregate the target data and all source data, denoted $\bigcup_{k=0}^{K}\mathcal{D}_k^*$. These data are used to obtain a preliminary overall estimator and to perform variable selection through the following optimization:

$$\widehat{w} = \operatorname*{arg\,min}_{w}\Bigg[-\frac{1}{\sum_{k=0}^{K} n_k}\sum_{k=0}^{K}\frac{1}{C_1}\Big\{\ell(w; \mathcal{D}_k^*) - \sum_{i=1}^{n_k} C_2 X_i^{(k)\top} w - \sum_{i=1}^{n_k}(C_1 - 1)\exp(X_i^{(k)\top} w)\Big\} + \phi_w\|w\|_1\Bigg], \tag{13}$$

where $w$ is the vector of parameters that links $\mathbb{Y}^{*(k)} \equiv (Y_1^{*(k)}, Y_2^{*(k)}, \ldots, Y_{n_k}^{*(k)})^\top$ and $\mathbb{X}^{(k)}$ in the source data $\mathcal{D}_k^*$ for all $k$, $\phi_w$ is a tuning parameter, and $C_1$ and $C_2$ are defined in (7) according to Cases 1–3.

With $\widehat{w}$ obtained from (13), let $\delta \equiv \beta - \widehat{w}$ denote the contrast between $\beta$ and $\widehat{w}$. Following the framework of transfer learning (e.g., Li et al. [3]; Tian and Feng [4]), the next step is to construct the corrected likelihood function for the target data with $\beta$ replaced by $\widehat{w} + \delta$. In this way, it suffices to estimate $\delta$, and we then use the resulting estimator of $\delta$ to derive the debiased estimator of $\beta$. Specifically, we solve the following penalized likelihood problem under the target data $\mathcal{D}_0^*$:

$$\widehat{\delta} = \operatorname*{arg\,min}_{\delta}\Bigg[-\frac{1}{n_0 C_1}\Big\{\ell(\widehat{w} + \delta; \mathcal{D}_0^*) - \sum_{i=1}^{n_0} C_2 X_i^{(0)\top}(\widehat{w} + \delta) - \sum_{i=1}^{n_0}(C_1 - 1)\exp\big(X_i^{(0)\top}(\widehat{w} + \delta)\big)\Big\} + \phi_\delta\|\widehat{w} + \delta\|_1\Bigg], \tag{14}$$

where $\phi_\delta$ is a tuning parameter. Consequently, the estimator of $\beta$ is given by

$$\widehat{\beta} = \widehat{w} + \widehat{\delta}. \tag{15}$$
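To fix ideas, here is a hedged R sketch of the two-step pipeline (13)–(15) using glmnet's Poisson lasso as a stand-in: glmnet does not implement the measurement-error-corrected objective, and its second step penalizes the contrast via an offset (the common transfer-learning variant, e.g., Tian and Feng [4]) rather than $\|\widehat{w} + \delta\|_1$ exactly as in (14), so this only illustrates the structure of the algorithm:

```r
## Two-step transfer learning sketch; intercepts are dropped for brevity.
library(glmnet)

two_step_tl <- function(X_pool, y_pool, X0, y0) {
  # Step 1: pooled penalized fit over target + source data -> w_hat, cf. (13).
  step1 <- cv.glmnet(X_pool, y_pool, family = "poisson")
  w_hat <- as.vector(coef(step1, s = "lambda.min"))[-1]

  # Step 2: refit on the target data with the pooled fit as an offset,
  # so the penalized coefficients play the role of the contrast delta, cf. (14).
  step2 <- cv.glmnet(X0, y0, family = "poisson", offset = X0 %*% w_hat)
  d_hat <- as.vector(coef(step2, s = "lambda.min"))[-1]

  w_hat + d_hat   # beta_hat = w_hat + delta_hat, cf. (15)
}
```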

3.2.2. Detection of Source Data

As noted in Section 2.1, there are $K$ different source datasets, and not all of them have a structure similar to the target data. We need to select the source data that are informative for the target data and use them to improve the estimator of $\beta$ through the estimation procedure in Section 3.2.1.

Specifically, we first randomly divide the target data $\mathcal{D}_0^*$ into $R$ subsamples of approximately equal size, and denote by $\mathcal{D}_0^{*[r]}$ the $r$th subsample for $r = 1, \ldots, R$. Let $\mathcal{D}_0^{*[-r]} \equiv \bigcup_{j \neq r}\mathcal{D}_0^{*[j]}$ denote the subdata of $\mathcal{D}_0^*$ with the $r$th subsample removed. Given the subdata $\mathcal{D}_0^{*[-r]}$, we solve the following optimization for the parameter $\beta$:

$$\widehat{\beta}^{(0)[r]} = \operatorname*{arg\,min}_{\beta}\Bigg[\frac{1}{n(\mathcal{D}_0^{*[-r]})}\sum_{i \in \mathcal{D}_0^{*[-r]}}\big\{-Y_i^* X_i^\top\beta + \exp(X_i^\top\beta)\big\} + \phi^{(0)[r]}\|\beta\|_1\Bigg],$$

where $n(\mathcal{A})$ denotes the number of elements in a set $\mathcal{A}$ and $\phi^{(0)[r]}$ is a tuning parameter. In addition, for $k = 1, \ldots, K$, we use the union of the target subdata and the $k$th source data, denoted $\mathcal{D}_k^{*[r]} = \mathcal{D}_k^* \cup \mathcal{D}_0^{*[-r]}$, to run the two-stage transfer learning of Section 3.2.1 and obtain the estimator $\widehat{\beta}^{(k)[r]}$.

Next, we apply the held-out subsample $\mathcal{D}_0^{*[r]}$ to define the following validated likelihood function:

$$\widehat{L}_0^{[r]}(\beta) = \frac{1}{n(\mathcal{D}_0^{*[r]})}\sum_{i \in \mathcal{D}_0^{*[r]}}\Big[-Y_i^{*(0)[r]} X_i^{(0)[r]\top}\beta + \exp(X_i^{(0)[r]\top}\beta) + \log(Y_i^{*(0)[r]}!)\Big]. \tag{16}$$

Based on the estimators $\widehat{\beta}^{(0)[r]}$ and $\widehat{\beta}^{(k)[r]}$ for $r = 1, \ldots, R$ and $k = 1, \ldots, K$, we define $\widehat{L}_0(0) \equiv \frac{1}{R}\sum_{r=1}^{R}\widehat{L}_0^{[r]}(\widehat{\beta}^{(0)[r]})$ and $\widehat{L}_0(k) \equiv \frac{1}{R}\sum_{r=1}^{R}\widehat{L}_0^{[r]}(\widehat{\beta}^{(k)[r]})$ for $k = 1, \ldots, K$. Finally, the set containing the informative source data is given by

$$\mathcal{A}_I = \Big\{k \neq 0 : \big|\widehat{L}_0(k) - \widehat{L}_0(0)\big| \le C_0\,\sigma_L^2\Big\}, \tag{17}$$

where $C_0$ is a pre-specified positive constant and $\sigma_L^2 \equiv \frac{1}{R-1}\sum_{r=1}^{R}\big\{\widehat{L}_0^{[r]}(\widehat{\beta}^{(0)[r]}) - \widehat{L}_0(0)\big\}^2$, which follows a similar formulation in Tian and Feng [4].
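A hedged R sketch of the selection rule (17) (names are ours; the fold losses are assumed to have been computed from (16) with the fitted $\widehat{\beta}^{(0)[r]}$ and $\widehat{\beta}^{(k)[r]}$):

```r
## L0_fold: length-R vector of fold losses L0_hat^[r](beta_hat^(0)[r]).
## Lk_bar:  length-K vector of averaged losses L0_hat(k), k = 1, ..., K.
detect_informative <- function(L0_fold, Lk_bar, C0 = 2) {
  L0_bar  <- mean(L0_fold)
  sigma2L <- var(L0_fold)             # (1/(R-1)) sum_r {L0^[r] - L0_bar}^2
  which(abs(Lk_bar - L0_bar) <= C0 * sigma2L)   # indices forming A_I
}
```

The default `C0 = 2` is only a placeholder; the paper treats $C_0$ as pre-specified.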

We note that $\widehat{L}_0(k)$ and $\widehat{L}_0(0)$ are derived from (2) with the error-prone responses $Y_i^{*(k)}$, instead of using the corrected likelihood function (9), (10), or (12), because the corrected function involves the parameters $\lambda$ and $\pi$, which are usually unknown. In addition, the goal here is to determine the informative source data rather than to estimate the parameter $\beta$. To see why this approach works for source data detection, observe from (11) that

$$E(Y_i^{*(0)} \mid \mathbb{X}^{(0)}) = (1 - \pi)E(Y_i^{(0)} \mid \mathbb{X}^{(0)}) + \lambda.$$

It implies that the expectation of $\widehat{L}_0^{[r]}(\beta)$ in (16) is given by

$$\begin{aligned} E\big\{\widehat{L}_0^{[r]}(\beta)\big\} &= E\Big[E\big\{\widehat{L}_0^{[r]}(\beta) \,\big|\, \mathbb{X}^{(0)}\big\}\Big] \\ &= (1 - \pi)E\big\{L_0^{[r]}(\beta)\big\} - \lambda E\big(X^{(0)[r]\top}\beta\big) + \pi E\Big\{\frac{1}{n(\mathcal{D}_0^{[r]})}\sum_{i \in \mathcal{D}_0^{[r]}}\exp(X_i^{(0)[r]\top}\beta)\Big\} \\ &\quad + E\Big[\frac{1}{n(\mathcal{D}_0^{[r]})}\sum_{i \in \mathcal{D}_0^{[r]}}\big\{\log(Y_i^{*(0)[r]}!) - \log(Y_i^{(0)[r]}!)\big\}\Big], \end{aligned} \tag{18}$$

where $L_0^{[r]}(\beta)$ is defined as (16) with $\mathcal{D}_0^*$ and $Y_i^{*(0)}$ replaced by $\mathcal{D}_0$ and $Y_i^{(0)}$. Equation (18) shows that, at the population level, (16) can be expressed through $L_0^{[r]}(\beta)$ plus additional terms. Under the empirical estimate of (18) with $\beta$ replaced by $\widehat{\beta}^{(k)[r]}$ and $\widehat{\beta}^{(0)[r]}$, the difference $\widehat{L}_0(k) - \widehat{L}_0(0)$ can be expressed as

$$\begin{aligned} \widehat{L}_0(k) - \widehat{L}_0(0) &= (1 - \pi)\big\{L_0(k) - L_0(0)\big\} - \lambda\,\frac{1}{R}\sum_{r=1}^{R} X^{(0)[r]\top}\big(\widehat{\beta}^{(k)[r]} - \widehat{\beta}^{(0)[r]}\big) \\ &\quad + \pi\,\frac{1}{R}\sum_{r=1}^{R}\frac{1}{n(\mathcal{D}_0^{[r]})}\sum_{i \in \mathcal{D}_0^{[r]}}\big\{\exp(X_i^{(0)[r]\top}\widehat{\beta}^{(k)[r]}) - \exp(X_i^{(0)[r]\top}\widehat{\beta}^{(0)[r]})\big\} \\ &\equiv (1 - \pi)\big\{L_0(k) - L_0(0)\big\} + B, \end{aligned}$$

where $B$ collects the second and third terms. By the triangle inequality, we have

$$\big|L_0(k) - L_0(0)\big| \le \frac{1}{1 - \pi}\big|\widehat{L}_0(k) - \widehat{L}_0(0)\big| + \frac{1}{1 - \pi}|B|. \tag{19}$$

Given the upper bound $C_0\sigma_L^2$ in (17), (19) suggests that $|L_0(k) - L_0(0)| \le C_0^*\sigma_L^2$ with $C_0^* = \frac{C_0}{1 - \pi} + \frac{|B|}{(1 - \pi)\sigma_L^2}$, which is similar to the source data selection criterion in Tian and Feng [4]. It implies that closeness between the observed target and source data $\mathcal{D}_0^*$ and $\mathcal{D}_k^*$ indicates closeness of the unobserved ones $\mathcal{D}_0$ and $\mathcal{D}_k$. The other notable remark is that this implementation is based on the observed data without the correction step in Section 3.1, because the parameters $\lambda$ and $\pi$ are unknown and the purpose here is to detect informative source data rather than to estimate the parameter $\beta$.

Based on the set (17), we aggregate the informative source datasets with the target dataset, denoted $\bigcup_{k \in \{0\} \cup \mathcal{A}_I}\mathcal{D}_k^*$, and we estimate $w$ by (13) under the target and informative source data, that is,

$$\widehat{w}_{\mathcal{A}_I} = \operatorname*{arg\,min}_{w}\Bigg[-\frac{1}{n_{\mathcal{A}_I} + n_0}\sum_{k \in \{0\} \cup \mathcal{A}_I}\frac{1}{C_1}\Big\{\ell(w; \mathcal{D}_k^*) - \sum_{i=1}^{n_k} C_2 X_i^{(k)\top} w - \sum_{i=1}^{n_k}(C_1 - 1)\exp(X_i^{(k)\top} w)\Big\} + \phi_{w2}\|w\|_1\Bigg], \tag{20}$$

where $\phi_{w2}$ is a tuning parameter and $n_{\mathcal{A}_I}$ is the total sample size of $\bigcup_{k \in \mathcal{A}_I}\mathcal{D}_k^*$.

Similarly, we estimate $\delta$ by (14) with $\widehat{w}$ replaced by $\widehat{w}_{\mathcal{A}_I}$, that is,

$$\widehat{\delta}_{\mathcal{A}_I} = \operatorname*{arg\,min}_{\delta}\Bigg[-\frac{1}{n_0 C_1}\Big\{\ell(\widehat{w}_{\mathcal{A}_I} + \delta; \mathcal{D}_0^*) - \sum_{i=1}^{n_0} C_2 X_i^{(0)\top}(\widehat{w}_{\mathcal{A}_I} + \delta) - \sum_{i=1}^{n_0}(C_1 - 1)\exp\big(X_i^{(0)\top}(\widehat{w}_{\mathcal{A}_I} + \delta)\big)\Big\} + \phi_{\delta 2}\|\widehat{w}_{\mathcal{A}_I} + \delta\|_1\Bigg], \tag{21}$$

where $\phi_{\delta 2}$ is a tuning parameter. Consequently, the resulting estimator of $\beta$ is given by $\widehat{w}_{\mathcal{A}_I} + \widehat{\delta}_{\mathcal{A}_I}$.

3.3. Computational Implementation via Local Quadratic Approximation

Noting that the derivation of $\widehat{\beta}$ requires the two penalized estimating functions (20) and (21), we discuss the computational implementation in this section. Unlike other approaches in the existing literature, such as the coordinate descent method (Friedman et al. [20]) or the proximal gradient descent method (Klosa et al. [21]), we primarily employ an iterative method based on the Newton-Raphson algorithm in conjunction with a local quadratic approximation (e.g., Lee et al. [22]), because it simplifies and accelerates the optimization through finitely many iterations of a recursive formula.

Let $\nabla\ell^*(\beta; \mathcal{D}_0^*) \equiv \frac{\partial}{\partial\beta}\ell^*(\beta; \mathcal{D}_0^*)$ and $\nabla^2\ell^*(\beta; \mathcal{D}_0^*) \equiv \frac{\partial^2}{\partial\beta\,\partial\beta^\top}\ell^*(\beta; \mathcal{D}_0^*)$. Here we assume that the matrix $\nabla^2\ell^*(\beta; \mathcal{D}_0^*)$ is nonsingular so that its inverse exists. By the Karush-Kuhn-Tucker (KKT) conditions and the Newton-Raphson algorithm, we obtain the following recursive form to solve (20):

$$w^{(t+1)} = w^{(t)} - \Big\{\nabla^2\ell^*\big(w^{(t)}; \mathcal{D}_0^* \cup \{\cup_{k \in \mathcal{A}_I}\mathcal{D}_k^*\}\big)\Big\}^{-1}\cdot\Big\{\nabla\ell^*\big(w^{(t)}; \mathcal{D}_0^* \cup \{\cup_{k \in \mathcal{A}_I}\mathcal{D}_k^*\}\big) - \phi_{w2}\cdot\mathrm{sign}(w^{(t)})\Big\} \tag{22}$$

for $t = 0, 1, 2, \ldots$, where $w^{(t)}$ is the value at the $t$th iteration, $\mathrm{sign}(w^{(t)})$ is the vector with components $\mathrm{sign}(w_i^{(t)})$, which lies in $\{-1, 1\}$ if $w_i^{(t)} \neq 0$ and is $0$ if $w_i^{(t)} = 0$, $w_i^{(t)}$ is the $i$th element of $w^{(t)}$, and $\ell^*(w; \mathcal{D}_0^* \cup \{\cup_{k \in \mathcal{A}_I}\mathcal{D}_k^*\})$ denotes the corrected log-likelihood in the objective function of (20). To implement (22) and obtain the estimator of $w$ with the optimal tuning parameter $\phi_{w2}$, we employ $R$-fold cross-validation. Specifically, we randomly split the data in $\mathcal{D}_0^* \cup \mathcal{D}_k^*$ for all $k \in \mathcal{A}_I$ into $R$ subsets of approximately equal size. For $r = 1, \ldots, R$, let the $r$th subset be the $r$th validation data $\mathcal{V}_r$ and let the union of the remaining subsets be the $r$th training data $\mathcal{T}_r$. We first run (22) with the initial value $w^{(0)} = 0_p$ and a given $\phi_{w2}$ under $\mathcal{T}_r$ to obtain the estimator $\widehat{w}_r(\phi_{w2})$. After that, we compute the prediction error under $\mathcal{V}_r$, denoted $\mathrm{PE}_r$. Finally, repeating these steps for $r = 1, \ldots, R$ gives the cross-validation value as a function of $\phi_{w2}$:

$$\mathrm{CV}(\phi_{w2}) \equiv \frac{1}{R}\sum_{r=1}^{R}\mathrm{PE}_r.$$

The optimal tuning parameter is then $\widehat{\phi}_{w2} \equiv \operatorname*{arg\,min}_{\phi_{w2}}\mathrm{CV}(\phi_{w2})$, and the resulting estimator of $w$ is $\widehat{w}_{\mathcal{A}_I} \equiv \widehat{w}_{\mathcal{A}_I}(\widehat{\phi}_{w2})$. By a derivation similar to (22), we use the following recursive formula to solve (21):

$$\delta^{(t+1)} = \delta^{(t)} - \Big\{\nabla^2\ell^*\big(\widehat{w}_{\mathcal{A}_I} + \delta^{(t)}; \mathcal{D}_0^*\big)\Big\}^{-1}\cdot\Big\{\nabla\ell^*\big(\widehat{w}_{\mathcal{A}_I} + \delta^{(t)}; \mathcal{D}_0^*\big) - \phi_{\delta 2}\cdot\mathrm{sign}(\widehat{w}_{\mathcal{A}_I} + \delta^{(t)})\Big\} \tag{23}$$

for $t = 0, 1, 2, \ldots$, where $\delta^{(t)}$ is the value at the $t$th iteration. One can follow the same steps as for (22) to derive the estimator of $\delta$ with the optimal $\phi_{\delta 2}$ under (23).
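For Cases 1–3, the gradient and Hessian of (8) have closed forms: $\nabla\ell^* = \frac{1}{C_1}\{X^\top(Y^* - \mu) - C_2 X^\top 1_n - (C_1 - 1)X^\top\mu\}$ and $\nabla^2\ell^* = -X^\top\mathrm{diag}(\mu)X$ with $\mu = \exp(Xw)$. A hedged R sketch of the recursion (22) follows (names are ours; the small ridge term only stabilizes the Hessian inverse and is not part of the paper's algorithm):

```r
## One run of recursion (22) on pooled data (X, y_star) for a fixed phi.
newton_l1_poisson <- function(X, y_star, C1, C2, phi,
                              w0 = rep(0, ncol(X)),
                              max_iter = 50, tol = 1e-6, ridge = 1e-4) {
  w <- w0
  for (t in seq_len(max_iter)) {
    mu   <- exp(as.vector(X %*% w))
    grad <- (crossprod(X, y_star - mu) - C2 * colSums(X) -
             (C1 - 1) * crossprod(X, mu)) / C1       # gradient of (8)
    hess <- -crossprod(X, X * mu)                    # Hessian of (8): -X' diag(mu) X
    step <- solve(hess - ridge * diag(ncol(X)), grad - phi * sign(w))
    w_new <- w - as.vector(step)
    if (sum(abs(w_new - w)) < tol) return(w_new)
    w <- w_new
  }
  w
}
```

Wrapping this update in an $R$-fold loop over candidate $\phi_{w2}$ values and recording $\mathrm{PE}_r$ on each held-out fold reproduces the cross-validation scheme described above.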

4. Estimation With Unknown Parameters in Measurement Error Models

In Section 3, we derived the corrected likelihood function and performed the transfer learning algorithm by pretending that the parameters $\lambda$ and $\pi$ of the noise terms $Z_i^{(k)}$ and $W_i^{(k)}$ in (3) are known. However, the assumption that $\lambda$ and $\pi$ are known is unrealistic in applications. Hence, in this section we relax this assumption and estimate $\lambda$ and $\pi$ by integrating the informative source data. In what follows, we separately examine the three cases of Section 2.2.

First, for Case 1, the conditional expectation under (3) with $a = 0$ and $b = 1$ is

$$\begin{aligned} E(Y_i^{*(k)} \mid \mathbb{X}^{(k)}) &= E(Y_i^{(k)} \mid \mathbb{X}^{(k)}) - E(W_i^{(k)} \mid \mathbb{X}^{(k)}) \\ &= E(Y_i^{(k)} \mid \mathbb{X}^{(k)}) - E\big\{E(W_i^{(k)} \mid Y_i^{(k)}, \mathbb{X}^{(k)}) \,\big|\, \mathbb{X}^{(k)}\big\} \\ &= E(Y_i^{(k)} \mid \mathbb{X}^{(k)}) - E(\pi Y_i^{(k)} \mid \mathbb{X}^{(k)}) \\ &= (1 - \pi)E(Y_i^{(k)} \mid \mathbb{X}^{(k)}) = (1 - \pi)\exp(X_i^{(k)\top} w) \end{aligned} \tag{24}$$

for all $k \in \{0\} \cup \mathcal{A}_I$, where the last step follows from the mean function under (1). Taking the expectation of (24) and summing over $k \in \{0\} \cup \mathcal{A}_I$ gives

$$\pi = 1 - \frac{\sum_{k \in \{0\} \cup \mathcal{A}_I} E(Y_i^{*(k)})}{\sum_{k \in \{0\} \cup \mathcal{A}_I} E\{\exp(X_i^{(k)\top} w)\}}. \tag{25}$$

We then estimate $E(Y_i^{*(k)})$ and $E\{\exp(X_i^{(k)\top} w)\}$ by the empirical estimates $\overline{Y}^{*(k)} = \frac{1}{n_k}\sum_{i=1}^{n_k} Y_i^{*(k)}$ and $\frac{1}{n_k}\sum_{i=1}^{n_k}\exp(X_i^{(k)\top} w)$, respectively, so that the sample analogue of (25) is

$$\widehat{\pi} = 1 - \frac{\sum_{k \in \{0\} \cup \mathcal{A}_I}\overline{Y}^{*(k)}}{\sum_{k \in \{0\} \cup \mathcal{A}_I}\frac{1}{n_k}\sum_{i=1}^{n_k}\exp(X_i^{(k)\top} w)}. \tag{26}$$

We now iterate between (26) and (22). First, the initial value $w^{(0)} = 0_p$ is given. At the $t$th iteration, we compute

$$\pi^{(t+1)} = 1 - \frac{\sum_{k \in \{0\} \cup \mathcal{A}_I}\overline{Y}^{*(k)}}{\sum_{k \in \{0\} \cup \mathcal{A}_I}\frac{1}{n_k}\sum_{i=1}^{n_k}\exp\big(X_i^{(k)\top} w^{(t)}\big)} \tag{27}$$

for $t = 1, \ldots$, and then plug (27) into (22) to obtain $w^{(t+1)}$. Let $\widehat{w}$ denote the convergent value of (22) and let $\widehat{\pi}$ denote the resulting convergent value of (27).
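A hedged R sketch of the moment update (27) (our own helper; `Xs` and `Ys` are lists holding the design matrices and surrogate responses of the target and informative source datasets):

```r
## One pi-update per (26)/(27), given current coefficients w.
update_pi <- function(w, Xs, Ys) {
  ybar  <- sapply(Ys, mean)                                       # means of Y*(k)
  mubar <- sapply(Xs, function(X) mean(exp(as.vector(X %*% w))))  # means of exp(x'w)
  1 - sum(ybar) / sum(mubar)
}
```

Alternating `update_pi` with the Newton update of Section 3.3, each step plugging the current $\pi^{(t+1)}$ into $C_1 = 1 - \pi$, mirrors the iteration described above.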

Under Case 2, the conditional expectation under (3) with $a = 1$ and $b = 0$ is

$$E(Y_i^{*(k)} \mid \mathbb{X}^{(k)}) = E(Y_i^{(k)} \mid \mathbb{X}^{(k)}) + E(Z_i^{(k)} \mid \mathbb{X}^{(k)}) = \exp(X_i^{(k)\top} w) + \lambda \tag{28}$$

for $k \in \{0\} \cup \mathcal{A}_I$. Taking the expectation of (28) gives

$$\lambda = \frac{1}{n(\{0\} \cup \mathcal{A}_I)}\sum_{k \in \{0\} \cup \mathcal{A}_I}\Big[E(Y_i^{*(k)}) - E\{\exp(X_i^{(k)\top} w)\}\Big].$$

Following the discussion around (27), we obtain the following recursive form to estimate $\lambda$:

$$\lambda^{(t+1)} = \frac{1}{n(\{0\} \cup \mathcal{A}_I)}\sum_{k \in \{0\} \cup \mathcal{A}_I}\Big\{\overline{Y}^{*(k)} - \frac{1}{n_k}\sum_{i=1}^{n_k}\exp\big(X_i^{(k)\top} w^{(t)}\big)\Big\} \tag{29}$$

for $t = 1, \ldots$. With the initial value $w^{(0)} = 0_p$, iterating (29) and (22) gives the convergent values $\widehat{w}$ and $\widehat{\lambda}$, respectively.
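Analogously to `update_pi`, a hedged sketch of the $\lambda$-update (29):

```r
## One lambda-update per (29): average gap between mean surrogate counts
## and mean fitted Poisson means across the pooled datasets.
update_lambda <- function(w, Xs, Ys) {
  ybar  <- sapply(Ys, mean)
  mubar <- sapply(Xs, function(X) mean(exp(as.vector(X %*% w))))
  mean(ybar - mubar)
}
```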

Finally, under Case 3, the conditional expectation under (3) with $a = 1$ and $b = 1$ is

$$E(Y_i^{*(k)} \mid \mathbb{X}^{(k)}) = E(Y_i^{(k)} \mid \mathbb{X}^{(k)}) + E(Z_i^{(k)} \mid \mathbb{X}^{(k)}) - E(W_i^{(k)} \mid \mathbb{X}^{(k)}) = (1 - \pi)E(Y_i^{(k)} \mid \mathbb{X}^{(k)}) + \lambda = (1 - \pi)\exp(X_i^{(k)\top} w) + \lambda \tag{30}$$

for all $k \in \{0\} \cup \mathcal{A}_I$. In the presence of the two unknown parameters $\pi$ and $\lambda$, we estimate them iteratively. Specifically, we set the initial value $\pi = 0$ in (30) and apply (29) to obtain the estimator, denoted $\widehat{\lambda}$. Replacing $\lambda$ in (30) by $\widehat{\lambda}$ gives

$$E(Y_i^{*(k)} \mid \mathbb{X}^{(k)}) = (1 - \pi)\exp(X_i^{(k)\top} w) + \widehat{\lambda}.$$

Following the discussion around (27), we obtain the recursive form

$$\pi^{(t+1)} = 1 - \frac{\sum_{k \in \{0\} \cup \mathcal{A}_I}\big(\overline{Y}^{*(k)} - \widehat{\lambda}\big)}{\sum_{k \in \{0\} \cup \mathcal{A}_I}\frac{1}{n_k}\sum_{i=1}^{n_k}\exp\big(X_i^{(k)\top} w^{(t)}\big)} \tag{31}$$

for $t = 1, \ldots$, with the initial value $w^{(0)} = 0_p$. Then, following the same discussion as for (27), we obtain the estimator $\widehat{\pi}$.

Given the estimators of $\pi$ and/or $\lambda$, one can plug them into (21) so that the estimator of $\delta$, and hence the estimator of $\beta$, can be derived.

5. Model Averaging

Different models may yield varying predictions on the same dataset, and no single model is universally perfect. To address model uncertainty, we employ a model averaging approach, which aims to produce more reliable estimators and accurate prediction by combining multiple models.

Let $p^*$ denote the number of nonzero elements in the estimator of $w$ derived from (13) or (20), depending on whether all or only the informative source data are used. The value $p^*$ is determined by the optimal tuning parameter obtained from the cross-validation method in Section 3.3. The $p^*$ selected variables produce $S = 2^{p^*} - 1$ combinations of variables (the empty set removed), yielding $S$ candidate regression models.

Let $\beta_m$ denote the parameter under the $m$th candidate model for $m = 1, \ldots, S$. Moreover, following the discussion in Section 3.2.1, the contrast between $\beta_m$ and $\widehat{w}_{\mathcal{A}_I}$ is denoted $\delta_m \equiv \beta_m - \widehat{w}_{\mathcal{A}_I}$, and the resulting estimator, denoted $\widehat{\delta}_{m,\mathcal{A}_I}$, is derived from (21) under the target data and the $m$th candidate model for $m = 1, \ldots, S$. It follows from (15) that $\widehat{\beta}_{m,\mathcal{A}_I} \equiv \widehat{w}_{\mathcal{A}_I} + \widehat{\delta}_{m,\mathcal{A}_I}$ is the estimator of $\beta_m$ for $m = 1, \ldots, S$. The model averaging estimator is then

$$\widehat{\beta}_\xi \equiv \sum_{m=1}^{S}\xi_m\widehat{\beta}_{m,\mathcal{A}_I} = \widehat{w}_{\mathcal{A}_I} + \sum_{m=1}^{S}\xi_m\widehat{\delta}_{m,\mathcal{A}_I}, \tag{32}$$

where $\xi \equiv (\xi_1, \ldots, \xi_S)^\top$ is a vector of weights in the set $\mathcal{W} = \{\xi \in [0, 1]^S : \sum_{m=1}^{S}\xi_m = 1\}$. In (32), model averaging addresses the model uncertainty for the target data through the debiasing term. To determine $\xi$, we consider two approaches. The first is the BIC-based strategy (e.g., Chen and Yi [17]; Zou et al. [16]). For the $m$th model, the BIC value is

$$\mathrm{BIC}_m = -2\log(L_m) + p_m\log(n_0)$$

for $m = 1, \ldots, S$, where $\log(L_m)$ is the corrected log-likelihood (8) evaluated under the covariates of the $m$th candidate model and $p_m$ is the number of covariates in the $m$th model. The resulting weight vector $\xi$ is

$$\widehat{\xi}_{\mathrm{bic}} = \bigg\{\sum_{m=1}^{S}\exp\Big(-\frac{\mathrm{BIC}_m}{2}\Big)\bigg\}^{-1}\cdot\bigg(\exp\Big(-\frac{\mathrm{BIC}_1}{2}\Big), \exp\Big(-\frac{\mathrm{BIC}_2}{2}\Big), \ldots, \exp\Big(-\frac{\mathrm{BIC}_S}{2}\Big)\bigg)^\top. \tag{33}$$
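A hedged sketch of (33); subtracting the minimum BIC before exponentiating is a standard numerical guard (it leaves the normalized weights unchanged) and is our addition, not part of the paper:

```r
## BIC-based model averaging weights, cf. (33).
bic_weights <- function(bics) {
  wts <- exp(-(bics - min(bics)) / 2)   # shift by min(bics) to avoid underflow
  wts / sum(wts)
}
```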

Instead of specifying the weights, the second method finds the optimal weight from an objective function (e.g., Zou et al. [16]). Specifically, let the negative corrected log-likelihood be $\Psi(\xi) \equiv -\ell^*(\widehat{\beta}_\xi; \mathcal{D}_0^*)$. The optimal weight is then determined by

$$\widehat{\xi}_{\mathrm{opt}} = \operatorname*{arg\,min}_{\xi \in \mathcal{W}}\Psi(\xi). \tag{34}$$

Note that the optimization in (34) is subject to the constraint $\sum_{m=1}^{S}\xi_m = 1$. To solve this constrained minimization problem, we adopt an adaptive barrier algorithm (e.g., Boyd and Vandenberghe [23], Section 11.3; Lange [24], Section 16.3), which can be implemented by the R function constrOptim.
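Because constrOptim handles linear inequality constraints, one convenient way to impose the simplex constraint is to optimize over the first $S - 1$ weights and set $\xi_S = 1 - \sum_{m<S}\xi_m$; the sketch below (our own parameterization, with `neg_loglik` standing in for $\Psi(\xi)$) illustrates this:

```r
## Optimal weights (34) via constrOptim on the (S-1)-dimensional simplex.
optimal_weights <- function(S, neg_loglik) {
  obj <- function(xi_free) neg_loglik(c(xi_free, 1 - sum(xi_free)))
  ui  <- rbind(diag(S - 1), rep(-1, S - 1))   # xi_m >= 0 and sum(xi_free) <= 1
  ci  <- c(rep(0, S - 1), -1)
  fit <- constrOptim(theta = rep(1 / S, S - 1), f = obj, grad = NULL,
                     ui = ui, ci = ci)
  c(fit$par, 1 - sum(fit$par))                # append xi_S
}
```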

Consequently, the model averaging estimator is given by (32) with ξ replaced by (33) or (34).

6. Numerical Studies

6.1. Simulation Setup

In this section, we conduct simulation studies to assess the performance of the proposed method. Let $p = 500$ or $1000$ denote the dimension of the covariates. Let $n_0 = 200$ or $400$ be the sample size of the target data, and let $n_k = 100$ or $200$ be the sample size of the $k$th source dataset for $k = 1, \ldots, K$. We consider the following two scenarios to generate the unobserved target data $\mathcal{D}_0 = \{(Y_i^{(0)}, X_i^{(0)}): i = 1, \ldots, n_0\}$ and the unobserved source data $\mathcal{D}_k = \{(Y_i^{(k)}, X_i^{(k)}): i = 1, \ldots, n_k\}$ for $k = 1, \ldots, K$. In our study, we specify $K = 15$.

Scenario I

Let $s = 5$ denote the number of nonzero elements. Let $\beta = (0.5 \cdot 1_s^\top, 0_{p-s}^\top)^\top$ and $\beta_2 = (0.7 \cdot 1_s^\top, 0_{p-s}^\top)^\top$ denote the vectors of true parameter values, where $1_s$ is the $s$-dimensional vector of ones and $0_{p-s}$ is the $(p-s)$-dimensional vector of zeros. For the covariates in the target data, we generate random samples $X_i^{(0)}$ from the normal distribution with mean $0_p$ and covariance matrix $\Sigma = [\sigma_{jj'}]_{p \times p}$ with $\sigma_{jj'} = 0.5^{|j - j'|}$ for $j, j' = 1, \ldots, p$ and $i = 1, \ldots, n_0$. In addition, we generate the covariates in the source data, $X_i^{(k)}$, from the normal distribution with mean $0_p$ and covariance matrix $\Sigma + 0.3^2 I_p$ for $i = 1, \ldots, n_k$ and $k = 1, \ldots, K$, where $I_p$ is the $p \times p$ identity matrix. We then generate the responses for the target and source data from the following models:

$$\begin{aligned} Y_i^{(0)} &\sim \text{Poisson}\big(\mu_i = \exp(X_i^{(0)\top}\beta)\big), && i = 1, \ldots, n_0, \\ Y_i^{(k)} &\sim \text{Poisson}\big(\mu_i = \exp(X_i^{(k)\top}\beta)\big), && i = 1, \ldots, n_k, \; k = 1, \ldots, 10, \\ Y_i^{(k)} &\sim \text{Poisson}\big(\mu_i = \exp(X_i^{(k)\top}\beta_2)\big), && i = 1, \ldots, n_k, \; k = 11, \ldots, K. \end{aligned}$$

Given $Y_i^{(k)}$, we further apply (3) with the three cases in Section 3.1 to generate $Y_i^{*(k)}$, where $Z_i^{(k)}$ follows a Poisson distribution with mean $\lambda = 2, 4$, or $6$, and the subtracted term $W_i^{(k)}$ follows a binomial distribution with size $Y_i^{(k)}$ and probability $\pi = 0.3, 0.5$, or $0.7$, for $i = 1, \ldots, n_k$ and $k = 0, \ldots, K$. In this scenario, $\pi$ and $\lambda$ are treated as known.
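A hedged R sketch of the Scenario I target-data generator, at a reduced dimension for illustration (it reuses the `simulate_surrogate` helper sketched in Section 2.2):

```r
library(MASS)   # for mvrnorm

set.seed(2024)
p <- 50; n0 <- 200; s <- 5                   # reduced p for illustration
beta  <- c(rep(0.5, s), rep(0, p - s))
Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))       # sigma_jj' = 0.5^|j - j'|

X0 <- mvrnorm(n0, mu = rep(0, p), Sigma = Sigma)
Y0 <- rpois(n0, lambda = exp(as.vector(X0 %*% beta)))   # true counts
Y0_star <- simulate_surrogate(Y0, a = 1, b = 1, lambda = 2, pi = 0.3)
```

Source datasets follow the same pattern with covariance `Sigma + 0.3^2 * diag(p)` and with `beta` swapped for `beta2` when $k \ge 11$.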

Scenario II

In this scenario, we treat the measurement error parameters as unknown and implement the method in Section 4 to estimate $\lambda$, $\pi$, and $\beta$. All settings follow Scenario I except that $X_i^{(k)}$ is generated from the uniform distribution over the interval $[-1, 1]$ for $i = 1, \ldots, n_k$ and $k = 0, \ldots, K$.

Our simulation studies are designed to achieve three primary goals. First, with the availability of error-prone source data, we aim to accurately identify the informative source data. Second, we correct for measurement error effects and obtain precise estimators under the Poisson regression models. Third, we examine the prediction error to assess the accuracy of prediction.

6.2. Simulation Results

Given the synthetic datasets in Section 6.1, we implement the transfer learning and model averaging methods of Sections 3 and 5 with weights (33) and (34) to derive the estimators, denoted "TL," "BIC," and "optimal," respectively. To benchmark the proposed method, we also examine the naive method, which uses the error-prone responses in the estimation procedure. To see the impact of the source data, we further examine the proposed and naive methods using all source data.

To assess the performance of the estimator of $\beta$, we record the squared $L_2$-norm of the bias of $\widehat{\beta}$:

$$L_2 = \|\widehat{\beta} - \beta\|_2^2,$$

where $\|v\|_2^2 = v_1^2 + \cdots + v_p^2$ for a vector $v = (v_1, \ldots, v_p)^\top$. In addition, to assess the prediction performance for new observations, we record the mean squared prediction error (MSPE) and the sample version of the relative KL divergence (SRKL). Specifically, we randomly split the target data into an 80% training set $\mathcal{T}$ and a 20% validation set $\mathcal{V} \equiv \{(X_{\text{new},i}, Y_{\text{new},i}^*): i \in \mathcal{V}\}$. We first apply the proposed method in Sections 3 and 5 to obtain $\widehat{\beta}$ based on $\mathcal{T}$, and then compute the MSPE and SRKL on $\mathcal{V}$, which are respectively defined as

$$\mathrm{MSPE} = \frac{1}{|\mathcal{V}|}\sum_{i \in \mathcal{V}}\big(Y'_{\text{new},i} - \widehat{\mu}_i\big)^2$$

and

$$\mathrm{SRKL} = \frac{1}{|\mathcal{V}|}\sum_{i \in \mathcal{V}}\Big[-Y'_{\text{new},i} X_{\text{new},i}^\top\widehat{\beta} + \widehat{\mu}_i + \log\big\{\big(Y'_{\text{new},i}\,\mathbb{I}(Y'_{\text{new},i} \ge 0)\big)!\big\}\Big],$$

where $\mathbb{I}(\cdot)$ denotes the indicator function, $Y'_{\text{new},i} = \frac{Y_{\text{new},i}^* - \lambda}{1 - \pi}$ is the "corrected" response with the two parameters $\lambda$ and $\pi$ either known or estimated as in Section 4, and $\widehat{\mu}_i = \exp(X_{\text{new},i}^\top\widehat{\beta})$ is the fitted mean function of the Poisson distribution. Smaller values of MSPE and SRKL indicate better prediction. Owing to limited space, numerical results for one of the settings under Scenario I are presented in Table 1; results for the remaining settings are placed in Tables A.1 to A.23 of the Supporting Information. For the estimator of $\beta$, the proposed method has a relatively small bias regardless of the sample sizes, the dimension $p$, and the magnitude of measurement error under Cases 1–3 in Section 2.2, indicating that the proposed method is valid for handling measurement error and detecting informative source data. Moreover, when $\lambda$ and/or $\pi$ are unknown and estimated from the source data, the numerical results in the Supporting Information show that the estimators of $\beta$ still perform satisfactorily, implying that $\lambda$ and $\pi$ are also precisely estimated. In contrast, we observe that biases arise when measurement error effects are ignored and/or noninformative source data are falsely included. On the other hand, the model averaging estimators derived under either the BIC or optimal weights are comparable to or better than the transfer learning estimators, with relatively smaller biases, when measurement error is corrected and informative source data are incorporated. The results in Table 1 and the Supporting Information verify that the proposed method handles the estimation of $\beta$ under the various combinations of $a$ and $b$ in the measurement error model (3), with the corresponding parameters $\lambda$ and $\pi$ either given or unknown.
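A hedged R sketch of these prediction criteria (our helper names; `lgamma(y + 1)` computes $\log(y!)$, and `pmax(., 0)` implements the truncation $Y'\,\mathbb{I}(Y' \ge 0)$):

```r
## MSPE and SRKL on a validation set, using corrected responses
## Y' = (Y* - lambda) / (1 - pi).
prediction_metrics <- function(beta_hat, X_new, ystar_new, lambda, pi) {
  y_corr <- (ystar_new - lambda) / (1 - pi)
  eta    <- as.vector(X_new %*% beta_hat)
  mu_hat <- exp(eta)
  c(MSPE = mean((y_corr - mu_hat)^2),
    SRKL = mean(-y_corr * eta + mu_hat + lgamma(pmax(y_corr, 0) + 1)))
}
```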

TABLE 1.

Simulation results of Case 1 with known measurement error (π=0.3) in response.

                                              Optimal                    BIC                        TL
p     n0    nk    Informative   Correct       L2‐norm   MSPE    SRKL     L2‐norm   MSPE    SRKL     L2‐norm   MSPE    SRKL
500 200 100 Yes Yes 0.001 3.822 1.374 0.001 3.927 1.375 0.001 3.822 1.374
No 0.016 25.621 1.512 0.015 25.182 1.507 0.016 25.621 1.512
No Yes 0.001 3.775 1.375 0.051 63.030 1.504 0.001 3.775 1.375
No 0.018 27.751 1.526 0.065 13.875 1.529 0.018 27.751 1.526
200 Yes Yes 0.001 3.916 1.325 0.000 3.936 1.325 0.001 3.916 1.325
No 0.016 23.060 1.471 0.015 22.627 1.467 0.016 23.060 1.471
No Yes 0.001 3.970 1.325 0.582 19.362 1.433 0.001 3.972 1.325
No 0.015 22.309 1.465 0.726 13.527 1.448 0.015 22.309 1.465
400 100 Yes Yes 0.007 4.416 1.392 0.001 4.393 1.392 0.007 4.416 1.392
No 0.015 41.961 1.589 0.015 41.436 1.586 0.015 41.961 1.589
No Yes 0.001 4.441 1.392 0.052 51.114 1.518 0.001 4.441 1.392
No 0.015 40.907 1.585 0.067 19.753 1.579 0.015 40.907 1.585
200 Yes Yes 0.000 4.359 1.360 0.000 4.359 1.360 0.000 4.359 1.360
No 0.014 31.639 1.515 0.014 30.639 1.515 0.014 30.639 1.515
No Yes 0.001 4.480 1.360 0.081 44.944 1.583 0.001 4.480 1.360
No 0.014 31.093 1.518 0.095 18.954 1.538 0.014 31.093 1.518
1000 200 100 Yes Yes 0.001 4.196 1.408 0.000 4.180 1.408 0.001 4.196 1.408
No 0.016 21.220 1.565 0.016 21.093 1.563 0.016 21.220 1.565
No Yes 0.001 4.133 1.409 0.037 9.838 1.485 0.001 4.133 1.409
No 0.015 20.688 1.557 0.051 18.214 1.537 0.015 20.688 1.557
200 Yes Yes 0.005 4.564 1.368 0.005 4.835 1.368 0.005 4.565 1.368
No 0.020 33.059 1.525 0.019 31.796 1.517 0.020 33.059 1.525
No Yes 0.005 5.090 1.368 0.068 30.418 1.520 0.005 5.092 1.368
No 0.019 32.151 1.520 0.082 18.063 1.513 0.019 32.151 1.520
400 100 Yes Yes 0.001 3.535 1.378 0.000 3.539 1.378 0.001 3.535 1.378
No 0.018 26.492 1.533 0.017 26.168 1.530 0.018 26.492 1.533
No Yes 0.001 3.547 1.378 0.070 19.183 1.521 0.001 3.548 1.378
No 0.018 26.520 1.534 0.086 17.633 1.535 0.018 26.520 1.534
200 Yes Yes 0.001 3.557 1.375 0.001 3.557 1.375 0.001 3.557 1.375
No 0.019 43.328 1.558 0.019 43.328 1.558 0.019 43.328 1.558
No Yes 0.001 3.876 1.377 0.076 21.707 1.518 0.001 3.872 1.377
No 0.021 45.343 1.570 0.094 27.105 1.543 0.021 45.343 1.570

Note: In the Informative column, “Yes” is the usage of informative source data and “No” represents the usage of all source data. In the Correct column, “Yes” is the correction of error‐prone responses and “No” represents the naive method without measurement error correction. “BIC,” “optimal,” and “TL” are (33), (34), and transfer learning methods, respectively.

Finally, regarding the performance of prediction, we observe from Table 1 and the Supporting Information that the transfer learning method and the model averaging method under either the BIC or optimal weights produce precise predictions, with relatively small values of MSPE and SRKL across the various values of $\lambda$ and $\pi$. In contrast, it is unsurprising that the prediction performance deteriorates if measurement error and/or informative source data selection are ignored in the estimation procedure, even when prediction is derived by the model averaging method. In summary, the numerical results verify the importance of dealing with measurement error effects and show that the proposed method is valid for handling such a complex situation.

7. Real Data Analysis

7.1. Data Description

We apply the proposed methods to analyze the breast cancer data collected by METABRIC. The dataset is available via the command cBioDataPack("brca_mbcproject_2022") in the R package cBioPortal. This target dataset contains 1980 subjects and 12,132 common standardized mRNA variables; mRNAs serve as the intermediary between DNA and proteins, carrying the genetic code copied from DNA during transcription and providing the template for protein synthesis during translation. In addition, the dataset records nine genes, including ERBB2, CCND1, MYC, FGFR1, PIK3CA, TP53, BRCA1, BRCA2, and CDH1. Each gene records CNA with possible outcomes $\{-2, -1, 0, 1, 2\}$: positive (negative) values reflect an increase (decrease) of copy number for a region or gene, indicating higher risk of breast cancer (e.g., Baslan et al. [25]). Among the nine genes, we record the number of nonzero CNAs for each subject and treat it as our primary response variable, denoted $Y_i^{(0)}$ (e.g., Turner et al. [26]; Olivier et al. [27]). To characterize $Y_i^{(0)}$, we use the mRNAs as the covariates $X_i^{(0)}$ and employ the Poisson regression model for $Y_i^{(0)}$ and $X_i^{(0)}$.

In addition to the target data, we also collect seven different source datasets, which contain the same variables as the target data, from various sources, including the Metastatic Breast Cancer Project (MBC Project), the Clinical Proteomic Tumor Analysis Consortium (CPTAC), and The Cancer Genome Atlas (TCGA), via the cBioPortal for Cancer Genomics website. Detailed descriptions of the source data are summarized in Table 2. Let $Y_i^{(k)}$ and $X_i^{(k)}$ denote the corresponding response and covariates in the $k$th source dataset for $i = 1, \ldots, n_k$ and $k = 1, \ldots, 7$.

TABLE 2.

A list of seven source data with the corresponding sample sizes nk for real data analysis.

k    Name                                                                      nk
1 The Metastatic Breast Cancer Project (Provisional, December 2021) 156
2 The Metastatic Breast Cancer Project (Archived, 2020) 146
3 Proteogenomic landscape of breast cancer (CPTAC, Cell 2020) 122
4 Breast Invasive Carcinoma (TCGA, Cell 2015) 421
5 Breast Invasive Carcinoma (TCGA, Nature 2012) 499
6 Breast Invasive Carcinoma (TCGA, PanCancer Atlas) 1068
7 Breast Invasive Carcinoma (TCGA, Firehose Legacy) 513

However, in applications, the CNA values in both the target and source data may be susceptible to measurement error. Specifically, as discussed in the literature (e.g., Zhang and Zhang [5]; Chiang et al. [6]; Grasso et al. [7]), CNA values are possibly subject to measurement error arising from various situations. Thus, we denote by $Y_i^{*(k)}$ the observed response, taken as the surrogate for $Y_i^{(k)}$, for $k = 0, 1, \ldots, 7$. In our study, we consider the general formulation $a = b = 1$ in model (3) to characterize the error-prone responses.

Moreover, among the large number of mRNAs and the various source datasets in Table 2, only a portion of the mRNAs and source data is expected to be informative for the CNAs. Naively using all mRNAs and source data, or ignoring measurement error effects, may harm the accuracy of parameter estimation and prediction. To address these issues and provide reliable analysis, we employ the proposed method to tackle these challenges.

7.2. Analysis Results

The selection criterion in Section 3.2.2 gives $\mathcal{A}_I = \{1, 2, 5, 6\}$, where the numbers in $\mathcal{A}_I$ are the labels of the source datasets in Table 2. With the informative source datasets accommodated, we follow the strategy in Section 6.2 to split the data into a training set and a validation set, and then apply the estimation method in Section 4 to derive the estimators of $\pi$ and $\lambda$ in the measurement error model (3) under the training data. The resulting estimators are reported in Table 3. Given the two estimators of $\pi$ and $\lambda$, we use the transfer learning method in Section 3 to derive the estimator of $\beta$ under the training data. To see the impact of measurement error and/or the inclusion of noninformative source data, we examine the other competing methods mentioned in Section 6.2. The numerical results are summarized in Table 4. Finally, we use the estimators in Table 4 and the covariates in the validation data to compute the predicted mean function, and then compute the MSPE and SRKL. In addition, we examine the model averaging method of Section 5 to reduce the uncertainty of model selection. Numerical results are reported in Table 5.

TABLE 3.

Real data analysis for the estimation of λ and π.

Informative    Method    π̂      λ̂
Yes MA 0.48 3
TL 0.40 3
No MA 0.24 3
TL 0.31 3

Note: In the Informative column, “Yes” is the usage of informative source data and “No” represents the usage of all source data. In the Method column, “MA” and “TL” are the model averaging and transfer learning methods, respectively. λ^ and π^ in the TL row are obtained by the method in Section 4; λ^ and π^ in the MA row are obtained by the method in Section 4 with arithmetic mean.

TABLE 4.

Real data analysis for the estimation of β and mRNA selection with λ and π estimated in Table 3.

Informative Correct Method CINP OSBPL2 THRB SLC5A11 ACOT12 CRABP1 SDS
Yes Yes Optimal 0.254 0.175 −0.242 0.127 −0.107
BIC 0.240 0.164 −0.233 0.117
TL 0.254 0.175 −0.242 0.127 −0.107
No Optimal 0.272 0.175 −0.123 0.313
BIC 0.269 0.180 −0.120 0.318
TL 0.272 0.175 −0.123 0.313
No Yes Optimal −0.167 −0.105
BIC −0.417
TL −0.433 −0.153
No Optimal −0.160
BIC −0.168
TL −0.160

Note: In the Informative column, “Yes” is the usage of informative source data and “No” represents the usage of all source data. In the Correct column, “Yes” is the correction of error‐prone responses and “No” represents the naive method without measurement error correction. In the Method column, “BIC,” “optimal,” and “TL” are (33), (34), and transfer learning methods, respectively. Remaining columns are mRNAs, where real values are the corresponding estimators and blanks represent unselected mRNAs.

TABLE 5.

Real data analysis prediction results with λ and π estimated in Table 3.

                         Optimal          BIC              TL
Informative   Correct    MSPE    SRKL     MSPE    SRKL     MSPE    SRKL
Yes Yes 3.803 1.455 3.861 1.612 3.804 1.784
No 4.016 2.462 4.023 2.463 4.016 2.462
No Yes 4.627 1.680 4.545 1.793 4.656 1.843
No 4.644 2.463 4.639 2.465 4.644 2.463

Note: In the Informative column, “Yes” is the usage of informative source data and “No” represents the usage of all source data. In the Correct column, “Yes” is the correction of error‐prone responses and “No” represents the naive method without measurement error correction. “BIC,” “optimal,” and “TL” are (33), (34), and transfer learning methods, respectively.

In Table 3, we observe that the estimator of $\lambda$ is always equal to 3 regardless of whether informative source data are used, while the estimator of $\pi$ is larger when only informative source data are implemented. A possible reason for this difference is that noninformative source data are falsely included in the iterations (29) and (31). With the estimators $\widehat{\lambda}$ and $\widehat{\pi}$ in hand, we observe from Table 4 that the corrected TL method selects five mRNAs: CINP, OSBPL2, THRB, SLC5A11, and CRABP1. While these gene expressions have been discussed separately in various research contexts (e.g., Lovejoy et al. [28]; Wang et al. [29]; Davidson et al. [30]; Tsai et al. [31]; Liu et al. [32]), the unique finding in our study is that they are selected simultaneously when using informative source data. In contrast, when all source data are used in the estimation procedure, the selected variables are entirely different, except for CRABP1. In addition, without correction of measurement error, the set of mRNAs selected by the naive method is a subset of that determined by the correction approach. From the estimates in Table 4, we find that the TL and MA methods produce similar values. Finally, Table 5 shows that, when measurement error correction and informative source data are implemented, MSPE and SRKL achieve the smallest values among all settings. This result not only indicates that the selected gene expressions (CINP, OSBPL2, THRB, SLC5A11, and CRABP1) are key factors affecting CNA values for breast cancer, but also verifies the impact of measurement error and the importance of using informative source data.

In addition to estimating the parameters $\lambda$ and $\pi$, we provide an alternative approach, a sensitivity analysis, to examine the impact of various magnitudes of measurement error. Specifically, similar to the settings in the simulation studies, we specify the parameters $(\lambda, \pi)$ as $(2, 0.3)$, $(4, 0.5)$, or $(6, 0.7)$, reflecting minor, moderate, and severe measurement error effects, respectively. Given $\lambda$ and $\pi$, we run the estimation procedures in Sections 3 and 5 to derive the transfer learning and model averaging estimators, respectively. Numerical results for variable selection and estimation are placed in Table 6. When measurement error is corrected and informative source data are included, Table 6 shows that the selected mRNAs are the same as those in Table 4 regardless of the changes in $\lambda$ and $\pi$. When all source data are used, the mRNA ACOT12 is commonly selected, but it is interesting to see that the additional mRNAs CRABP1 and SDS are selected as $\pi$ and $\lambda$ increase, which may indicate that CRABP1 and SDS can be selected when using all source data with large values of $\pi$ and $\lambda$. On the other hand, regardless of whether all or only informative source data are used, the absolute values of the estimators from the sensitivity analysis are larger than those obtained using the estimated $\widehat{\lambda}$ and $\widehat{\pi}$, except for the estimator of ACOT12 under the minor measurement error effect $(\lambda, \pi) = (2, 0.3)$. This phenomenon reflects that the magnitudes of the measurement error parameters $\lambda$ and $\pi$ may affect the estimator when the correction is taken into account.

TABLE 6.

Real data analysis for the estimation of β and mRNA selection by specifying values λ and π.

Sensitivity Informative Methods CINP OSBPL2 THRB SLC5A11 ACOT12 CRABP1 SDS
Minor Yes Optimal 0.367 0.242 −0.297 0.314
BIC 0.367 0.241 −0.302 0.319
TL 0.367 0.242 −0.297 0.314
No Optimal −0.158
BIC −0.374
TL −0.361
Moderate Yes Optimal 0.463 0.301 −0.544 0.251 −0.237
BIC 0.471 0.290 −0.566 0.262
TL 0.463 0.301 −0.544 0.251 −0.237
No Optimal −0.182 −0.155
BIC −0.636
TL −0.611 −0.261
Severe Yes Optimal 0.589 0.352 −0.886 0.126 −0.444
BIC 0.612 0.326 −0.959 0.146
TL 0.589 0.352 −0.886 0.126 −0.444
No Optimal −0.312 −0.265 −0.182
BIC −0.974
TL −0.916 −0.504

Note: In the Sensitivity column, “Minor” represents (λ,π)=(2,0.3), “Moderate” is (λ,π)=(4,0.5), and “Severe” is (λ,π)=(6,0.7). In the Informative column, “Yes” is the usage of informative source data and “No” represents the usage of all source data. In the Method column, “BIC”, “optimal”, and “TL” are (33), (34), and transfer learning methods, respectively. Remaining columns are mRNAs, where real values are the corresponding estimators and blanks represent unselected mRNAs.

Finally, we compute the MSPE and SRKL under the three levels of measurement error and summarize the numerical results in Table 7. We first observe that the MSPE and SRKL values derived under the informative source data are generally smaller than those under all source data, similar to the findings in Table 5. As the magnitude of measurement error increases, the MSPE and SRKL values become large. In particular, when $(\lambda, \pi)$ is specified as $(6, 0.7)$, the MSPE and SRKL values are even larger than those without measurement error correction summarized in Table 5. This phenomenon indicates that the count response in this dataset may be subject to only a minor measurement error effect, such as the values in Table 3 or the pair $(\lambda, \pi) = (2, 0.3)$, whose corresponding correction produces relatively precise prediction. In contrast, falsely specifying large values of $\lambda$ and $\pi$ can yield unexpected results.

TABLE 7.

Real data analysis prediction results by specifying values λ and π.

                          Optimal          BIC              TL
Sensitivity  Informative  MSPE    SRKL     MSPE    SRKL     MSPE    SRKL
Minor Yes 3.758 1.996 3.760 1.998 3.758 1.996
No 4.546 2.038 4.544 2.048 4.645 2.038
Moderate Yes 4.117 1.428 4.670 1.856 5.117 2.028
No 4.610 1.515 4.943 1.964 5.856 2.170
Severe Yes 11.927 1.716 9.021 2.959 12.912 3.716
No 14.712 1.893 11.338 2.975 16.395 3.986

Note: In the Sensitivity column, "Minor" represents (λ,π)=(2,0.3), "Moderate" is (λ,π)=(4,0.5), and "Severe" is (λ,π)=(6,0.7). In the Informative column, "Yes" is the usage of informative source data and "No" represents the usage of all source data. "BIC", "optimal", and "TL" are (33), (34), and transfer learning methods, respectively.

8. Summary

In this article, we discuss transfer learning methods for modeling count responses, motivated by the breast cancer data with available gene expression variables and various sources of auxiliary information. The main challenges of the dataset are measurement error in the response and high-dimensional covariates, which may affect prediction and estimation performance. To tackle these challenges, we introduce measurement error models to characterize the error-prone count response and propose the insertion method to correct for measurement error effects. Our correction strategy in Section 3.1 only requires the first moments of the noise terms $Z_i^{(k)}$ and $W_i^{(k)}$; specific distributions are adopted merely to satisfy the implicit restriction on the error-prone response in (3). With the availability of source data, we develop an estimation procedure for the parameters in the measurement error models, which differs from the conventional use of repeated measurements or validation data. To detect informative source data, we follow Tian and Feng [4] and construct the selection criterion based on the error-in-response data. As suggested by a referee, a next research direction is to explore a sharper upper bound in (17) by analyzing $\widehat{L}_0(k)$ and $\widehat{L}_0(0)$ with measurement error correction taken into account, so that the detection of informative source data can be improved. In addition to the transfer learning method, we further develop a model averaging procedure to improve the accuracy of prediction and mitigate model uncertainty. Simulation studies and the analysis of the breast cancer data reveal satisfactory performance of the proposed method and verify the importance of correcting for measurement error. As explored by Tian and Feng [4], the transfer learning strategy yields a consistent estimator for the GLM. Since the Poisson regression model is a special case of the GLM, we may expect the proposed estimator to be consistent as well, especially when the measurement error effects are corrected through the likelihood function. In addition to the numerical justification, it is worth exploring theoretical properties in the near future.

There are several possible extensions. For example, a general and complex situation is the mismeasurement of both the response and the covariates. While the correction of error‐prone covariates has been widely discussed, its impact on the detection of informative source data remains unexplored. In addition to the insertion method for correcting measurement error, there are alternative approaches, such as regression calibration or simulation–extrapolation methods (e.g., Chen [13]), but these methods primarily target error‐prone continuous covariates, and little discussion addresses the response or discrete variables. It is worth extending other measurement error strategies to handle error‐prone responses and/or covariates. Finally, we note that the current development relies on the measurement error model (3), and some specific distributions with implicit restrictions are considered for the error terms in that model. In applications, the measurement error model and the corresponding distributions of the noise terms are usually unknown. To address these issues and examine the validity of the model and distributions, additional auxiliary information with the unobserved response Yi(k) available, such as external or internal validation data (e.g., Carroll et al. [8]), is required, since the measurement error models involve the unobserved response Yi(k). Moreover, with validation data available, one may further examine the homogeneity or heterogeneity of the parameters in (3), that is, test H0: λ(k) = λ(k′) and H0: π(k) = π(k′) for k, k′ = 0, 1, …, K with k ≠ k′, and then estimate each heterogeneous parameter λ(k) and π(k) from the validation data if H0 is rejected; a hypothetical sketch of such a check is given below. While these issues are not the main concerns of the current development, they deserve further exploration in our future work.
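As a hypothetical illustration of this homogeneity check, suppose internal validation data (pairs of the true response Y and its error‐prone version Y*) are available in two sources and the same illustrative error model holds; then E(Y* | Y) = πY + λ, so (λ(k), π(k)) can be estimated by least squares within each source, and equality of the slopes can be assessed by a Wald‐type statistic. The function and parameter values below are assumptions for illustration only, not part of the proposed methodology.

set.seed(2)
sim_validation <- function(n, lambda, pi0) {     # validation pairs (Y, Y*) under the assumed model
  y <- rpois(n, 3)
  data.frame(y = y, y_star = rbinom(n, 1, pi0) * y + rpois(n, lambda))
}
d0 <- sim_validation(300, lambda = 2, pi0 = 0.3) # source k
d1 <- sim_validation(300, lambda = 4, pi0 = 0.5) # source k'
f0 <- lm(y_star ~ y, data = d0)                  # intercept estimates lambda, slope estimates pi
f1 <- lm(y_star ~ y, data = d1)
w  <- (coef(f0)[2] - coef(f1)[2])^2 /            # Wald statistic for H0: pi(k) = pi(k')
      (vcov(f0)[2, 2] + vcov(f1)[2, 2])
pchisq(w, df = 1, lower.tail = FALSE)            # small p-value suggests heterogeneous parameters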

Conflicts of Interest

The authors declare no conflicts of interest.

Supporting information

Data S1. Supporting Information.

SIM-44-0-s001.pdf (181KB, pdf)

Acknowledgments

The authors thank the Editor, an Associate Editor, and three referees for their useful comments that significantly improved the initial manuscript. Chen's research was supported by the National Science and Technology Council.

Wu J.-C. and Chen L.-P., “Transfer Learning for Error‐Contaminated Poisson Regression Models,” Statistics in Medicine 44, no. 15‐17 (2025): e70163, 10.1002/sim.70163.

Funding: This work was supported by the National Science and Technology Council (Grant No. 112‐2118‐M‐004‐005‐MY2).

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

1. Bastani H., “Predicting With Proxies: Transfer Learning in High Dimension,” Management Science 67 (2021): 2964–2984.
2. Chen A., Owen A. B., and Shi M., “Data Enriched Linear Regression,” Electronic Journal of Statistics 9 (2015): 1078–1112.
3. Li S., Cai T. T., and Li H., “Transfer Learning for High‐Dimensional Linear Regression: Prediction, Estimation and Minimax Optimality,” Journal of the Royal Statistical Society, Series B: Statistical Methodology 84 (2022): 149–173.
4. Tian Y. and Feng Y., “Transfer Learning Under High‐Dimensional Generalized Linear Models,” Journal of the American Statistical Association 118 (2023): 2684–2697.
5. Zhang L. and Zhang L., “Use of Autocorrelation Scanning in DNA Copy Number Analysis,” Bioinformatics 29 (2013): 2678–2682.
6. Chiang D. Y., Getz G., Jaffe D. B., et al., “High‐Resolution Mapping of Copy‐Number Alterations With Massively Parallel Sequencing,” Nature Methods 6 (2009): 99–103.
7. Grasso C., Butler T., Rhodes K., et al., “Assessing Copy Number Alterations in Targeted, Amplicon‐Based Next‐Generation Sequencing Data,” Journal of Molecular Diagnostics 17 (2015): 53–63.
8. Carroll R. J., Ruppert D., Stefanski L. A., and Crainiceanu C. M., Measurement Error in Nonlinear Models: A Modern Perspective (CRC Press/Chapman and Hall, 2006).
9. Kukush A., Schneeweiss H., and Wolf R., “Three Estimators for the Poisson Regression Model With Measurement Errors,” Statistical Papers 45 (2004): 351–368.
10. Sørensen Ø., Hellton K. H., Frigessi A., and Thoresen M., “Covariate Selection in High‐Dimensional Generalized Linear Models With Measurement Error,” Journal of Computational and Graphical Statistics 27 (2018): 739–749.
11. Yang H., Pan F., Tong D., Brown H. E., and Liu J., “Measurement Error‐Tolerant Poisson Regression for Valley Fever Incidence Prediction,” IISE Transactions on Healthcare Systems Engineering 14 (2024): 305–317, 10.1080/24725579.2024.2370243.
12. Jiang F., Zhou Y., Liu J., and Ma Y., “On High‐Dimensional Poisson Models With Measurement Error: Hypothesis Testing for Nonlinear Nonconvex Optimization,” Annals of Statistics 51 (2023): 233–259.
13. Chen L.‐P., “De‐Noising Boosting Methods for Variable Selection and Estimation Subject to Error‐Prone Variables,” Statistics and Computing 33 (2023): 38.
14. Chen L.‐P. and Qiu B., “SIMEXBoost: An R Package for Analysis of High‐Dimensional Error‐Prone Data Based on Boosting Method,” R Journal 15 (2023): 5–20.
15. Zhang Q. and Yi G. Y., “Zero‐Inflated Poisson Models With Measurement Error in the Response,” Biometrics 79 (2023): 1089–1102.
16. Zou J., Wang W., Zhang X., and Zou G., “Optimal Model Averaging for Divergent‐Dimensional Poisson Regressions,” Econometric Reviews 41 (2022): 775–805.
17. Chen L.‐P. and Yi G. Y., “Model Selection and Model Averaging for Analysis of Truncated and Censored Data With Measurement Error,” Electronic Journal of Statistics 14 (2020): 4054–4109.
18. Zhang X., Ma Y., and Carroll R. J., “MALMEM: Model Averaging in Linear Measurement Error Models,” Journal of the Royal Statistical Society, Series B: Statistical Methodology 81 (2019): 763–779.
19. Nakamura T., “Proportional Hazards Model With Covariates Subject to Measurement Error,” Biometrics 48 (1992): 829–838.
20. Friedman J., Hastie T., and Tibshirani R., “Regularization Paths for Generalized Linear Models via Coordinate Descent,” Journal of Statistical Software 33 (2010): 1–22.
21. Klosa J., Simon N., Westermark P. O., Liebscher V., and Wittenburg D., “Seagull: Lasso, Group Lasso and Sparse‐Group Lasso Regularization for Linear Regression Models via Proximal Gradient Descent,” BMC Bioinformatics 21 (2020): 407.
22. Lee S., Kwon S., and Kim Y., “A Modified Local Quadratic Approximation Algorithm for Penalized Optimization Problems,” Computational Statistics and Data Analysis 94 (2016): 275–286.
23. Boyd S. and Vandenberghe L., Convex Optimization (Cambridge University Press, 2004).
24. Lange K., Numerical Analysis for Statisticians (Springer, 2001).
25. Baslan T., Kendall J., Volyanskyy K., et al., “Novel Insights Into Breast Cancer Copy Number Genetic Heterogeneity Revealed by Single‐Cell Genome Sequencing,” eLife 9 (2020): e51480.
26. Turner N., Pearson A., Sharpe R., et al., “FGFR1 Amplification Drives Endocrine Therapy Resistance and Is a Therapeutic Target in Breast Cancer,” Cancer Research 70 (2010): 2085–2094.
27. Olivier M., Hollstein M., and Hainaut P., “TP53 Mutations in Human Cancers: Origins, Consequences, and Clinical Use,” Cold Spring Harbor Perspectives in Biology 2 (2010): a001008.
28. Lovejoy C. A., Xu X., Bansbach C. E., et al., “Functional Genomic Screens Identify CINP as a Genome Maintenance Protein,” Proceedings of the National Academy of Sciences of the United States of America 106 (2009): 19304–19309.
29. Wang P., Weng J., and Anderson R. G. W., “OSBP Is a Cholesterol‐Regulated Scaffolding Protein in Control of ERK1/2 Activation,” Science 307 (2005): 1472–1476.
30. Davidson C. D., Gillis N. E., and Carr F. E., “Thyroid Hormone Receptor Beta as Tumor Suppressor: Untapped Potential in Treatment and Diagnostics in Solid Tumors,” Cancers (Basel) 13 (2021): 4254.
31. Tsai L.‐J., Hsiao S.‐H., Tsai L.‐M., et al., “The Sodium‐Dependent Glucose Cotransporter SLC5A11 as an Autoimmune Modifier Gene in SLE,” Tissue Antigens 71 (2008): 114–126.
32. Liu R. Z., Garcia E., Glubrecht D. D., Poon H. Y., Mackey J. R., and Godbout R., “CRABP1 Is Associated With a Poor Prognosis in Breast Cancer: Adding to the Complexity of Breast Cancer Cell Response to Retinoic Acid,” Molecular Cancer 14 (2015): 129.
