Stat Med. 2025 Jul 15;44(15-17):e70163. doi: 10.1002/sim.70163

Transfer Learning for Error‐Contaminated Poisson Regression Models

Jou‐Chin Wu, Li‐Pang Chen
PMCID: PMC12333914  PMID: 40662525

ABSTRACT

The Poisson regression model has been a popular approach to characterizing the relationship between a count response and covariates. With the rapid development of data collection, additional source information can be easily recorded. To efficiently use the source data to improve estimation under the original data, transfer learning is a natural strategy. However, challenging issues in the given datasets include measurement error and high dimensionality of the variables, which are not well explored in the context of transfer learning. In this paper, we propose a novel strategy to handle error‐prone count responses and to estimate the parameters in the measurement error models using the source data, and we then employ the transfer learning method to derive the corrected estimator. Moreover, to improve prediction and avoid model uncertainty, we further establish a model averaging strategy. Simulation studies and an analysis of breast cancer data verify the satisfactory performance of the proposed method and the validity of the measurement error correction.

Keywords: error‐prone count variables, model averaging, prediction, variable selection

1. Introduction

In biomedical studies, one research interest is to use gene expression or mRNA variables to characterize specific measures for a specific cancer. The motivating example is the breast cancer data collected by the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC). This dataset contains 12,132 mRNA variables and nine genes recorded by copy number alterations (CNA), which reflect the level of risk caused by genes for breast cancer. Treating the number of nonzero CNA values among the nine genes as a count response that characterizes the risk of breast cancer, the research goal is to model the relationship between the count response and the mRNAs by the Poisson regression model. Moreover, identifying mRNA variables that are informative for the count response may improve the accuracy of prediction.

In addition to the dataset collected by METABRIC, referred to as the target data and of primary interest, the rapid development of data collection allows researchers to easily gather datasets containing the same mRNA and CNA variables from various sources. We call these additional datasets the source data; more details can be found in Section 7. Intuitively, one may wish to incorporate the source data with the target data to build the Poisson regression; however, the resulting estimator may behave contrary to expectation, because some of the source datasets have structures different from the target data. Hence, to effectively adopt source data that are informative for the target data and to estimate the parameters of the Poisson regression model, transfer learning has become a useful approach in recent years; its idea is to identify informative source data and train them together with the target data to build the regression model. In the literature, several transfer learning methods have been proposed for high‐dimensional data. For example, Bastani [1] proposed a two‐step transfer learning algorithm for high‐dimensional generalized linear models (GLM) with single‐source data. Chen et al. [2] proposed the data‐enriched model for linear and logistic regression with a single source of data incorporated. In the presence of multiple source datasets, Li et al. [3] proposed Trans‐Lasso, a data‐driven procedure for high‐dimensional linear regression. Tian and Feng [4] applied transfer learning to handle high‐dimensional GLM.

While transfer learning methods have been widely discussed, the developments rest on the implicit and often unreliable assumption that variables are precisely measured. In practice, measurement error arises ubiquitously and inevitably. For example, the CNA values in the motivating example are possibly subject to measurement error due to experimental noise (e.g., Zhang and Zhang [5]), systematic biases (e.g., Chiang et al. [6]), or sample heterogeneity (e.g., Grasso et al. [7]). Using error‐prone variables in the Poisson regression model may incur substantial bias in the estimator, as explored in the existing literature (e.g., Carroll et al. [8]). Hence, measurement error correction is crucial for deriving a reliable estimator of the Poisson regression model. A large body of correction methods has been developed in the literature. For example, in the presence of measurement error in covariates, Kukush et al. [9] considered normally distributed error in covariates under the Poisson regression. Sørensen et al. [10] focused on the challenge of selecting relevant covariates that are subject to measurement error in high‐dimensional data. Yang et al. [11] proposed a quasi‐likelihood estimation method to deal with classical and/or Berkson measurement errors in covariates. Jiang et al. [12] considered Poisson regression models with noisy high‐dimensional covariates and proposed a penalized target function to correct estimation bias and conduct hypothesis testing. Chen [13] and Chen and Qiu [14] developed boosting algorithms to deal with measurement error and variable selection. In contrast, when the count responses are subject to measurement error, Zhang and Yi [15] introduced a Bayesian method to address measurement error and estimate the parameters of zero‐inflated Poisson regression models. However, existing methods were developed for a single target dataset and may not be valid when multiple source datasets are available. Moreover, transfer learning methods under measurement error have not yet been explored.

In the presence of high‐dimensional data, such as the mRNAs in the motivating example, the best combination of variables can be identified, and one may adopt the selected variables for further analysis, such as model fitting or prediction. However, the main concern with this approach is model uncertainty, since the true model is unknown, which may make prediction imprecise. To tackle this issue, model averaging is a widely used strategy, whose key idea is to integrate candidate models through linear combinations with suitable weights. In the literature, model averaging procedures have been widely discussed when variables are either precisely measured or subject to measurement error. To name a few, Zou et al. [16] proposed a model averaging method with optimal weights for Poisson regression. Chen and Yi [17] proposed a method to deal with truncated or censored data using the focused information criterion (FIC) when data are contaminated by error. Zhang et al. [18] proposed a data‐driven optimal weight for linear measurement error models. However, existing model averaging methods are restricted to a single given dataset, and model averaging for the transfer learning procedure has not been fully explored.

Motivated by the challenges in the breast cancer data, to derive a reliable estimator and precise prediction with the source data and measurement error correction taken into account, we propose a transfer learning method for error‐in‐response Poisson regression. Specifically, we first characterize the error‐prone count response and propose the insertion method to derive a modified log‐likelihood function. Next, we consider an iterative two‐stage transfer learning algorithm with informative source data selection, and perform variable selection and estimation. Moreover, we propose a valid method to estimate the unknown parameters in the noise terms of the measurement error model using the informative source data, and we extend the model averaging method to the transfer learning framework. Numerical studies show that the proposed method performs satisfactorily, with precise estimation and prediction.

The remainder of the paper is organized as follows. In Section 2, we introduce the data structure and the Poisson regression model, as well as measurement error models for count responses. In Section 3, we develop the corrected estimating function based on the Poisson regression model and then employ the transfer learning method to perform variable selection and estimation. In Section 4, we develop valid estimation methods for the unknown parameters in the measurement error models. In Section 5, we introduce the model averaging algorithm to improve prediction and address model uncertainty. In Section 6, we conduct simulation studies to assess the performance of the proposed method. We apply the proposed method to analyze the breast cancer data in Section 7. A general discussion is presented in Section 8.

2. Notation and Models

2.1. Data Structure and Regression Models

Let $X$ denote a $p$-dimensional random vector of covariates and let $Y$ be a count-valued random variable of primary interest. To characterize $Y$ and $X$, we follow the framework of GLM and link $Y$ and $X$ through a Poisson regression

$$P(Y \mid X) = \frac{\exp\{-\exp(X^\top\beta)\}\{\exp(X^\top\beta)\}^{Y}}{Y!} \tag{1}$$

for all $Y = 0, 1, 2, \ldots$, where $\beta$ is a $p$-dimensional vector of parameters of interest that is possibly sparse, in the sense that most components are zero and only a few are nonzero, reflecting that only a few components of $X$ are informative for $Y$.

To estimate $\beta$, we consider a dataset with sample size $n_0$, denoted $\mathcal{D}_0 \equiv \{(X_i^{(0)}, Y_i^{(0)}): i = 1, \ldots, n_0\}$, with $(X_i^{(0)}, Y_i^{(0)})$ independent and identically distributed as $(X, Y)$. In particular, we call $\mathcal{D}_0$ the target data. Let $\mathbb{Y}^{(0)} = (Y_1^{(0)}, Y_2^{(0)}, \ldots, Y_{n_0}^{(0)})^\top$ denote the column vector of the target responses with size $n_0$, and let $\mathbb{X}^{(0)} = (X_1^{(0)}, X_2^{(0)}, \ldots, X_{n_0}^{(0)})^\top$. In the framework of GLM, we usually estimate $\beta$ by optimizing the following log-likelihood function under $\mathcal{D}_0$ and (1):

$$\ell(\beta; \mathcal{D}_0) \propto \sum_{i=1}^{n_0}\big\{Y_i^{(0)} X_i^{(0)\top}\beta - \exp(X_i^{(0)\top}\beta)\big\}, \tag{2}$$

where "$\propto$" reflects the omission of the term $-\log(Y_i^{(0)}!)$, which is free of $\beta$. Throughout, the first argument of the likelihood function $\ell(\cdot\,;\cdot)$ indicates the parameter and the second argument indicates the dataset used to construct the likelihood. On the other hand, during data collection or the sampling scheme, we sometimes record auxiliary information sharing the same variables as the target data $\mathcal{D}_0$. We call such auxiliary information source data. Specifically, suppose that there are $K$ source datasets, and let $\mathcal{D}_k = \{(X_i^{(k)}, Y_i^{(k)}): i = 1, \ldots, n_k\}$ denote the $k$th source dataset with size $n_k$ for $k = 1, \ldots, K$. Let $\mathbb{Y}^{(k)} = (Y_1^{(k)}, Y_2^{(k)}, \ldots, Y_{n_k}^{(k)})^\top$ denote the corresponding column vector of responses and let $\mathbb{X}^{(k)} = (X_1^{(k)}, X_2^{(k)}, \ldots, X_{n_k}^{(k)})^\top$ for $k = 1, \ldots, K$.

While the existence of the source data $\mathcal{D}_k$ provides more information to improve the estimation of $\beta$, a crucial issue is that some of the source data are not identical in structure to the target data. Specifically, note that $\beta$ is the parameter of the Poisson regression for the target data, and let $w^{(k)}$ denote the parameter that characterizes $\mathbb{Y}^{(k)}$ and $\mathbb{X}^{(k)}$ through the Poisson regression model (1) for the $k$th source dataset $\mathcal{D}_k$, $k = 1, \ldots, K$. Moreover, define $\delta^{(k)} \equiv \beta - w^{(k)}$ as the $k$th contrast between $\beta$ and $w^{(k)}$. Following the discussion in Li et al. [3], small values of $\delta^{(k)}$ in the $L_0$- or $L_1$-norm, say $\|\delta^{(k)}\|_q$ for $q \in \{0, 1\}$, indicate similarity between the target data and the $k$th source data. We call the collection of source datasets with small $\|\delta^{(k)}\|_q$, which are "similar" to the target data, the informative source data associated with the target data, and we wish to integrate the informative source data with the target data to improve the estimation of $\beta$; the improvement of the estimators will be justified by the biases in the $L_2$-norm and by prediction criteria in the simulation studies. Finally, the detection of informative source data is deferred to Section 3.2.2.

2.2. Measurement Error Model

In real data applications, the responses $Y_i^{(0)}$ and $Y_i^{(k)}$ for $k = 1, \ldots, K$ in the target/source data are possibly subject to measurement error, and thus the responses $Y_i^{(k)}$ are often not observed. Instead, surrogate responses, denoted $Y_i^{*(k)}$, are observed.

To characterize the two count-valued random variables $Y_i^{(k)}$ and $Y_i^{*(k)}$, we follow a similar idea to Zhang and Yi [15] and consider the following measurement error model

$$Y_i^{*(k)} = Y_i^{(k)} + a Z_i^{(k)} - b W_i^{(k)} \quad \text{for } k = 0, 1, \ldots, K, \; i = 1, \ldots, n_k, \tag{3}$$

where $a, b \in \{0, 1\}$ are indicators reflecting the relationship between $Y_i^{*(k)}$ and $Y_i^{(k)}$, and $Z_i^{(k)}$ and $W_i^{(k)}$ are two random variables that are independent of each other. To ensure that $Y_i^{*(k)}$ takes nonnegative integer values, $Z_i^{(k)}$ and $W_i^{(k)}$ are required to follow distributions with count-valued outcomes, such as Poisson or binomial distributions. When $(a, b) = (1, 0)$, (3) yields an increased count, so that the surrogate response is greater than or equal to the unobserved one. In this case $Z_i^{(k)}$ is independent of $Y_i^{(k)}$, and it is reasonable to assume that $E(Z_i^{(k)}) = \lambda > 0$. One possible specification is the Poisson distribution $Z_i^{(k)} \sim \text{Pois}(\lambda)$, keeping in mind that our development is not restricted to the Poisson distribution. On the other hand, when $(a, b) = (0, 1)$, (3) leads to a decreased count relative to the unobserved response $Y_i^{(k)}$. To ensure that $Y_i^{*(k)}$ is nonnegative, we require that $W_i^{(k)}$ not exceed $Y_i^{(k)}$. Therefore, the binomial distribution $W_i^{(k)} \sim \text{Binomial}(Y_i^{(k)}, \pi)$, with $E(W_i^{(k)} \mid Y_i^{(k)}) = \pi Y_i^{(k)}$ and $\pi \in (0, 1)$ a probability of success, is one recommended choice. Finally, $(a, b) = (1, 1)$ incorporates both "add-in" and "leave-out" errors, allowing a mixture of addition and subtraction of counts induced by measurement error.

We comment on the two indicators $a$ and $b$. In applications, the choice of $a$ and $b$ can be suitably specified by background information or researchers' experience. For example, if one believes that the observed responses are always greater (or smaller) than the unobserved ones, then $a = 1$ and $b = 0$ (or $a = 0$ and $b = 1$) can be adopted. Otherwise, one can consider the general situation $a = 1$ and $b = 1$, which is demonstrated in the real data analysis in Section 7. In the following development, we discuss these three cases separately.

The other notable feature of the measurement error model (3) is that we implicitly assume that $Z_i^{(k)}$ shares the homogeneous expectation $\lambda$ and $W_i^{(k)}$ shares the common component $\pi$ in its expectation across the target ($k = 0$) and source ($k = 1, \ldots, K$) data. Under this setting, we can integrate the target and source data to estimate the two common parameters $\lambda$ and $\pi$; a detailed discussion is deferred to Section 4. In addition, the common parameters $\lambda$ and $\pi$ enable us to detect informative source data, with details provided in Section 3.2.2. In applications, a more general situation is that the two parameters depend on the target and source data, that is, $Z_i^{(k)} \sim \text{Pois}(\lambda^{(k)})$ and $W_i^{(k)} \sim \text{Binomial}(Y_i^{(k)}, \pi^{(k)})$ for $k = 0, 1, \ldots, K$ with $\lambda^{(k)} \neq \lambda^{(k')}$ and $\pi^{(k)} \neq \pi^{(k')}$ for some $k \neq k'$. Under this general setting, however, these parameters cannot be estimated by integrating the target and source data; one would require auxiliary information, such as external or internal validation data, for the target and source data.

Now and hereafter, we denote by $\mathcal{D}_k^*$ the observed target/source data, defined as $\mathcal{D}_k$ with $Y_i^{(k)}$ replaced by $Y_i^{*(k)}$ for $k = 0, 1, \ldots, K$.
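To make the error mechanism concrete, the following is a minimal R sketch (our own illustration, not code from the paper) that generates surrogate counts $Y_i^*$ from model (3) for a chosen pair $(a, b)$, with $Z_i \sim \text{Pois}(\lambda)$ and $W_i \sim \text{Binomial}(Y_i, \pi)$ as the recommended specifications:

```r
## Generate surrogate counts Y* = Y + a*Z - b*W per model (3).
## All names here are ours; lambda and pi follow Section 2.2.
simulate_surrogate <- function(y, a, b, lambda, pi) {
  n <- length(y)
  z <- if (a == 1) rpois(n, lambda) else rep(0, n)                 # add-in error Z
  w <- if (b == 1) rbinom(n, size = y, prob = pi) else rep(0, n)   # leave-out error W
  y + z - w                                                        # surrogate Y*
}

set.seed(1)
y <- rpois(50, lambda = exp(0.5))   # unobserved true counts
y_star <- simulate_surrogate(y, a = 1, b = 1, lambda = 2, pi = 0.3)
```

Since the binomial draw guarantees $W_i \le Y_i$, the surrogate remains a nonnegative integer, matching the restriction discussed above.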

3. Methodology

3.1. Correction of Measurement Error

In the presence of measurement error, naively using the error-prone variables may lead to biased estimators and hence an incorrectly estimated regression model in (2). Therefore, it is crucial to address measurement error.

Specifically, by M-estimation theory, it is known that $E\{\partial\ell(\beta; \mathcal{D}_0)/\partial\beta\} = 0_p$ admits the solution $\beta_0$, the true value of the parameter, where $0_p$ is a $p$-dimensional zero vector. However, this equality no longer holds if $\mathbb{Y}^{(0)}$ is replaced by $\mathbb{Y}^{*(0)} \equiv (Y_1^{*(0)}, \ldots, Y_{n_0}^{*(0)})^\top$. This motivates us to derive a new log-likelihood function $\ell^*(\beta; \mathcal{D}_0^*)$ based on $\mathbb{Y}^{*(0)}$, such that

$$E\big\{\ell^*(\beta; \mathcal{D}_0^*) \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big\} = \ell(\beta; \mathcal{D}_0) \tag{4}$$

holds. Taking the derivative of both sides of (4) with respect to $\beta$ gives

$$E\bigg\{\frac{\partial\ell^*(\beta; \mathcal{D}_0^*)}{\partial\beta} \,\bigg|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\bigg\} = \frac{\partial\ell(\beta; \mathcal{D}_0)}{\partial\beta},$$

provided that the expectation and the differentiation are interchangeable. It follows that $\beta_0$ satisfies $E\{\partial\ell^*(\beta_0; \mathcal{D}_0^*)/\partial\beta\} = E\{\partial\ell(\beta_0; \mathcal{D}_0)/\partial\beta\} = 0_p$, or equivalently,

$$\beta_0 = \operatorname*{arg\,max}_\beta E\{\ell^*(\beta; \mathcal{D}_0^*)\} = \operatorname*{arg\,max}_\beta E\{\ell(\beta; \mathcal{D}_0)\}. \tag{5}$$

Then the solution to $\partial\ell^*(\beta; \mathcal{D}_0^*)/\partial\beta = 0_p$ gives the maximum likelihood estimator (MLE) of $\beta$, denoted $\widetilde{\beta}$. By the theory of maximum likelihood estimation, $\widetilde{\beta}$ is a consistent estimator of $\beta_0$, suggesting that $\ell^*(\beta; \mathcal{D}_0^*)$ is the "corrected" log-likelihood function and provides the corrected estimator. This approach follows the idea of the insertion method in measurement error analysis (e.g., Chen and Yi [17]; Nakamura [19]; Carroll et al. [8], Chapter 7), which aims to find a "corrected" estimating function whose conditional expectation recovers the function under the unobserved random variable. Unlike existing methods that focus on correcting error-prone covariates, the current development corrects the error-prone count response.

We start our derivation from $\ell(\beta; \mathcal{D}_0^*)$, the log-likelihood (2) evaluated with the surrogate responses. Taking the conditional expectation gives

$$E\big\{\ell(\beta; \mathcal{D}_0^*) \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big\} \propto \sum_{i=1}^{n_0}\Big\{E\big(Y_i^{*(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) X_i^{(0)\top}\beta - \exp(X_i^{(0)\top}\beta)\Big\}, \tag{6}$$

where "$\propto$" reflects the omission of $-\log(Y_i^{*(0)}!)$, which is free of $\beta$. We observe that (6) contains the conditional expectation $E(Y_i^{*(0)} \mid \mathbb{Y}^{(0)}, \mathbb{X}^{(0)})$, which should be derived from (3). Suppose that this conditional expectation can be written as

$$E\big(Y_i^{*(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) = C_1 Y_i^{(0)} + C_2 \tag{7}$$

for some constants $C_1 > 0$ and $C_2$ that are related to $\lambda$ and/or $\pi$, since they are produced by the conditional expectation under the measurement error model (3). Then combining (6) and (7) gives

$$E\bigg[\frac{1}{C_1}\Big\{\ell(\beta; \mathcal{D}_0^*) - \sum_{i=1}^{n_0} C_2 X_i^{(0)\top}\beta - \sum_{i=1}^{n_0}(C_1 - 1)\exp(X_i^{(0)\top}\beta)\Big\} \,\bigg|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\bigg] = \ell(\beta; \mathcal{D}_0),$$

showing that

$$\ell^*(\beta; \mathcal{D}_0^*) \equiv \frac{1}{C_1}\Big\{\ell(\beta; \mathcal{D}_0^*) - \sum_{i=1}^{n_0} C_2 X_i^{(0)\top}\beta - \sum_{i=1}^{n_0}(C_1 - 1)\exp(X_i^{(0)\top}\beta)\Big\}. \tag{8}$$

It follows that (8) is the corrected likelihood function and satisfies (4) and (5). We also note that the values $C_1$ and $C_2$ depend implicitly on the choices of $a$ and $b$, discussed below. Moreover, regardless of the values of $a$ and $b$, the constant $C_1$ is always nonzero, so the corrected likelihood function (8) is well defined, and the conditional expectation of the error-prone response recovers the unobserved one as in (7), a phenomenon similar to the classical measurement error model (e.g., Carroll et al. [8]).

In what follows, we separately examine the three cases $(a, b) = (0, 1), (1, 0)$, and $(1, 1)$, and derive $E(Y_i^{*(0)} \mid \mathbb{Y}^{(0)}, \mathbb{X}^{(0)})$ and the function $\ell^*$ accordingly.

Case 1: leave‐out error only (a=0 and b=1)

In this case, the surrogate response $Y_i^{*(0)}$ in (3) reduces to

$$Y_i^{*(0)} = Y_i^{(0)} - W_i^{(0)}.$$

The conditional expectation of the observed response $Y_i^{*(0)}$ given the unobserved response and the covariates is

$$E\big(Y_i^{*(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) = E\big(Y_i^{(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) - E\big(W_i^{(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) = Y_i^{(0)} - \pi Y_i^{(0)} = (1 - \pi)Y_i^{(0)},$$

where $\pi$ represents the probability defined in Section 2.2. By (7), we have $C_1 = 1 - \pi$ and $C_2 = 0$. It follows that the corrected log-likelihood function (8) is

$$\ell^*(\beta; \mathcal{D}_0^*) = \frac{1}{1 - \pi}\Big\{\ell(\beta; \mathcal{D}_0^*) + \sum_{i=1}^{n_0}\pi\exp(X_i^{(0)\top}\beta)\Big\}. \tag{9}$$

Case 2: add‐in error only (a=1 and b=0)

For this case, model (3) reduces to

$$Y_i^{*(0)} = Y_i^{(0)} + Z_i^{(0)}.$$

Then the conditional expectation of $Y_i^{*(0)}$ given the unobserved response and the covariates is

$$E\big(Y_i^{*(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) = E\big(Y_i^{(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) + E\big(Z_i^{(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) = Y_i^{(0)} + \lambda,$$

yielding $C_1 = 1$ and $C_2 = \lambda$ in (7), with $\lambda$ representing the parameter in Section 2.2. Therefore, the corrected log-likelihood function in Case 2 is

$$\ell^*(\beta; \mathcal{D}_0^*) = \ell(\beta; \mathcal{D}_0^*) - \sum_{i=1}^{n_0}\lambda X_i^{(0)\top}\beta. \tag{10}$$

Case 3: both add‐in and leave‐out errors (a=1 and b=1)

Based on (3), the conditional expectation of $Y_i^{*(0)}$ is

$$E\big(Y_i^{*(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) = E\big(Y_i^{(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) + E\big(Z_i^{(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) - E\big(W_i^{(0)} \,\big|\, \mathbb{Y}^{(0)}, \mathbb{X}^{(0)}\big) = (1 - \pi)Y_i^{(0)} + \lambda, \tag{11}$$

where $C_1 = 1 - \pi$ and $C_2 = \lambda$ due to (7). It follows that (8) becomes

$$\ell^*(\beta; \mathcal{D}_0^*) = \frac{1}{1 - \pi}\Big\{\ell(\beta; \mathcal{D}_0^*) - \sum_{i=1}^{n_0}\lambda X_i^{(0)\top}\beta + \sum_{i=1}^{n_0}\pi\exp(X_i^{(0)\top}\beta)\Big\}. \tag{12}$$

In the following development, we temporarily assume that the parameters $\pi$ and $\lambda$ are known and use (8) to derive the estimator of $\beta$.
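For concreteness, here is a short R sketch (our own, with assumed function and argument names) of the corrected log-likelihood (8); the constants follow Cases 1–3: $(a,b) = (0,1)$ gives $C_1 = 1 - \pi$, $C_2 = 0$; $(a,b) = (1,0)$ gives $C_1 = 1$, $C_2 = \lambda$; and $(a,b) = (1,1)$ gives $C_1 = 1 - \pi$, $C_2 = \lambda$:

```r
## Corrected log-likelihood (8), up to the additive constant dropped in (2).
## X: n x p design matrix; y_star: surrogate counts; C1, C2: per Cases 1-3.
corrected_loglik <- function(beta, X, y_star, C1, C2) {
  eta   <- as.vector(X %*% beta)
  naive <- sum(y_star * eta - exp(eta))   # l(beta; D0*) with surrogate responses
  (naive - sum(C2 * eta) - sum((C1 - 1) * exp(eta))) / C1
}
```

One can verify the insertion identity numerically: averaging `corrected_loglik` over repeated draws of $Y^*$ given fixed $(Y, X)$ approximates $\ell(\beta; \mathcal{D}_0)$, as required by (4).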

3.2. Transfer Learning With Measurement Error in Responses

3.2.1. Estimation via Transfer Learning

To estimate $\beta$ by (8) with the source data $\mathcal{D}_k^*$ incorporated, we adopt the transfer learning method. We first demonstrate the transfer learning procedure using all source data. Suppose that the variables in $\mathcal{D}_k^*$ share the same structure (1) for all $k = 1, \ldots, K$. We first aggregate the target data and all source data, denoted $\bigcup_{k=0}^{K}\mathcal{D}_k^*$. These data are used to obtain a preliminary overall estimator and to perform variable selection through the following optimization:

$$\widehat{w} = \operatorname*{arg\,min}_{w}\Bigg[-\frac{1}{\sum_{k=0}^{K} n_k}\sum_{k=0}^{K}\frac{1}{C_1}\Big\{\ell(w; \mathcal{D}_k^*) - \sum_{i=1}^{n_k} C_2 X_i^{(k)\top} w - \sum_{i=1}^{n_k}(C_1 - 1)\exp(X_i^{(k)\top} w)\Big\} + \phi_w\|w\|_1\Bigg], \tag{13}$$

where $w$ is the vector of parameters that links $\mathbb{Y}^{*(k)} \equiv (Y_1^{*(k)}, Y_2^{*(k)}, \ldots, Y_{n_k}^{*(k)})^\top$ and $\mathbb{X}^{(k)}$ in the source data $\mathcal{D}_k^*$ for all $k$, $\phi_w$ is a tuning parameter, and $C_1$ and $C_2$ are defined in (7) according to Cases 1–3.

With $\widehat{w}$ obtained from (13), let $\delta \equiv \beta - \widehat{w}$ denote the contrast between $\beta$ and $\widehat{w}$. Following the framework of transfer learning (e.g., Li et al. [3]; Tian and Feng [4]), the next step is to construct the corrected likelihood function for the target data with $\beta$ replaced by $\widehat{w} + \delta$. In this way, it suffices to estimate $\delta$, and we then use the resulting estimator of $\delta$ to derive the debiased estimator of $\beta$. Specifically, we solve the following penalized likelihood problem under the target data $\mathcal{D}_0^*$:

$$\widehat{\delta} = \operatorname*{arg\,min}_{\delta}\Bigg[-\frac{1}{n_0 C_1}\Big\{\ell(\widehat{w} + \delta; \mathcal{D}_0^*) - \sum_{i=1}^{n_0} C_2 X_i^{(0)\top}(\widehat{w} + \delta) - \sum_{i=1}^{n_0}(C_1 - 1)\exp\big(X_i^{(0)\top}(\widehat{w} + \delta)\big)\Big\} + \phi_\delta\|\widehat{w} + \delta\|_1\Bigg], \tag{14}$$

where $\phi_\delta$ is a tuning parameter. Consequently, the estimator of $\beta$ is given by

$$\widehat{\beta} = \widehat{w} + \widehat{\delta}. \tag{15}$$
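To fix ideas, here is a hedged R sketch of the two-step pipeline (13)–(15) using glmnet's Poisson lasso as a stand-in: glmnet does not implement the measurement-error-corrected objective, and its second step penalizes the contrast via an offset (the common transfer-learning variant, e.g., Tian and Feng [4]) rather than $\|\widehat{w} + \delta\|_1$ exactly as in (14), so this only illustrates the structure of the algorithm:

```r
## Two-step transfer learning sketch; intercepts are dropped for brevity.
library(glmnet)

two_step_tl <- function(X_pool, y_pool, X0, y0) {
  # Step 1: pooled penalized fit over target + source data -> w_hat, cf. (13).
  step1 <- cv.glmnet(X_pool, y_pool, family = "poisson")
  w_hat <- as.vector(coef(step1, s = "lambda.min"))[-1]

  # Step 2: refit on the target data with the pooled fit as an offset,
  # so the penalized coefficients play the role of the contrast delta, cf. (14).
  step2 <- cv.glmnet(X0, y0, family = "poisson", offset = X0 %*% w_hat)
  d_hat <- as.vector(coef(step2, s = "lambda.min"))[-1]

  w_hat + d_hat   # beta_hat = w_hat + delta_hat, cf. (15)
}
```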

3.2.2. Detection of Source Data

As noted in Section 2.1, there are $K$ different source datasets, and not all of them have a structure similar to the target data. We need to select the source data that are informative for the target data and use them to improve the estimator of $\beta$ through the estimation procedure in Section 3.2.1.

Specifically, we first randomly divide the target data $\mathcal{D}_0^*$ into $R$ subsamples of approximately equal size, and denote by $\mathcal{D}_0^{*[r]}$ the $r$th subsample for $r = 1, \ldots, R$. Let $\mathcal{D}_0^{*[-r]} \equiv \bigcup_{j \neq r}\mathcal{D}_0^{*[j]}$ denote the subdata of $\mathcal{D}_0^*$ with the $r$th subsample removed. Given the subdata $\mathcal{D}_0^{*[-r]}$, we solve the following optimization for the parameter $\beta$:

$$\widehat{\beta}^{(0)[r]} = \operatorname*{arg\,min}_{\beta}\Bigg[\frac{1}{n(\mathcal{D}_0^{*[-r]})}\sum_{i \in \mathcal{D}_0^{*[-r]}}\big\{-Y_i^* X_i^\top\beta + \exp(X_i^\top\beta)\big\} + \phi^{(0)[r]}\|\beta\|_1\Bigg],$$

where $n(\mathcal{A})$ denotes the number of elements in a set $\mathcal{A}$ and $\phi^{(0)[r]}$ is a tuning parameter. In addition, for $k = 1, \ldots, K$, we use the union of the target subdata and the $k$th source data, denoted $\mathcal{D}_k^{*[r]} = \mathcal{D}_k^* \cup \mathcal{D}_0^{*[-r]}$, to run the two-stage transfer learning of Section 3.2.1 and obtain the estimator $\widehat{\beta}^{(k)[r]}$.

Next, we apply the held-out subsample $\mathcal{D}_0^{*[r]}$ to define the following validated likelihood function:

$$\widehat{L}_0^{[r]}(\beta) = \frac{1}{n(\mathcal{D}_0^{*[r]})}\sum_{i \in \mathcal{D}_0^{*[r]}}\Big[-Y_i^{*(0)[r]} X_i^{(0)[r]\top}\beta + \exp(X_i^{(0)[r]\top}\beta) + \log(Y_i^{*(0)[r]}!)\Big]. \tag{16}$$

Based on the estimators $\widehat{\beta}^{(0)[r]}$ and $\widehat{\beta}^{(k)[r]}$ for $r = 1, \ldots, R$ and $k = 1, \ldots, K$, we define $\widehat{L}_0(0) \equiv \frac{1}{R}\sum_{r=1}^{R}\widehat{L}_0^{[r]}(\widehat{\beta}^{(0)[r]})$ and $\widehat{L}_0(k) \equiv \frac{1}{R}\sum_{r=1}^{R}\widehat{L}_0^{[r]}(\widehat{\beta}^{(k)[r]})$ for $k = 1, \ldots, K$. Finally, the set containing the informative source data is given by

$$\mathcal{A}_I = \Big\{k \neq 0 : \big|\widehat{L}_0(k) - \widehat{L}_0(0)\big| \le C_0\,\sigma_L^2\Big\}, \tag{17}$$

where $C_0$ is a pre-specified positive constant and $\sigma_L^2 \equiv \frac{1}{R-1}\sum_{r=1}^{R}\big\{\widehat{L}_0^{[r]}(\widehat{\beta}^{(0)[r]}) - \widehat{L}_0(0)\big\}^2$, which follows a similar formulation in Tian and Feng [4].
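A hedged R sketch of the selection rule (17) (names are ours; the fold losses are assumed to have been computed from (16) with the fitted $\widehat{\beta}^{(0)[r]}$ and $\widehat{\beta}^{(k)[r]}$):

```r
## L0_fold: length-R vector of fold losses L0_hat^[r](beta_hat^(0)[r]).
## Lk_bar:  length-K vector of averaged losses L0_hat(k), k = 1, ..., K.
detect_informative <- function(L0_fold, Lk_bar, C0 = 2) {
  L0_bar  <- mean(L0_fold)
  sigma2L <- var(L0_fold)             # (1/(R-1)) sum_r {L0^[r] - L0_bar}^2
  which(abs(Lk_bar - L0_bar) <= C0 * sigma2L)   # indices forming A_I
}
```

The default `C0 = 2` is only a placeholder; the paper treats $C_0$ as pre-specified.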

We note that $\widehat{L}_0(k)$ and $\widehat{L}_0(0)$ are derived from (2) with the error-prone responses $Y_i^{*(k)}$, instead of using the corrected likelihood function (9), (10), or (12), because the corrected function involves the parameters $\lambda$ and $\pi$, which are usually unknown. In addition, the goal here is to determine the informative source data rather than to estimate the parameter $\beta$. To see why this approach works for source data detection, observe from (11) that

$$E(Y_i^{*(0)} \mid \mathbb{X}^{(0)}) = (1 - \pi)E(Y_i^{(0)} \mid \mathbb{X}^{(0)}) + \lambda.$$

It implies that the expectation of $\widehat{L}_0^{[r]}(\beta)$ in (16) is given by

$$\begin{aligned} E\big\{\widehat{L}_0^{[r]}(\beta)\big\} &= E\Big[E\big\{\widehat{L}_0^{[r]}(\beta) \,\big|\, \mathbb{X}^{(0)}\big\}\Big] \\ &= (1 - \pi)E\big\{L_0^{[r]}(\beta)\big\} - \lambda E\big(X^{(0)[r]\top}\beta\big) + \pi E\Big\{\frac{1}{n(\mathcal{D}_0^{[r]})}\sum_{i \in \mathcal{D}_0^{[r]}}\exp(X_i^{(0)[r]\top}\beta)\Big\} \\ &\quad + E\Big[\frac{1}{n(\mathcal{D}_0^{[r]})}\sum_{i \in \mathcal{D}_0^{[r]}}\big\{\log(Y_i^{*(0)[r]}!) - \log(Y_i^{(0)[r]}!)\big\}\Big], \end{aligned} \tag{18}$$

where $L_0^{[r]}(\beta)$ is defined as (16) with $\mathcal{D}_0^*$ and $Y_i^{*(0)}$ replaced by $\mathcal{D}_0$ and $Y_i^{(0)}$. Equation (18) shows that, at the population level, (16) can be expressed through $L_0^{[r]}(\beta)$ plus additional terms. Under the empirical estimate of (18) with $\beta$ replaced by $\widehat{\beta}^{(k)[r]}$ and $\widehat{\beta}^{(0)[r]}$, the difference $\widehat{L}_0(k) - \widehat{L}_0(0)$ can be expressed as

$$\begin{aligned} \widehat{L}_0(k) - \widehat{L}_0(0) &= (1 - \pi)\big\{L_0(k) - L_0(0)\big\} - \lambda\,\frac{1}{R}\sum_{r=1}^{R} X^{(0)[r]\top}\big(\widehat{\beta}^{(k)[r]} - \widehat{\beta}^{(0)[r]}\big) \\ &\quad + \pi\,\frac{1}{R}\sum_{r=1}^{R}\frac{1}{n(\mathcal{D}_0^{[r]})}\sum_{i \in \mathcal{D}_0^{[r]}}\big\{\exp(X_i^{(0)[r]\top}\widehat{\beta}^{(k)[r]}) - \exp(X_i^{(0)[r]\top}\widehat{\beta}^{(0)[r]})\big\} \\ &\equiv (1 - \pi)\big\{L_0(k) - L_0(0)\big\} + B, \end{aligned}$$

where $B$ collects the second and third terms. By the triangle inequality, we have

$$\big|L_0(k) - L_0(0)\big| \le \frac{1}{1 - \pi}\big|\widehat{L}_0(k) - \widehat{L}_0(0)\big| + \frac{1}{1 - \pi}|B|. \tag{19}$$

Given the upper bound $C_0\sigma_L^2$ in (17), (19) suggests that $|L_0(k) - L_0(0)| \le C_0^*\sigma_L^2$ with $C_0^* = \frac{C_0}{1 - \pi} + \frac{|B|}{(1 - \pi)\sigma_L^2}$, which is similar to the source data selection criterion in Tian and Feng [4]. It implies that closeness between the observed target and source data $\mathcal{D}_0^*$ and $\mathcal{D}_k^*$ indicates closeness of the unobserved ones $\mathcal{D}_0$ and $\mathcal{D}_k$. The other notable remark is that this implementation is based on the observed data without the correction step in Section 3.1, because the parameters $\lambda$ and $\pi$ are unknown and the purpose here is to detect informative source data rather than to estimate the parameter $\beta$.

Based on the set (17), we aggregate the informative source datasets with the target dataset, denoted $\bigcup_{k \in \{0\} \cup \mathcal{A}_I}\mathcal{D}_k^*$, and we estimate $w$ by (13) under the target and informative source data, that is,

$$\widehat{w}_{\mathcal{A}_I} = \operatorname*{arg\,min}_{w}\Bigg[-\frac{1}{n_{\mathcal{A}_I} + n_0}\sum_{k \in \{0\} \cup \mathcal{A}_I}\frac{1}{C_1}\Big\{\ell(w; \mathcal{D}_k^*) - \sum_{i=1}^{n_k} C_2 X_i^{(k)\top} w - \sum_{i=1}^{n_k}(C_1 - 1)\exp(X_i^{(k)\top} w)\Big\} + \phi_{w2}\|w\|_1\Bigg], \tag{20}$$

where $\phi_{w2}$ is a tuning parameter and $n_{\mathcal{A}_I}$ is the total sample size of $\bigcup_{k \in \mathcal{A}_I}\mathcal{D}_k^*$.

Similarly, we estimate $\delta$ by (14) with $\widehat{w}$ replaced by $\widehat{w}_{\mathcal{A}_I}$, that is,

$$\widehat{\delta}_{\mathcal{A}_I} = \operatorname*{arg\,min}_{\delta}\Bigg[-\frac{1}{n_0 C_1}\Big\{\ell(\widehat{w}_{\mathcal{A}_I} + \delta; \mathcal{D}_0^*) - \sum_{i=1}^{n_0} C_2 X_i^{(0)\top}(\widehat{w}_{\mathcal{A}_I} + \delta) - \sum_{i=1}^{n_0}(C_1 - 1)\exp\big(X_i^{(0)\top}(\widehat{w}_{\mathcal{A}_I} + \delta)\big)\Big\} + \phi_{\delta 2}\|\widehat{w}_{\mathcal{A}_I} + \delta\|_1\Bigg], \tag{21}$$

where $\phi_{\delta 2}$ is a tuning parameter. Consequently, the resulting estimator of $\beta$ is given by $\widehat{w}_{\mathcal{A}_I} + \widehat{\delta}_{\mathcal{A}_I}$.

3.3. Computational Implementation via Local Quadratic Approximation

Noting that the derivation of $\widehat{\beta}$ requires the two penalized estimating functions (20) and (21), we discuss the computational implementation in this section. Unlike other approaches in the existing literature, such as the coordinate descent method (Friedman et al. [20]) or the proximal gradient descent method (Klosa et al. [21]), we primarily employ an iterative method based on the Newton-Raphson algorithm in conjunction with a local quadratic approximation (e.g., Lee et al. [22]), because it simplifies and accelerates the optimization through finitely many iterations of a recursive formula.

Let $\nabla\ell^*(\beta; \mathcal{D}_0^*) \equiv \frac{\partial}{\partial\beta}\ell^*(\beta; \mathcal{D}_0^*)$ and $\nabla^2\ell^*(\beta; \mathcal{D}_0^*) \equiv \frac{\partial^2}{\partial\beta\,\partial\beta^\top}\ell^*(\beta; \mathcal{D}_0^*)$. Here we assume that the matrix $\nabla^2\ell^*(\beta; \mathcal{D}_0^*)$ is nonsingular so that its inverse exists. By the Karush-Kuhn-Tucker (KKT) conditions and the Newton-Raphson algorithm, we obtain the following recursive form to solve (20):

$$w^{(t+1)} = w^{(t)} - \Big\{\nabla^2\ell^*\big(w^{(t)}; \mathcal{D}_0^* \cup \{\cup_{k \in \mathcal{A}_I}\mathcal{D}_k^*\}\big)\Big\}^{-1}\cdot\Big\{\nabla\ell^*\big(w^{(t)}; \mathcal{D}_0^* \cup \{\cup_{k \in \mathcal{A}_I}\mathcal{D}_k^*\}\big) - \phi_{w2}\cdot\mathrm{sign}(w^{(t)})\Big\} \tag{22}$$

for $t = 0, 1, 2, \ldots$, where $w^{(t)}$ is the value at the $t$th iteration, $\mathrm{sign}(w^{(t)})$ is the vector with components $\mathrm{sign}(w_i^{(t)})$, which lies in $\{-1, 1\}$ if $w_i^{(t)} \neq 0$ and is $0$ if $w_i^{(t)} = 0$, $w_i^{(t)}$ is the $i$th element of $w^{(t)}$, and $\ell^*(w; \mathcal{D}_0^* \cup \{\cup_{k \in \mathcal{A}_I}\mathcal{D}_k^*\})$ denotes the corrected log-likelihood in the objective function of (20). To implement (22) and obtain the estimator of $w$ with the optimal tuning parameter $\phi_{w2}$, we employ $R$-fold cross-validation. Specifically, we randomly split the data in $\mathcal{D}_0^* \cup \mathcal{D}_k^*$ for all $k \in \mathcal{A}_I$ into $R$ subsets of approximately equal size. For $r = 1, \ldots, R$, let the $r$th subset be the $r$th validation data $\mathcal{V}_r$ and let the union of the remaining subsets be the $r$th training data $\mathcal{T}_r$. We first run (22) with the initial value $w^{(0)} = 0_p$ and a given $\phi_{w2}$ under $\mathcal{T}_r$ to obtain the estimator $\widehat{w}_r(\phi_{w2})$. After that, we compute the prediction error under $\mathcal{V}_r$, denoted $\mathrm{PE}_r$. Finally, repeating these steps for $r = 1, \ldots, R$ gives the cross-validation value as a function of $\phi_{w2}$:

$$\mathrm{CV}(\phi_{w2}) \equiv \frac{1}{R}\sum_{r=1}^{R}\mathrm{PE}_r.$$

The optimal tuning parameter is then $\widehat{\phi}_{w2} \equiv \operatorname*{arg\,min}_{\phi_{w2}}\mathrm{CV}(\phi_{w2})$, and the resulting estimator of $w$ is $\widehat{w}_{\mathcal{A}_I} \equiv \widehat{w}_{\mathcal{A}_I}(\widehat{\phi}_{w2})$. By a derivation similar to (22), we use the following recursive formula to solve (21):

$$\delta^{(t+1)} = \delta^{(t)} - \Big\{\nabla^2\ell^*\big(\widehat{w}_{\mathcal{A}_I} + \delta^{(t)}; \mathcal{D}_0^*\big)\Big\}^{-1}\cdot\Big\{\nabla\ell^*\big(\widehat{w}_{\mathcal{A}_I} + \delta^{(t)}; \mathcal{D}_0^*\big) - \phi_{\delta 2}\cdot\mathrm{sign}(\widehat{w}_{\mathcal{A}_I} + \delta^{(t)})\Big\} \tag{23}$$

for $t = 0, 1, 2, \ldots$, where $\delta^{(t)}$ is the value at the $t$th iteration. One can follow the same steps as for (22) to derive the estimator of $\delta$ with the optimal $\phi_{\delta 2}$ under (23).
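For Cases 1–3, the gradient and Hessian of (8) have closed forms: $\nabla\ell^* = \frac{1}{C_1}\{X^\top(Y^* - \mu) - C_2 X^\top 1_n - (C_1 - 1)X^\top\mu\}$ and $\nabla^2\ell^* = -X^\top\mathrm{diag}(\mu)X$ with $\mu = \exp(Xw)$. A hedged R sketch of the recursion (22) follows (names are ours; the small ridge term only stabilizes the Hessian inverse and is not part of the paper's algorithm):

```r
## One run of recursion (22) on pooled data (X, y_star) for a fixed phi.
newton_l1_poisson <- function(X, y_star, C1, C2, phi,
                              w0 = rep(0, ncol(X)),
                              max_iter = 50, tol = 1e-6, ridge = 1e-4) {
  w <- w0
  for (t in seq_len(max_iter)) {
    mu   <- exp(as.vector(X %*% w))
    grad <- (crossprod(X, y_star - mu) - C2 * colSums(X) -
             (C1 - 1) * crossprod(X, mu)) / C1       # gradient of (8)
    hess <- -crossprod(X, X * mu)                    # Hessian of (8): -X' diag(mu) X
    step <- solve(hess - ridge * diag(ncol(X)), grad - phi * sign(w))
    w_new <- w - as.vector(step)
    if (sum(abs(w_new - w)) < tol) return(w_new)
    w <- w_new
  }
  w
}
```

Wrapping this update in an $R$-fold loop over candidate $\phi_{w2}$ values and recording $\mathrm{PE}_r$ on each held-out fold reproduces the cross-validation scheme described above.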

4. Estimation With Unknown Parameters in Measurement Error Models

In Section 3, we derived the corrected likelihood function and performed the transfer learning algorithm by pretending that the parameters $\lambda$ and $\pi$ of the noise terms $Z_i^{(k)}$ and $W_i^{(k)}$ in (3) are known. However, the assumption that $\lambda$ and $\pi$ are known is unrealistic in applications. Hence, in this section we relax this assumption and estimate $\lambda$ and $\pi$ by integrating the informative source data. In what follows, we separately examine the three cases of Section 2.2.

First, for Case 1, the conditional expectation under (3) with $a = 0$ and $b = 1$ is

$$\begin{aligned} E(Y_i^{*(k)} \mid \mathbb{X}^{(k)}) &= E(Y_i^{(k)} \mid \mathbb{X}^{(k)}) - E(W_i^{(k)} \mid \mathbb{X}^{(k)}) \\ &= E(Y_i^{(k)} \mid \mathbb{X}^{(k)}) - E\big\{E(W_i^{(k)} \mid Y_i^{(k)}, \mathbb{X}^{(k)}) \,\big|\, \mathbb{X}^{(k)}\big\} \\ &= E(Y_i^{(k)} \mid \mathbb{X}^{(k)}) - E(\pi Y_i^{(k)} \mid \mathbb{X}^{(k)}) \\ &= (1 - \pi)E(Y_i^{(k)} \mid \mathbb{X}^{(k)}) = (1 - \pi)\exp(X_i^{(k)\top} w) \end{aligned} \tag{24}$$

for all $k \in \{0\} \cup \mathcal{A}_I$, where the last step follows from the mean function under (1). Taking the expectation of (24) and summing over $k \in \{0\} \cup \mathcal{A}_I$ gives

$$\pi = 1 - \frac{\sum_{k \in \{0\} \cup \mathcal{A}_I} E(Y_i^{*(k)})}{\sum_{k \in \{0\} \cup \mathcal{A}_I} E\{\exp(X_i^{(k)\top} w)\}}. \tag{25}$$

We then estimate $E(Y_i^{*(k)})$ and $E\{\exp(X_i^{(k)\top} w)\}$ by the empirical estimates $\overline{Y}^{*(k)} = \frac{1}{n_k}\sum_{i=1}^{n_k} Y_i^{*(k)}$ and $\frac{1}{n_k}\sum_{i=1}^{n_k}\exp(X_i^{(k)\top} w)$, respectively, so that the sample analogue of (25) is

$$\widehat{\pi} = 1 - \frac{\sum_{k \in \{0\} \cup \mathcal{A}_I}\overline{Y}^{*(k)}}{\sum_{k \in \{0\} \cup \mathcal{A}_I}\frac{1}{n_k}\sum_{i=1}^{n_k}\exp(X_i^{(k)\top} w)}. \tag{26}$$

We now iterate between (26) and (22). First, the initial value $w^{(0)} = 0_p$ is given. At the $t$th iteration, we compute

$$\pi^{(t+1)} = 1 - \frac{\sum_{k \in \{0\} \cup \mathcal{A}_I}\overline{Y}^{*(k)}}{\sum_{k \in \{0\} \cup \mathcal{A}_I}\frac{1}{n_k}\sum_{i=1}^{n_k}\exp\big(X_i^{(k)\top} w^{(t)}\big)} \tag{27}$$

for $t = 1, \ldots$, and then plug (27) into (22) to obtain $w^{(t+1)}$. Let $\widehat{w}$ denote the convergent value of (22) and let $\widehat{\pi}$ denote the resulting convergent value of (27).
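A hedged R sketch of the moment update (27) (our own helper; `Xs` and `Ys` are lists holding the design matrices and surrogate responses of the target and informative source datasets):

```r
## One pi-update per (26)/(27), given current coefficients w.
update_pi <- function(w, Xs, Ys) {
  ybar  <- sapply(Ys, mean)                                       # means of Y*(k)
  mubar <- sapply(Xs, function(X) mean(exp(as.vector(X %*% w))))  # means of exp(x'w)
  1 - sum(ybar) / sum(mubar)
}
```

Alternating `update_pi` with the Newton update of Section 3.3, each step plugging the current $\pi^{(t+1)}$ into $C_1 = 1 - \pi$, mirrors the iteration described above.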

Under Case 2, the conditional expectation under (3) with $a = 1$ and $b = 0$ is

$$E(Y_i^{*(k)} \mid \mathbb{X}^{(k)}) = E(Y_i^{(k)} \mid \mathbb{X}^{(k)}) + E(Z_i^{(k)} \mid \mathbb{X}^{(k)}) = \exp(X_i^{(k)\top} w) + \lambda \tag{28}$$

for $k \in \{0\} \cup \mathcal{A}_I$. Taking the expectation of (28) gives

$$\lambda = \frac{1}{n(\{0\} \cup \mathcal{A}_I)}\sum_{k \in \{0\} \cup \mathcal{A}_I}\Big[E(Y_i^{*(k)}) - E\{\exp(X_i^{(k)\top} w)\}\Big].$$

Following the discussion around (27), we obtain the following recursive form to estimate $\lambda$:

$$\lambda^{(t+1)} = \frac{1}{n(\{0\} \cup \mathcal{A}_I)}\sum_{k \in \{0\} \cup \mathcal{A}_I}\Big\{\overline{Y}^{*(k)} - \frac{1}{n_k}\sum_{i=1}^{n_k}\exp\big(X_i^{(k)\top} w^{(t)}\big)\Big\} \tag{29}$$

for $t = 1, \ldots$. With the initial value $w^{(0)} = 0_p$, iterating (29) and (22) gives the convergent values $\widehat{w}$ and $\widehat{\lambda}$, respectively.
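Analogously to `update_pi`, a hedged sketch of the $\lambda$-update (29):

```r
## One lambda-update per (29): average gap between mean surrogate counts
## and mean fitted Poisson means across the pooled datasets.
update_lambda <- function(w, Xs, Ys) {
  ybar  <- sapply(Ys, mean)
  mubar <- sapply(Xs, function(X) mean(exp(as.vector(X %*% w))))
  mean(ybar - mubar)
}
```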

Finally, under Case 3, the conditional expectation under (3) with $a = 1$ and $b = 1$ is

$$E(Y_i^{*(k)} \mid \mathbb{X}^{(k)}) = E(Y_i^{(k)} \mid \mathbb{X}^{(k)}) + E(Z_i^{(k)} \mid \mathbb{X}^{(k)}) - E(W_i^{(k)} \mid \mathbb{X}^{(k)}) = (1 - \pi)E(Y_i^{(k)} \mid \mathbb{X}^{(k)}) + \lambda = (1 - \pi)\exp(X_i^{(k)\top} w) + \lambda \tag{30}$$

for all $k \in \{0\} \cup \mathcal{A}_I$. In the presence of the two unknown parameters $\pi$ and $\lambda$, we estimate them iteratively. Specifically, we set the initial value $\pi = 0$ in (30) and apply (29) to obtain the estimator, denoted $\widehat{\lambda}$. Replacing $\lambda$ in (30) by $\widehat{\lambda}$ gives

$$E(Y_i^{*(k)} \mid \mathbb{X}^{(k)}) = (1 - \pi)\exp(X_i^{(k)\top} w) + \widehat{\lambda}.$$

Following the discussion around (27), we obtain the recursive form

$$\pi^{(t+1)} = 1 - \frac{\sum_{k \in \{0\} \cup \mathcal{A}_I}\big(\overline{Y}^{*(k)} - \widehat{\lambda}\big)}{\sum_{k \in \{0\} \cup \mathcal{A}_I}\frac{1}{n_k}\sum_{i=1}^{n_k}\exp\big(X_i^{(k)\top} w^{(t)}\big)} \tag{31}$$

for $t = 1, \ldots$, with the initial value $w^{(0)} = 0_p$. Then, following the same discussion as for (27), we obtain the estimator $\widehat{\pi}$.

Given the estimators of $\pi$ and/or $\lambda$, one can plug them into (21) so that the estimator of $\delta$, and hence the estimator of $\beta$, can be derived.

5. Model Averaging

Different models may yield varying predictions on the same dataset, and no single model is universally perfect. To address model uncertainty, we employ a model averaging approach, which aims to produce more reliable estimators and accurate prediction by combining multiple models.

Let $p^*$ denote the number of nonzero elements in the estimator of $w$ derived from (13) or (20), depending on whether all or only the informative source data are used. The value $p^*$ is determined by the optimal tuning parameter obtained from the cross-validation method in Section 3.3. The $p^*$ selected variables produce $S = 2^{p^*} - 1$ combinations of variables (the empty set removed), yielding $S$ candidate regression models.

Let $\beta_m$ denote the parameter under the $m$th candidate model for $m = 1, \ldots, S$. Moreover, following the discussion in Section 3.2.1, the contrast between $\beta_m$ and $\widehat{w}_{\mathcal{A}_I}$ is denoted $\delta_m \equiv \beta_m - \widehat{w}_{\mathcal{A}_I}$, and the resulting estimator, denoted $\widehat{\delta}_{m,\mathcal{A}_I}$, is derived from (21) under the target data and the $m$th candidate model for $m = 1, \ldots, S$. It follows from (15) that $\widehat{\beta}_{m,\mathcal{A}_I} \equiv \widehat{w}_{\mathcal{A}_I} + \widehat{\delta}_{m,\mathcal{A}_I}$ is the estimator of $\beta_m$ for $m = 1, \ldots, S$. The model averaging estimator is then

$$\widehat{\beta}_\xi \equiv \sum_{m=1}^{S}\xi_m\widehat{\beta}_{m,\mathcal{A}_I} = \widehat{w}_{\mathcal{A}_I} + \sum_{m=1}^{S}\xi_m\widehat{\delta}_{m,\mathcal{A}_I}, \tag{32}$$

where $\xi \equiv (\xi_1, \ldots, \xi_S)^\top$ is a vector of weights in the set $\mathcal{W} = \{\xi \in [0, 1]^S : \sum_{m=1}^{S}\xi_m = 1\}$. In (32), model averaging addresses the model uncertainty for the target data through the debiasing term. To determine $\xi$, we consider two approaches. The first is the BIC-based strategy (e.g., Chen and Yi [17]; Zou et al. [16]). For the $m$th model, the BIC value is

$$\mathrm{BIC}_m = -2\log(L_m) + p_m\log(n_0)$$

for $m = 1, \ldots, S$, where $\log(L_m)$ is the corrected log-likelihood (8) evaluated under the covariates of the $m$th candidate model and $p_m$ is the number of covariates in the $m$th model. The resulting weight vector $\xi$ is

$$\widehat{\xi}_{\mathrm{bic}} = \bigg\{\sum_{m=1}^{S}\exp\Big(-\frac{\mathrm{BIC}_m}{2}\Big)\bigg\}^{-1}\cdot\bigg(\exp\Big(-\frac{\mathrm{BIC}_1}{2}\Big), \exp\Big(-\frac{\mathrm{BIC}_2}{2}\Big), \ldots, \exp\Big(-\frac{\mathrm{BIC}_S}{2}\Big)\bigg)^\top. \tag{33}$$
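A hedged sketch of (33); subtracting the minimum BIC before exponentiating is a standard numerical guard (it leaves the normalized weights unchanged) and is our addition, not part of the paper:

```r
## BIC-based model averaging weights, cf. (33).
bic_weights <- function(bics) {
  wts <- exp(-(bics - min(bics)) / 2)   # shift by min(bics) to avoid underflow
  wts / sum(wts)
}
```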

Instead of specifying the weights, the second method finds the optimal weight from an objective function (e.g., Zou et al. [16]). Specifically, let the negative corrected log-likelihood be $\Psi(\xi) \equiv -\ell^*(\widehat{\beta}_\xi; \mathcal{D}_0^*)$. The optimal weight is then determined by

$$\widehat{\xi}_{\mathrm{opt}} = \operatorname*{arg\,min}_{\xi \in \mathcal{W}}\Psi(\xi). \tag{34}$$

Note that the optimization in (34) is subject to the constraint $\sum_{m=1}^{S}\xi_m = 1$. To solve this constrained minimization problem, we adopt an adaptive barrier algorithm (e.g., Boyd and Vandenberghe [23], Section 11.3; Lange [24], Section 16.3), which can be implemented by the R function constrOptim.
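Because constrOptim handles linear inequality constraints, one convenient way to impose the simplex constraint is to optimize over the first $S - 1$ weights and set $\xi_S = 1 - \sum_{m<S}\xi_m$; the sketch below (our own parameterization, with `neg_loglik` standing in for $\Psi(\xi)$) illustrates this:

```r
## Optimal weights (34) via constrOptim on the (S-1)-dimensional simplex.
optimal_weights <- function(S, neg_loglik) {
  obj <- function(xi_free) neg_loglik(c(xi_free, 1 - sum(xi_free)))
  ui  <- rbind(diag(S - 1), rep(-1, S - 1))   # xi_m >= 0 and sum(xi_free) <= 1
  ci  <- c(rep(0, S - 1), -1)
  fit <- constrOptim(theta = rep(1 / S, S - 1), f = obj, grad = NULL,
                     ui = ui, ci = ci)
  c(fit$par, 1 - sum(fit$par))                # append xi_S
}
```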

Consequently, the model averaging estimator is given by (32) with ξ replaced by (33) or (34).

6. Numerical Studies

6.1. Simulation Setup

In this section, we conduct simulation studies to assess the performance of the proposed method. Let $p = 500$ or $1000$ denote the dimension of the covariates. Let $n_0 = 200$ or $400$ be the sample size of the target data, and let $n_k = 100$ or $200$ be the sample size of the $k$th source dataset for $k = 1, \ldots, K$. We consider the following two scenarios to generate the unobserved target data $\mathcal{D}_0 = \{(Y_i^{(0)}, X_i^{(0)}): i = 1, \ldots, n_0\}$ and the unobserved source data $\mathcal{D}_k = \{(Y_i^{(k)}, X_i^{(k)}): i = 1, \ldots, n_k\}$ for $k = 1, \ldots, K$. In our study, we specify $K = 15$.

Scenario I

Let $s = 5$ denote the number of nonzero elements. Let $\beta = (0.5 \cdot 1_s^\top, 0_{p-s}^\top)^\top$ and $\beta_2 = (0.7 \cdot 1_s^\top, 0_{p-s}^\top)^\top$ denote the vectors of true parameter values, where $1_s$ is the $s$-dimensional vector of ones and $0_{p-s}$ is the $(p-s)$-dimensional vector of zeros. For the covariates in the target data, we generate random samples $X_i^{(0)}$ from the normal distribution with mean $0_p$ and covariance matrix $\Sigma = [\sigma_{jj'}]_{p \times p}$ with $\sigma_{jj'} = 0.5^{|j - j'|}$ for $j, j' = 1, \ldots, p$ and $i = 1, \ldots, n_0$. In addition, we generate the covariates in the source data, $X_i^{(k)}$, from the normal distribution with mean $0_p$ and covariance matrix $\Sigma + 0.3^2 I_p$ for $i = 1, \ldots, n_k$ and $k = 1, \ldots, K$, where $I_p$ is the $p \times p$ identity matrix. We then generate the responses for the target and source data from the following models:

$$\begin{aligned} Y_i^{(0)} &\sim \text{Poisson}\big(\mu_i = \exp(X_i^{(0)\top}\beta)\big), && i = 1, \ldots, n_0, \\ Y_i^{(k)} &\sim \text{Poisson}\big(\mu_i = \exp(X_i^{(k)\top}\beta)\big), && i = 1, \ldots, n_k, \; k = 1, \ldots, 10, \\ Y_i^{(k)} &\sim \text{Poisson}\big(\mu_i = \exp(X_i^{(k)\top}\beta_2)\big), && i = 1, \ldots, n_k, \; k = 11, \ldots, K. \end{aligned}$$

Given $Y_i^{(k)}$, we further apply (3) with the three cases in Section 3.1 to generate $Y_i^{*(k)}$, where $Z_i^{(k)}$ follows a Poisson distribution with mean $\lambda = 2, 4$, or $6$, and the subtracted term $W_i^{(k)}$ follows a binomial distribution with size $Y_i^{(k)}$ and probability $\pi = 0.3, 0.5$, or $0.7$, for $i = 1, \ldots, n_k$ and $k = 0, \ldots, K$. In this scenario, $\pi$ and $\lambda$ are treated as known.
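A hedged R sketch of the Scenario I target-data generator, at a reduced dimension for illustration (it reuses the `simulate_surrogate` helper sketched in Section 2.2):

```r
library(MASS)   # for mvrnorm

set.seed(2024)
p <- 50; n0 <- 200; s <- 5                   # reduced p for illustration
beta  <- c(rep(0.5, s), rep(0, p - s))
Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))       # sigma_jj' = 0.5^|j - j'|

X0 <- mvrnorm(n0, mu = rep(0, p), Sigma = Sigma)
Y0 <- rpois(n0, lambda = exp(as.vector(X0 %*% beta)))   # true counts
Y0_star <- simulate_surrogate(Y0, a = 1, b = 1, lambda = 2, pi = 0.3)
```

Source datasets follow the same pattern with covariance `Sigma + 0.3^2 * diag(p)` and with `beta` swapped for `beta2` when $k \ge 11$.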

Scenario II

In this scenario, we treat the measurement error parameters as unknown and implement the method in Section 4 to estimate $\lambda$, $\pi$, and $\beta$. All settings follow Scenario I except that $X_i^{(k)}$ is generated from the uniform distribution over the interval $[-1, 1]$ for $i = 1, \ldots, n_k$ and $k = 0, \ldots, K$.

Our simulation studies are designed to achieve three primary goals. First, with the availability of error-prone source data, we aim to accurately identify the informative source data. Second, we correct for measurement error effects and obtain precise estimators under the Poisson regression models. Third, we examine the prediction error to assess the accuracy of prediction.

6.2. Simulation Results

Given the synthetic datasets in Section 6.1, we implement the transfer learning and model averaging methods of Sections 3 and 5 with weights (33) and (34) to derive the estimators, denoted "TL," "BIC," and "optimal," respectively. To benchmark the proposed method, we also examine the naive method, which uses the error-prone responses in the estimation procedure. To see the impact of the source data, we further examine the proposed and naive methods using all source data.

To assess the performance of the estimator of $\beta$, we record the squared $L_2$-norm of the bias of $\widehat{\beta}$:

$$L_2 = \|\widehat{\beta} - \beta\|_2^2,$$

where $\|v\|_2^2 = v_1^2 + \cdots + v_p^2$ for a vector $v = (v_1, \ldots, v_p)^\top$. In addition, to assess the prediction performance for new observations, we record the mean squared prediction error (MSPE) and the sample version of the relative KL divergence (SRKL). Specifically, we randomly split the target data into an 80% training set $\mathcal{T}$ and a 20% validation set $\mathcal{V} \equiv \{(X_{\text{new},i}, Y_{\text{new},i}^*): i \in \mathcal{V}\}$. We first apply the proposed method in Sections 3 and 5 to obtain $\widehat{\beta}$ based on $\mathcal{T}$, and then compute the MSPE and SRKL on $\mathcal{V}$, which are respectively defined as

$$\mathrm{MSPE} = \frac{1}{|\mathcal{V}|}\sum_{i \in \mathcal{V}}\big(Y'_{\text{new},i} - \widehat{\mu}_i\big)^2$$

and

$$\mathrm{SRKL} = \frac{1}{|\mathcal{V}|}\sum_{i \in \mathcal{V}}\Big[-Y'_{\text{new},i} X_{\text{new},i}^\top\widehat{\beta} + \widehat{\mu}_i + \log\big\{\big(Y'_{\text{new},i}\,\mathbb{I}(Y'_{\text{new},i} \ge 0)\big)!\big\}\Big],$$

where $\mathbb{I}(\cdot)$ denotes the indicator function, $Y'_{\text{new},i} = \frac{Y_{\text{new},i}^* - \lambda}{1 - \pi}$ is the "corrected" response with the two parameters $\lambda$ and $\pi$ either known or estimated as in Section 4, and $\widehat{\mu}_i = \exp(X_{\text{new},i}^\top\widehat{\beta})$ is the fitted mean function of the Poisson distribution. Smaller values of MSPE and SRKL indicate better prediction. Owing to limited space, numerical results for one of the settings under Scenario I are presented in Table 1; results for the remaining settings are placed in Tables A.1 to A.23 of the Supporting Information. For the estimator of $\beta$, the proposed method has a relatively small bias regardless of the sample sizes, the dimension $p$, and the magnitude of measurement error under Cases 1–3 in Section 2.2, indicating that the proposed method is valid for handling measurement error and detecting informative source data. Moreover, when $\lambda$ and/or $\pi$ are unknown and estimated from the source data, the numerical results in the Supporting Information show that the estimators of $\beta$ still perform satisfactorily, implying that $\lambda$ and $\pi$ are also precisely estimated. In contrast, we observe that biases arise when measurement error effects are ignored and/or noninformative source data are falsely included. On the other hand, the model averaging estimators derived under either the BIC or optimal weights are comparable to or better than the transfer learning estimators, with relatively smaller biases, when measurement error is corrected and informative source data are incorporated. The results in Table 1 and the Supporting Information verify that the proposed method handles the estimation of $\beta$ under the various combinations of $a$ and $b$ in the measurement error model (3), with the corresponding parameters $\lambda$ and $\pi$ either given or unknown.
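A hedged R sketch of these prediction criteria (our helper names; `lgamma(y + 1)` computes $\log(y!)$, and `pmax(., 0)` implements the truncation $Y'\,\mathbb{I}(Y' \ge 0)$):

```r
## MSPE and SRKL on a validation set, using corrected responses
## Y' = (Y* - lambda) / (1 - pi).
prediction_metrics <- function(beta_hat, X_new, ystar_new, lambda, pi) {
  y_corr <- (ystar_new - lambda) / (1 - pi)
  eta    <- as.vector(X_new %*% beta_hat)
  mu_hat <- exp(eta)
  c(MSPE = mean((y_corr - mu_hat)^2),
    SRKL = mean(-y_corr * eta + mu_hat + lgamma(pmax(y_corr, 0) + 1)))
}
```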

TABLE 1.

Simulation results of Case 1 with known measurement error (π=0.3) in response.

                                              Optimal                    BIC                        TL
p     n0    nk    Informative   Correct       L2‐norm   MSPE    SRKL     L2‐norm   MSPE    SRKL     L2‐norm   MSPE    SRKL
500 200 100 Yes Yes 0.001 3.822 1.374 0.001 3.927 1.375 0.001 3.822 1.374
No 0.016 25.621 1.512 0.015 25.182 1.507 0.016 25.621 1.512
No Yes 0.001 3.775 1.375 0.051 63.030 1.504 0.001 3.775 1.375
No 0.018 27.751 1.526 0.065 13.875 1.529 0.018 27.751 1.526
200 Yes Yes 0.001 3.916 1.325 0.000 3.936 1.325 0.001 3.916 1.325
No 0.016 23.060 1.471 0.015 22.627 1.467 0.016 23.060 1.471
No Yes 0.001 3.970 1.325 0.582 19.362 1.433 0.001 3.972 1.325
No 0.015 22.309 1.465 0.726 13.527 1.448 0.015 22.309 1.465
400 100 Yes Yes 0.007 4.416 1.392 0.001 4.393 1.392 0.007 4.416 1.392
No 0.015 41.961 1.589 0.015 41.436 1.586 0.015 41.961 1.589
No Yes 0.001 4.441 1.392 0.052 51.114 1.518 0.001 4.441 1.392
No 0.015 40.907 1.585 0.067 19.753 1.579 0.015 40.907 1.585
200 Yes Yes 0.000 4.359 1.360 0.000 4.359 1.360 0.000 4.359 1.360
No 0.014 31.639 1.515 0.014 30.639 1.515 0.014 30.639 1.515
No Yes 0.001 4.480 1.360 0.081 44.944 1.583 0.001 4.480 1.360
No 0.014 31.093 1.518 0.095 18.954 1.538 0.014 31.093 1.518
1000 200 100 Yes Yes 0.001 4.196 1.408 0.000 4.180 1.408 0.001 4.196 1.408
No 0.016 21.220 1.565 0.016 21.093 1.563 0.016 21.220 1.565
No Yes 0.001 4.133 1.409 0.037 9.838 1.485 0.001 4.133 1.409
No 0.015 20.688 1.557 0.051 18.214 1.537 0.015 20.688 1.557
200 Yes Yes 0.005 4.564 1.368 0.005 4.835 1.368 0.005 4.565 1.368
No 0.020 33.059 1.525 0.019 31.796 1.517 0.020 33.059 1.525
No Yes 0.005 5.090 1.368 0.068 30.418 1.520 0.005 5.092 1.368
No 0.019 32.151 1.520 0.082 18.063 1.513 0.019 32.151 1.520
400 100 Yes Yes 0.001 3.535 1.378 0.000 3.539 1.378 0.001 3.535 1.378
No 0.018 26.492 1.533 0.017 26.168 1.530 0.018 26.492 1.533
No Yes 0.001 3.547 1.378 0.070 19.183 1.521 0.001 3.548 1.378
No 0.018 26.520 1.534 0.086 17.633 1.535 0.018 26.520 1.534
200 Yes Yes 0.001 3.557 1.375 0.001 3.557 1.375 0.001 3.557 1.375
No 0.019 43.328 1.558 0.019 43.328 1.558 0.019 43.328 1.558
No Yes 0.001 3.876 1.377 0.076 21.707 1.518 0.001 3.872 1.377
No 0.021 45.343 1.570 0.094 27.105 1.543 0.021 45.343 1.570

Note: In the Informative column, “Yes” is the usage of informative source data and “No” represents the usage of all source data. In the Correct column, “Yes” is the correction of error‐prone responses and “No” represents the naive method without measurement error correction. “BIC,” “optimal,” and “TL” are (33), (34), and transfer learning methods, respectively.

Finally, regarding the performance of prediction, we observe from Table 1 and the Supporting Information that the transfer learning method and the model averaging method under either the BIC or optimal weights produce precise predictions, with relatively small values of MSPE and SRKL across the various values of $\lambda$ and $\pi$. In contrast, it is unsurprising that the prediction performance deteriorates if measurement error and/or informative source data selection are ignored in the estimation procedure, even when prediction is derived by the model averaging method. In summary, the numerical results verify the importance of dealing with measurement error effects and show that the proposed method is valid for handling such a complex situation.

7. Real Data Analysis

7.1. Data Description

We apply the proposed methods to analyze the breast cancer data collected by METABRIC. The dataset is available via the command cBioDataPack("brca_mbcproject_2022") in the R package cBioPortal. This target dataset contains 1980 subjects and 12,132 common standardized mRNA variables; mRNAs serve as the intermediary between DNA and proteins, carrying the genetic code copied from DNA during transcription and providing the template for protein synthesis during translation. In addition, the dataset records nine genes, including ERBB2, CCND1, MYC, FGFR1, PIK3CA, TP53, BRCA1, BRCA2, and CDH1. Each gene records CNA with possible outcomes $\{-2, -1, 0, 1, 2\}$: positive (negative) values reflect an increase (decrease) of copy number for a region or gene, indicating higher risk of breast cancer (e.g., Baslan et al. [25]). Among the nine genes, we record the number of nonzero CNAs for each subject and treat it as our primary response variable, denoted $Y_i^{(0)}$ (e.g., Turner et al. [26]; Olivier et al. [27]). To characterize $Y_i^{(0)}$, we use the mRNAs as the covariates $X_i^{(0)}$ and employ the Poisson regression model for $Y_i^{(0)}$ and $X_i^{(0)}$.

In addition to the target data, we also collect seven different source datasets, which contain the same variables as the target data, from various sources, including the Metastatic Breast Cancer Project (MBC Project), the Clinical Proteomic Tumor Analysis Consortium (CPTAC), and The Cancer Genome Atlas (TCGA), via the cBioPortal for Cancer Genomics website. Detailed descriptions of the source data are summarized in Table 2. Let $Y_i^{(k)}$ and $X_i^{(k)}$ denote the corresponding response and covariates in the $k$th source dataset for $i = 1, \ldots, n_k$ and $k = 1, \ldots, 7$.

TABLE 2.

A list of seven source data with the corresponding sample sizes nk for real data analysis.

k    Name                                                                      nk
1 The Metastatic Breast Cancer Project (Provisional, December 2021) 156
2 The Metastatic Breast Cancer Project (Archived, 2020) 146
3 Proteogenomic landscape of breast cancer (CPTAC, Cell 2020) 122
4 Breast Invasive Carcinoma (TCGA, Cell 2015) 421
5 Breast Invasive Carcinoma (TCGA, Nature 2012) 499
6 Breast Invasive Carcinoma (TCGA, PanCancer Atlas) 1068
7 Breast Invasive Carcinoma (TCGA, Firehose Legacy) 513

However, in applications, the CNA values in both the target and source data may be susceptible to measurement error. Specifically, as discussed in the literature (e.g., Zhang and Zhang [5]; Chiang et al. [6]; Grasso et al. [7]), CNA values are possibly subject to measurement error arising from various situations. Thus, we denote by $Y_i^{*(k)}$ the observed response, taken as the surrogate for $Y_i^{(k)}$, for $k = 0, 1, \ldots, 7$. In our study, we consider the general formulation $a = b = 1$ in model (3) to characterize the error-prone responses.

Moreover, among the large number of mRNAs and the various source datasets in Table 2, only a portion of the mRNAs and source data is expected to be informative for the CNAs. Naively using all mRNAs and source data, or ignoring measurement error effects, may harm the accuracy of parameter estimation and prediction. To address these issues and provide reliable analysis, we employ the proposed method to tackle these challenges.

7.2. Analysis Results

The selection criterion in Section 3.2.2 gives $\mathcal{A}_I = \{1, 2, 5, 6\}$, where the numbers in $\mathcal{A}_I$ are the labels of the source datasets in Table 2. With the informative source datasets accommodated, we follow the strategy in Section 6.2 to split the data into a training set and a validation set, and then apply the estimation method in Section 4 to derive the estimators of $\pi$ and $\lambda$ in the measurement error model (3) under the training data. The resulting estimators are reported in Table 3. Given the two estimators of $\pi$ and $\lambda$, we use the transfer learning method in Section 3 to derive the estimator of $\beta$ under the training data. To see the impact of measurement error and/or the inclusion of noninformative source data, we examine the other competing methods mentioned in Section 6.2. The numerical results are summarized in Table 4. Finally, we use the estimators in Table 4 and the covariates in the validation data to compute the predicted mean function, and then compute the MSPE and SRKL. In addition, we examine the model averaging method of Section 5 to reduce the uncertainty of model selection. Numerical results are reported in Table 5.

TABLE 3.

Real data analysis for the estimation of λ and π.

Informative    Method    π̂      λ̂
Yes MA 0.48 3
TL 0.40 3
No MA 0.24 3
TL 0.31 3

Note: In the Informative column, “Yes” is the usage of informative source data and “No” represents the usage of all source data. In the Method column, “MA” and “TL” are the model averaging and transfer learning methods, respectively. λ^ and π^ in the TL row are obtained by the method in Section 4; λ^ and π^ in the MA row are obtained by the method in Section 4 with arithmetic mean.

TABLE 4.

Real data analysis for the estimation of β and mRNA selection with λ and π estimated in Table 3.

Informative Correct Method CINP OSBPL2 THRB SLC5A11 ACOT12 CRABP1 SDS
Yes Yes Optimal 0.254 0.175 −0.242 0.127 −0.107
BIC 0.240 0.164 −0.233 0.117
TL 0.254 0.175 −0.242 0.127 −0.107
No Optimal 0.272 0.175 −0.123 0.313
BIC 0.269 0.180 −0.120 0.318
TL 0.272 0.175 −0.123 0.313
No Yes Optimal −0.167 −0.105
BIC −0.417
TL −0.433 −0.153
No Optimal −0.160
BIC −0.168
TL −0.160

Note: In the Informative column, “Yes” is the usage of informative source data and “No” represents the usage of all source data. In the Correct column, “Yes” is the correction of error‐prone responses and “No” represents the naive method without measurement error correction. In the Method column, “BIC,” “optimal,” and “TL” are (33), (34), and transfer learning methods, respectively. Remaining columns are mRNAs, where real values are the corresponding estimators and blanks represent unselected mRNAs.

TABLE 5.

Real data analysis prediction results with λ and π estimated in Table 3.

                         Optimal          BIC              TL
Informative   Correct    MSPE    SRKL     MSPE    SRKL     MSPE    SRKL
Yes Yes 3.803 1.455 3.861 1.612 3.804 1.784
No 4.016 2.462 4.023 2.463 4.016 2.462
No Yes 4.627 1.680 4.545 1.793 4.656 1.843
No 4.644 2.463 4.639 2.465 4.644 2.463

Note: In the Informative column, “Yes” is the usage of informative source data and “No” represents the usage of all source data. In the Correct column, “Yes” is the correction of error‐prone responses and “No” represents the naive method without measurement error correction. “BIC,” “optimal,” and “TL” are (33), (34), and transfer learning methods, respectively.

In Table 3, we observe that the estimator of $\lambda$ is always equal to 3 regardless of whether informative source data are used, while the estimator of $\pi$ is larger when only informative source data are implemented. A possible reason for this difference is that noninformative source data are falsely included in the iterations (29) and (31). With the estimators $\widehat{\lambda}$ and $\widehat{\pi}$ in hand, we observe from Table 4 that the corrected TL method selects five mRNAs: CINP, OSBPL2, THRB, SLC5A11, and CRABP1. While these gene expressions have been discussed separately in various research contexts (e.g., Lovejoy et al. [28]; Wang et al. [29]; Davidson et al. [30]; Tsai et al. [31]; Liu et al. [32]), the unique finding in our study is that they are selected simultaneously when using informative source data. In contrast, when all source data are used in the estimation procedure, the selected variables are entirely different, except for CRABP1. In addition, without correction of measurement error, the set of mRNAs selected by the naive method is a subset of that determined by the correction approach. From the estimates in Table 4, we find that the TL and MA methods produce similar values. Finally, Table 5 shows that, when measurement error correction and informative source data are implemented, MSPE and SRKL achieve the smallest values among all settings. This result not only indicates that the selected gene expressions (CINP, OSBPL2, THRB, SLC5A11, and CRABP1) are key factors affecting CNA values for breast cancer, but also verifies the impact of measurement error and the importance of using informative source data.

In addition to estimating the parameters $\lambda$ and $\pi$, we provide an alternative approach, a sensitivity analysis, to examine the impact of various magnitudes of measurement error. Specifically, similar to the settings in the simulation studies, we specify the parameters $(\lambda, \pi)$ as $(2, 0.3)$, $(4, 0.5)$, or $(6, 0.7)$, reflecting minor, moderate, and severe measurement error effects, respectively. Given $\lambda$ and $\pi$, we run the estimation procedures in Sections 3 and 5 to derive the transfer learning and model averaging estimators, respectively. Numerical results for variable selection and estimation are placed in Table 6. When measurement error is corrected and informative source data are included, Table 6 shows that the selected mRNAs are the same as those in Table 4 regardless of the changes in $\lambda$ and $\pi$. When all source data are used, the mRNA ACOT12 is commonly selected, but it is interesting to see that the additional mRNAs CRABP1 and SDS are selected as $\pi$ and $\lambda$ increase, which may indicate that CRABP1 and SDS can be selected when using all source data with large values of $\pi$ and $\lambda$. On the other hand, regardless of whether all or only informative source data are used, the absolute values of the estimators from the sensitivity analysis are larger than those obtained using the estimated $\widehat{\lambda}$ and $\widehat{\pi}$, except for the estimator of ACOT12 under the minor measurement error effect $(\lambda, \pi) = (2, 0.3)$. This phenomenon reflects that the magnitudes of the measurement error parameters $\lambda$ and $\pi$ may affect the estimator when the correction is taken into account.

TABLE 6.

Real data analysis for the estimation of β and mRNA selection by specifying values λ and π.

Sensitivity Informative Methods CINP OSBPL2 THRB SLC5A11 ACOT12 CRABP1 SDS
Minor Yes Optimal 0.367 0.242 −0.297 0.314
BIC 0.367 0.241 −0.302 0.319
TL 0.367 0.242 −0.297 0.314
No Optimal −0.158
BIC −0.374
TL −0.361
Moderate Yes Optimal 0.463 0.301 −0.544 0.251 −0.237
BIC 0.471 0.290 −0.566 0.262
TL 0.463 0.301 −0.544 0.251 −0.237
No Optimal −0.182 −0.155
BIC −0.636
TL −0.611 −0.261
Severe Yes Optimal 0.589 0.352 −0.886 0.126 −0.444
BIC 0.612 0.326 −0.959 0.146
TL 0.589 0.352 −0.886 0.126 −0.444
No Optimal −0.312 −0.265 −0.182
BIC −0.974
TL −0.916 −0.504

Note: In the Sensitivity column, “Minor” represents (λ,π)=(2,0.3), “Moderate” is (λ,π)=(4,0.5), and “Severe” is (λ,π)=(6,0.7). In the Informative column, “Yes” is the usage of informative source data and “No” represents the usage of all source data. In the Method column, “BIC”, “optimal”, and “TL” are (33), (34), and transfer learning methods, respectively. Remaining columns are mRNAs, where real values are the corresponding estimators and blanks represent unselected mRNAs.

Finally, we compute the MSPE and SRKL under the three levels of measurement error and summarize the numerical results in Table 7. We first observe that the MSPE and SRKL values derived under the informative source data are generally smaller than those under all source data, similar to the findings in Table 5. As the magnitude of measurement error increases, the MSPE and SRKL values become large. In particular, when $(\lambda, \pi)$ is specified as $(6, 0.7)$, the MSPE and SRKL values are even larger than those without measurement error correction summarized in Table 5. This phenomenon indicates that the count response in this dataset may be subject to only a minor measurement error effect, such as the values in Table 3 or the pair $(\lambda, \pi) = (2, 0.3)$, whose corresponding correction produces relatively precise prediction. In contrast, falsely specifying large values of $\lambda$ and $\pi$ can yield unexpected results.

TABLE 7.

Real data analysis prediction results by specifying values λ and π.

                          Optimal          BIC              TL
Sensitivity  Informative  MSPE    SRKL     MSPE    SRKL     MSPE    SRKL
Minor Yes 3.758 1.996 3.760 1.998 3.758 1.996
No 4.546 2.038 4.544 2.048 4.645 2.038
Moderate Yes 4.117 1.428 4.670 1.856 5.117 2.028
No 4.610 1.515 4.943 1.964 5.856 2.170
Severe Yes 11.927 1.716 9.021 2.959 12.912 3.716
No 14.712 1.893 11.338 2.975 16.395 3.986

Note: In the Sensitivity column, "Minor" represents (λ,π)=(2,0.3), "Moderate" is (λ,π)=(4,0.5), and "Severe" is (λ,π)=(6,0.7). In the Informative column, "Yes" is the usage of informative source data and "No" represents the usage of all source data. "BIC", "optimal", and "TL" are (33), (34), and transfer learning methods, respectively.

8. Summary

In this article, we discuss transfer learning methods for modeling count responses, motivated by the breast cancer data with available gene expression variables and various sources of auxiliary information. The main challenges of the dataset are measurement error in the response and high-dimensional covariates, which may affect prediction and estimation performance. To tackle these challenges, we introduce measurement error models to characterize the error-prone count response and propose the insertion method to correct for measurement error effects. Our correction strategy in Section 3.1 only requires the first moments of the noise terms $Z_i^{(k)}$ and $W_i^{(k)}$; specific distributions are adopted merely to satisfy the implicit restriction on the error-prone response in (3). With the availability of source data, we develop an estimation procedure for the parameters in the measurement error models, which differs from the conventional use of repeated measurements or validation data. To detect informative source data, we follow Tian and Feng [4] and construct the selection criterion based on the error-in-response data. As suggested by a referee, a next research direction is to explore a sharper upper bound in (17) by analyzing $\widehat{L}_0(k)$ and $\widehat{L}_0(0)$ with measurement error correction taken into account, so that the detection of informative source data can be improved. In addition to the transfer learning method, we further develop a model averaging procedure to improve the accuracy of prediction and mitigate model uncertainty. Simulation studies and the analysis of the breast cancer data reveal satisfactory performance of the proposed method and verify the importance of correcting for measurement error. As explored by Tian and Feng [4], the transfer learning strategy yields a consistent estimator for the GLM. Since the Poisson regression model is a special case of the GLM, we may expect the proposed estimator to be consistent as well, especially when the measurement error effects are corrected through the likelihood function. In addition to the numerical justification, it is worth exploring theoretical properties in the near future.

There are several possible extensions. For example, a general and complex situation is the mismeasurement of both the response and the covariates. While the correction of error‐prone covariates has been widely discussed, its impact on the detection of informative source data remains unexplored. In addition to the insertion method for correcting measurement error, there are alternative approaches, such as regression calibration or simulation–extrapolation methods (e.g., Chen [13]), but these methods primarily target error‐prone continuous covariates, and little discussion addresses the response or discrete variables. It is worth extending other measurement error strategies to handle error‐prone responses and/or covariates. Finally, we note that the current development relies on the measurement error model (3), and some specific distributions with implicit restrictions are considered for the error terms in that model. In applications, the measurement error model and the corresponding distributions of the noise terms are usually unknown. To address these issues and examine the validity of the model and distributions, additional auxiliary information with the unobserved response Yi(k) available, such as external or internal validation data (e.g., Carroll et al. [8]), is required, since the measurement error models involve the unobserved response Yi(k). Moreover, with validation data available, one may further examine the homogeneity or heterogeneity of the parameters in (3), that is, test H0: λ(k) = λ(k′) and H0: π(k) = π(k′) for k, k′ = 0, 1, …, K with k ≠ k′, and then estimate each heterogeneous parameter λ(k) and π(k) from the validation data if H0 is rejected; a hypothetical sketch of such a check is given below. While these issues are not the main concerns of the current development, they deserve further exploration in our future work.
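As a hypothetical illustration of this homogeneity check, suppose internal validation data (pairs of the true response Y and its error‐prone version Y*) are available in two sources and the same illustrative error model holds; then E(Y* | Y) = πY + λ, so (λ(k), π(k)) can be estimated by least squares within each source, and equality of the slopes can be assessed by a Wald‐type statistic. The function and parameter values below are assumptions for illustration only, not part of the proposed methodology.

set.seed(2)
sim_validation <- function(n, lambda, pi0) {     # validation pairs (Y, Y*) under the assumed model
  y <- rpois(n, 3)
  data.frame(y = y, y_star = rbinom(n, 1, pi0) * y + rpois(n, lambda))
}
d0 <- sim_validation(300, lambda = 2, pi0 = 0.3) # source k
d1 <- sim_validation(300, lambda = 4, pi0 = 0.5) # source k'
f0 <- lm(y_star ~ y, data = d0)                  # intercept estimates lambda, slope estimates pi
f1 <- lm(y_star ~ y, data = d1)
w  <- (coef(f0)[2] - coef(f1)[2])^2 /            # Wald statistic for H0: pi(k) = pi(k')
      (vcov(f0)[2, 2] + vcov(f1)[2, 2])
pchisq(w, df = 1, lower.tail = FALSE)            # small p-value suggests heterogeneous parameters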

Conflicts of Interest

The authors declare no conflicts of interest.

Supporting information

Data S1. Supporting Information.

SIM-44-0-s001.pdf (181KB, pdf)

Acknowledgments

The authors thank the Editor, an Associate Editor, and three referees for their useful comments that significantly improved the initial manuscript. Chen's research was supported by the National Science and Technology Council.

Wu J.-C. and Chen L.-P., “Transfer Learning for Error‐Contaminated Poisson Regression Models,” Statistics in Medicine 44, no. 15‐17 (2025): e70163, 10.1002/sim.70163.

Funding: This work was supported by the National Science and Technology Council (Grant No. 112‐2118‐M‐004‐005‐MY2).

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

1. Bastani H., “Predicting With Proxies: Transfer Learning in High Dimension,” Management Science 67 (2021): 2964–2984.
2. Chen A., Owen A. B., and Shi M., “Data Enriched Linear Regression,” Electronic Journal of Statistics 9 (2015): 1078–1112.
3. Li S., Cai T. T., and Li H., “Transfer Learning for High‐Dimensional Linear Regression: Prediction, Estimation and Minimax Optimality,” Journal of the Royal Statistical Society, Series B: Statistical Methodology 84 (2022): 149–173.
4. Tian Y. and Feng Y., “Transfer Learning Under High‐Dimensional Generalized Linear Models,” Journal of the American Statistical Association 118 (2023): 2684–2697.
5. Zhang L. and Zhang L., “Use of Autocorrelation Scanning in DNA Copy Number Analysis,” Bioinformatics 29 (2013): 2678–2682.
6. Chiang D. Y., Getz G., Jaffe D. B., et al., “High‐Resolution Mapping of Copy‐Number Alterations With Massively Parallel Sequencing,” Nature Methods 6 (2009): 99–103.
7. Grasso C., Butler T., Rhodes K., et al., “Assessing Copy Number Alterations in Targeted, Amplicon‐Based Next‐Generation Sequencing Data,” Journal of Molecular Diagnostics 17 (2015): 53–63.
8. Carroll R. J., Ruppert D., Stefanski L. A., and Crainiceanu C. M., Measurement Error in Nonlinear Models: A Modern Perspective (CRC Press/Chapman and Hall, 2006).
9. Kukush A., Schneeweiss H., and Wolf R., “Three Estimators for the Poisson Regression Model With Measurement Errors,” Statistical Papers 45 (2004): 351–368.
10. Sørensen Ø., Hellton K. H., Frigessi A., and Thoresen M., “Covariate Selection in High‐Dimensional Generalized Linear Models With Measurement Error,” Journal of Computational and Graphical Statistics 27 (2018): 739–749.
11. Yang H., Pan F., Tong D., Brown H. E., and Liu J., “Measurement Error‐Tolerant Poisson Regression for Valley Fever Incidence Prediction,” IISE Transactions on Healthcare Systems Engineering 14 (2024): 305–317, 10.1080/24725579.2024.2370243.
12. Jiang F., Zhou Y., Liu J., and Ma Y., “On High‐Dimensional Poisson Models With Measurement Error: Hypothesis Testing for Nonlinear Nonconvex Optimization,” Annals of Statistics 51 (2023): 233–259.
13. Chen L.‐P., “De‐Noising Boosting Methods for Variable Selection and Estimation Subject to Error‐Prone Variables,” Statistics and Computing 33 (2023): 38.
14. Chen L.‐P. and Qiu B., “SIMEXBoost: An R Package for Analysis of High‐Dimensional Error‐Prone Data Based on Boosting Method,” R Journal 15 (2023): 5–20.
15. Zhang Q. and Yi G. Y., “Zero‐Inflated Poisson Models With Measurement Error in the Response,” Biometrics 79 (2023): 1089–1102.
16. Zou J., Wang W., Zhang X., and Zou G., “Optimal Model Averaging for Divergent‐Dimensional Poisson Regressions,” Econometric Reviews 41 (2022): 775–805.
17. Chen L.‐P. and Yi G. Y., “Model Selection and Model Averaging for Analysis of Truncated and Censored Data With Measurement Error,” Electronic Journal of Statistics 14 (2020): 4054–4109.
18. Zhang X., Ma Y., and Carroll R. J., “MALMEM: Model Averaging in Linear Measurement Error Models,” Journal of the Royal Statistical Society, Series B: Statistical Methodology 81 (2019): 763–779.
19. Nakamura T., “Proportional Hazards Model With Covariates Subject to Measurement Error,” Biometrics 48 (1992): 829–838.
20. Friedman J., Hastie T., and Tibshirani R., “Regularization Paths for Generalized Linear Models via Coordinate Descent,” Journal of Statistical Software 33 (2010): 1–22.
21. Klosa J., Simon N., Westermark P. O., Liebscher V., and Wittenburg D., “Seagull: Lasso, Group Lasso and Sparse‐Group Lasso Regularization for Linear Regression Models via Proximal Gradient Descent,” BMC Bioinformatics 21 (2020): 407.
22. Lee S., Kwon S., and Kim Y., “A Modified Local Quadratic Approximation Algorithm for Penalized Optimization Problems,” Computational Statistics and Data Analysis 94 (2016): 275–286.
23. Boyd S. and Vandenberghe L., Convex Optimization (Cambridge University Press, 2004).
24. Lange K., Numerical Analysis for Statisticians (Springer, 2001).
25. Baslan T., Kendall J., Volyanskyy K., et al., “Novel Insights Into Breast Cancer Copy Number Genetic Heterogeneity Revealed by Single‐Cell Genome Sequencing,” eLife 9 (2020): e51480.
26. Turner N., Pearson A., Sharpe R., et al., “FGFR1 Amplification Drives Endocrine Therapy Resistance and Is a Therapeutic Target in Breast Cancer,” Cancer Research 70 (2010): 2085–2094.
27. Olivier M., Hollstein M., and Hainaut P., “TP53 Mutations in Human Cancers: Origins, Consequences, and Clinical Use,” Cold Spring Harbor Perspectives in Biology 2 (2010): a001008.
28. Lovejoy C. A., Xu X., Bansbach C. E., et al., “Functional Genomic Screens Identify CINP as a Genome Maintenance Protein,” Proceedings of the National Academy of Sciences of the United States of America 106 (2009): 19304–19309.
29. Wang P., Weng J., and Anderson R. G. W., “OSBP Is a Cholesterol‐Regulated Scaffolding Protein in Control of ERK1/2 Activation,” Science 307 (2005): 1472–1476.
30. Davidson C. D., Gillis N. E., and Carr F. E., “Thyroid Hormone Receptor Beta as Tumor Suppressor: Untapped Potential in Treatment and Diagnostics in Solid Tumors,” Cancers (Basel) 13 (2021): 4254.
31. Tsai L.‐J., Hsiao S.‐H., Tsai L.‐M., et al., “The Sodium‐Dependent Glucose Cotransporter SLC5A11 as an Autoimmune Modifier Gene in SLE,” Tissue Antigens 71 (2008): 114–126.
32. Liu R. Z., Garcia E., Glubrecht D. D., Poon H. Y., Mackey J. R., and Godbout R., “CRABP1 Is Associated With a Poor Prognosis in Breast Cancer: Adding to the Complexity of Breast Cancer Cell Response to Retinoic Acid,” Molecular Cancer 14 (2015): 129.
