SUMMARY
Tissue Microarrays (TMAs) measure tumor-specific protein expression via high-density immunohistochemical staining assays. They provide a proteomic platform for validating cancer biomarkers emerging from large-scale DNA microarray studies. Repeated observations within each tumor result in substantial biological and experimental variability. This variability is usually ignored when associating the TMA expression data with patient survival outcome. It generates biased estimates of hazard ratio in proportional hazards models. We propose a Latent Expression Index (LEI) as a surrogate protein expression estimate in a two-stage analysis. Several estimators of LEI are compared: an Empirical Bayes (EB), a Full Bayes (FB), and a Varying Replicate Number (VRN) estimator. In addition, we jointly model survival and TMA expression data via a shared random effects model. Bayesian estimation is carried out using a Markov Chain Monte Carlo (MCMC) method. Simulation studies were conducted to compare the two-stage methods and the joint analysis in estimating the Cox regression coefficient. We show the two-stage methods reduce bias relative to the naive approach, but still lead to under-estimated hazard ratios. The joint model consistently outperforms the two-stage methods in terms of both bias and coverage property in various simulation scenarios. In case studies using prostate cancer TMA data sets, the two stage methods yields a good approximation in one data set while an insufficient one in the other. A general advice is to use the joint model inference whenever results differ between the two-stage methods and the joint analysis.
Keywords: Biomarker, Empirical Bayes, Joint modeling, Mixed effects, Tissue microarray, Varying
1. Introduction
DNA microarray technology has enabled expression measurement of thousands of genes simultaneously, providing a platform for rapid screening of genomic biomarkers in cancer. Translating these discovery-type findings into clinical relevance is a more challenging task. Gene expression profiling studies using spotted cDNA arrays or Affymetrix GeneChip arrays often yield a few hundred candidate cancer genes that are associated with a phenotype. Only a small portion of these will be eventually validated for the corresponding expression changes at the protein level. The advent of Tissue Microarray (TMA) technology has provided a proteomic platform for such validation studies to find clinically useful biomarkers. TMA experiments measure tumor-specific protein expression via high-density immunohistochemical staining assays, allowing simultaneous evaluation of hundreds of patient samples in a single experiment [1]. Since their initial development, TMA-based expression studies have quickly become an integral part of cancer biomarker development [2-4].
A typical tissue array slide comprises up to 1000 tiny biopsy tissue elements, which we will refer to as cores, with multiple cores corresponding to repeated sampling from the same tumor. Expression measures on these replicate cores constitute the TMA core-level data. These can display substantial within-subject variability for both biological and experimental reasons. Biologically, for tumors that are highly infiltrative and heterogeneous in nature (e.g., prostate tumors), protein expression pattern can be quite variable. For example, cell proliferation genes often exhibit localized high expression within a tumor, indicating elevated aggressiveness and metastatic potential in the corresponding areas. Replicate sampling from various regions of the tumor is therefore important in capturing the underlying heterogeneity within a tumor. Experimental sources of the variability can come from a combination of probe affinity, measurement imprecision, and further missing data due to insufficient sampling. Without appropriately accounting for these variabilities, the noise-prone measurements tend to attenuate the prognostic value of a potential biomarker in predicting disease outcome. The lack of a model-based approach for TMA core-level expression data to effectively model the intra-tumor variation has motivated us to carry out a full investigation.
A good analogy for understanding TMA data structure is from probe-level data generated by the Affymetrix GeneChip arrays. GeneChip arrays measure gene expression at the mRNA transcripts level. The probe-level data refer to the replicate expression measures on a set of 16-20 small oligonucleotide probe sets derived for a target gene. The biological variation comes primarily from these oligonucleotide probe sequence variants, while experimental variations arise during the process of slide printing, hybridization and optical reading. Li et al. [6] reported that the variation of a specific probe across multiple arrays could be considerably smaller than the variance across probes within a probe set. Modeling Affymetrix probe-level data has generated much attention [6, 7] as the technology has become more mature and widely used.
Similarly in TMA experiments, modeling within-tumor protein expression heterogeneity is an important problem. In these tissue-based experiments, the variation across core samples within a tumor can be substantially larger than the variation observed across subjects. Etzioni et al. [5] used a compositional analysis to model such heterogeneity, and compared the proportion of cells stained at different intensity levels between normal and tumorous tissues. In this study, we focus on the effect of modeling intra-tumor variation in the context of predicting patient survival outcome. In a latent variable modeling framework, we assume that an underlying `true' expression value predicts survival. In real experiments, this `true' expression can not be precisely measured due to sampling variabilities and measurement imprecision. Instead, we observe the core-level expression measurements that are subject to these measurement errors. In a two-stage method, we adapt ideas from measurement error modeling and propose a latent expression index (LEI) to approximate the underlying true value, and focus on its behavior in proportional hazards models. We adapt an empirical Bayes estimator [8] to 1) incorporate important clinical parameters such as Gleason score and pathological stage of the tumor and 2) adjust for the varying number of cores. We further establish a joint model for TMA core-level data and survival outcome via a shared random effect. There is a large literature on joint modeling of longitudinal data and survival [9-16]. These methods have been developed predominantly for modeling survival and CD4 counts in AIDS patients; here their application to Tissue Microarray data in cancer biomarker studies is novel. Using both simulations and two published TMA data sets, we compare the performances of the naive, two-stage LEI, and the joint model approach in obtaining parameter estimates and associated inference.
The article is organized as follows. Section 2 specifies notation and models for the TMA core-level expression data and for the patient survival data. Section 3 introduces LEI and its use in a two-stage method. Section 4 presents the joint modeling approach and the Bayesian estimation framework. Simulation results to compare the performances of these methods are then discussed in Section 5. Case studies using two prostate cancer TMA data sets are presented in Section 6. Further discussion can be found in Section 7.
2. Model specification
Measurement model for the TMA core-level data
Let be the latent expression value for a biomarker in tumor i, i = 1, … , n. Assuming the observed TMA core-level measurement for the jth core in the ith tumor is
(2.1) |
where we assume . The mean μx* is a linear function of clinical covariates: , where Zi = (Z1i, Z2i, …,Zpi) constitutes a row vector of p clinical parameters characterizing histologic and pathologic features of the tumor; and θ = (θ1, … , θp) is the associated p-dimensional row vector of effect sizes. In this model, Uij represents the variation of the ri core-level expression measurements about the “true” value . We assume Uij is i.i.d. and independent of .
Survival model
Let the observed survival time for patient i (i = 1, …, n) be Ti = min(Yi, Ci), where Yi is the time from diagnosis to disease recurrence; Ci is the time to censoring which is independent of Yi, and δi = 1{Yi < Ci} is the censoring indicator. Under the Cox proportional hazards model, the hazard rate for patient i is
(2.2) |
where λ0(t) is the baseline hazard function, β* is the regression coefficient associated with the `true' protein expression, and b = (b1, … , bp) is the row vector of coefficients associated with the vector of clinical variables . We also consider a parametric Weibull regression model with the following form for the hazard function:
(2.3) |
3. Two-stage plug-in method
Given the rare disease assumption and the assumption that the measurement error Uij has no predictive value, i.e., Prentice (1982) introduced the induced hazard rate function for Cox regression model with covariates measured with error. When normality is assumed for the error distribution, a first-order bias-correction method is to replace with its conditional mean
(3.1) |
and estimate β* by maximizing the corresponding partial likelihood. Here denotes the vector of observed expression measures for tumor i. Note that the regression calibration model in (3.1) is an approximation to the true model in (2.2) due to the rare disease assumption and the omission of a higher order term implicit in (2.2). Details of such first-order bias-correction methods can be found in [17, 18]. Define the Latent Expression Index (LEI) to be an estimate of the conditional mean, , for each subject i. A two-stage plug-in method can be described by the following algorithm: 1) Compute LEIi (i = 1, …, n) as a surrogate expression estimate that adjusts for measurement error; and 2) Apply the Cox or Weibull regression model using LEIi to obtain an estimate of β* and the associated standard error.
In the next section, we describe methods for computing LEI for tissue microarray data. These include an empirical Bayes estimator conditional on clinical covariates, a full Bayes approach and a Varying Replicate Number (VRN) method as an extension to adjust for the number of cores per tumor.
3.1. Methods for computing LEI
3.1.1. The Empirical Bayes and full Bayes estimator
Express (2.1) as a mixed effects model
(3.2) |
where . To make a connection between (3.2) and (2.1), is the unobserved “true” protein expression level, which can be expressed as . Subsequently, νi, which is normally distributed with mean zero and variance, has the interpretation of a mean centered “true” protein expression level. The empirical Bayes estimator can then be derived as
(3.3) |
where is the attenuation factor [18]. Parameter estimates can be obtained by fitting a mixed effects model as described in (3.2), using a restricted maximum likelihood (REML) approach [19, 20].
The empirical Bayes estimator conditions on the set of parameter estimates derived from the data. The uncertainty of these estimates are not accounted for in LEIeb. For this reason, we also consider a full Bayesian estimator, , where standard conjugate hyperprior distributions are adopted: . The full Bayes estimator
is then based on the posterior inference from model (3.2) where are the posterior means given the data.
3.1.2. The Varying Replicate Number (VRN) method
In a typical TMA construction, Ki cores are placed on the array for each tumor i. However, not all of the measurements are available. Several reasons contribute to a varying number of replicate measure. These include heterogeneous tissue composition and technical defects such as image corruption. The expression measurement from non-tumorous tissue types or a corrupted image is typically considered unsuitable for an outcome analysis and excluded. Let Mij = 0, j = 1, … ,Ki indicate that the jth core from the ith tumor is lost due to the aforementioned reasons and Mij = 1 if it is available. Expression measures are retained for cores, where ri varies across tumor samples and possibly depends on covariate Zi. We consider ri to follow a Binomial distribution given Ki and P(Mij = 1) with possible over-dispersion. Assuming independence and homogeneity across j for Mij, we adopt a logistic mixed effects model:
(3.4) |
where , Zi = (Z1i, Z2i, …,Zgi) is the vector of g clinical covariates that can be the same or different from those in (3.2), and ψ = (ψ1, ψ2, …, ψg) is the associated vector of coefficients. Therefore
(3.5) |
The expression index under the VRN model is then derived by averaging over all the possible values of (ri, Ki). In particular,
(3.6) |
Here R is taken to be maxi Ki. An additional assumption for the above is that the expression measures do not correlate with ri or Ki. Parameter estimates can be obtained by fitting a mixed effects model as described in (3.2). A logistic mixed effects model in the form of (3.4) was fitted to obtain . Estimation is via methods described in [21] and [22]. The empirical proportions were used for .
In a relatively balanced TMA array where the number of replicate measures ri does not vary much across subjects, is an approximately constant adjustment factor. The amount of shrinkage in LEIeb toward the overall mean depends primarily on the ratio of the within- to between-subject variation in that particular data set. In our example, however, ri is a highly variable quantity. It exerts a larger role in determining how much weight gives to a particular subject's data relative to the estimated population mean. The motivation for LEIvrn is to provide a replicate number-averaged expression estimate that alleviates the variability induced by ri in the empirical Bayes estimator.
4. Joint Modeling of survival and TMA core-level data
The two-stage approaches described above are attractive for their simplicity and straightforward interpretation. They require minimal computation and can be easily implemented using existing statistical packages. However, there are major limitations for the two-stage method [23]. First, the two-stage method involves a first order approximation and ignores the second-order term in the induced hazard rate function (3.1). As we will illustrate in the simulation study, such approximation works well when β* is close to zero, but otherwise lead to sizeable bias in . Second, parameter estimates in the second stage do not account for the uncertainty in estimating LEI in the first stage. The associated standard error for will be over-optimistic. Given these considerations, it is desirable to make inference based on the joint likelihood of the failure time and TMA expression data. In this article, we adopt a shared random effect model to induce correlation between the TMA data and the survival outcome.
Given the measurement model specified in (3.2) for the TMA data, we write the proportional hazards model for the survival outcome as
(4.1) |
The parameter νi constitutes a shared random effect that connects the measurement model (3.2) and the survival outcome model (4.1). The expression data and survival times are then assumed to be independent given νi. The joint likelihood of the observed data {Ti, δi, Xi and the random effects νi is therefore
(4.2) |
where
(4.3) |
We used a piecewise constant hazards model in which the time axis is partitioned into L disjoint intervals, I1, …, IL, where Il = [al-1, al) with a0 < ti and aL > ti for all i = 1, …, n. Assume a constant baseline hazard in the lth interval, λ0(t) = λl for t ∈ Il. Rl is the set at risk at the beginning of interval l; dl is the number of failures in interval l; and Δil = min(ti, al) - al-1.
Alternatively, a parametric Weibull model can be assumed for the survival outcome using the following hazard function:
(4.4) |
When γ = 1, the above reduces to exponential distribution with constant failure rate . The survival time component of the joint likelihood in (4.3) is then replaced by
(4.5) |
In a Bayesian estimation framework, the following prior distributions are specified for the model parameters:
(4.6) |
The prior choices in (4.6) render proper noninformative prior distributions with the hyperparameters chosen to achieve large variances (wide spread). In particular, . The values of r0 and γ0 are set to be small to impose noninformativeness (although there have been criticisms that the resulting posterior distributions remain highly sensitive to the choice of r0 and γ0). We had tested different sets of values for r0 and γ0: (0.1, 0.1), (0.01, 0.01), and (0.001, 0.001). When the sample size is moderate to large, the variance estimates, and , are well-behaved and not significantly affected by the choice of r0 and γ0. Samples from the posterior distribution are obtained using Markov Chain Monte Carlo (MCMC) methods. In the likelihood function of (4.2), the Bayesian estimation procedure treats νi as missing data and imputes it at each MCMC iteration via posterior sampling from the conditional distribution of νi given the rest of the variables. One then proceeds with the likelihood function in (4.2) as if νi were observed.
5. Simulation
5.1. Simulation Setup
The additive measurement error model in (3.2) with one covariate Z1i is used to simulate the expression measure Xij, i = 1, …, n, and j = 1, …, ri. In this model, θ0 = 0 and θ1 = 1. Furthermore, νi ~ N(0, 1), Uij ~ N(0, 0.5). The covariate Z1i is simulated from a N(0, 1) distribution. The total number of cores sampled, Ki, takes values in {1, 2, …, 12} with P(Ki = 6) = 0.4, P(Ki = 1) = … = P(Ki = 5) = 0.1, and P(Ki = 7) = … = P(Ki = 12) = 1/60, mimicking the proportions from the actual tissue array data set used in this study. We simulate the number of repeated measures from a Binomial(Ki, pi), where pi = 1 - π1/Ki such that the missing proportion equals π. The survival time Ti is simulated from a proportional hazards model in the form of (2.2) with λ0(t) ≡ 1 and β* = 1 or 2. An additional covariate Z1i is further assumed to associate with Ti with the coefficient being one. The censoring time is simulated from an independent exponential distribution that results in a 30% censoring proportion. Results are summarized over 100 such simulated data sets each of a sample size n = 200. In general, we assigned parameter values in the simulation to mimic those for the real data sets.
Computation of LEIeb, LEIvrn were carried out using the PROC MIXED and the IML procedure in SAS (SAS Institute, Cary, NC). LEIfb and the joint models were implemented using OpenBUGS via the R interface BRugs [24,25]. We ran two chains with 1000 burn-in and 1000 updates per chain for the MCMC convergence.
5.2. Simulation Results
The simulation results are summarized in Table I. For β* = 1 in the survival models, the naive approach (using as a surrogate expression) attenuates the true effect size by around 25%. The coverage probability of a nominal 95% confidence interval of is 0.10 at best. The two stage methods (LEI) achieved a considerable bias correction by adjusting for the measurement error in the LEI imputation. The joint modeling approach gives the best estimate and a coverage probability of 95% compared to the truth.
Table I.
coverage | Coverage | ||||||||
---|---|---|---|---|---|---|---|---|---|
Weibull |
Proportional Hazards |
||||||||
β* = 1 | X* | 1.01 | 0.08 | 0.08 | 0.96 | 1.01 | 0.08 | 0.08 | 0.97 |
Naive | 0.75 | 0.07 | 0.08 | 0.070 | 0.75 | 0.07 | 0.08 | 0.09 | |
LEIeb | 0.93 | 0.08 | 0.09 | 0.85 | 0.93 | 0.08 | 0.09 | 0.87 | |
LEIfb | 0.93 | 0.08 | 0.09 | 0.85 | 0.93 | 0.08 | 0.09 | 0.88 | |
LEIvrn | 0.97 | 0.09 | 0.10 | 0.89 | 0.96 | 0.09 | 0.10 | 0.87 | |
Joint Model | 1.03 | 0.11 | 0.11 | 0.95 | 1.03 | 0.11 | 0.11 | 0.95 | |
β* = 2 | X* | 2.05 | 0.12 | 0.12 | 0.95 | 2.03 | 0.11 | 0.10 | 0.94 |
Naive | 1.13 | 0.08 | 0.10 | 0 | 1.18 | 0.08 | 0.11 | 0 | |
LEIeb | 1.51 | 0.11 | 0.13 | 0.03 | 1.58 | 0.11 | 0.13 | 0.06 | |
LEIfb | 1.48 | 0.11 | 0.12 | 0 | 1.55 | 0.110 | 0.12 | 0.04 | |
LEIvrn | 1.62 | 0.11 | 0.14 | 0.19 | 1.70 | 0.11 | 0.15 | 0.30 | |
Joint Model | 2.16 | 0.27 | 0.30 | 0.89 | 2.07 | 0.19 | 0.20 | 0.93 |
Next a larger effect size is simulated (β* = 2). The bias in the two-stage approaches due to the first-order approximation is evident. Overall the two-stage methods generate less biased compared to the naive estimate. Nevertheless, these are 15-25% smaller than the true β*. The coverages are poor for the two-stage approaches. The joint modeling approach should be advocated in this scenario for inference. Notice that the joint model estimates of β* are slightly bigger than 2, especially under the Weibull distribution. This may be driven by the prior distributions adopted for the parameters under the Bayesian estimation scheme. We have observed that such a difference disappears when the sample size gets larger.
6. Case study in prostate cancer
6.1. Data description
In this study, we consider two prostate tumor tissue microarray data sets. The α-Methylacyl CoA racemase (AMACR) is a peroxisomal and mitochondrial enzyme that plays an important role in fatty acid metabolism. AMACR has been shown to consistently overexpress in prostate tumors [26]. Rubin et al. [4] profiled AMACR protein expression using a TMA constructed on 203 prostate tumors from a surgical cohort who underwent radical prostatectomy at the University of Michigan as a primary therapy for clinically localized prostate cancer diagnosed between 1994 and 1998. They found AMACR is a significant predictor of the Prostate Specific Antigen (PSA) failure in these 203 patients.
The second biomarker evaluated in this study is BM28. BM28 encodes a highly conserved mini-chromosome maintenance protein (MCM) that is involved in the initiation of genome replication. Bismar et al. [27] profiled a total of 41 genes (including BM28) in a TMA-based proteomic study. They identified a 12-gene model showing the expression combination of the twelve genes significantly associates with tumor progression and PSA failure in a set of 79 men following surgery for clinically localized prostate cancer. The expression level of BM28 however did not show significant prognostic value in their analysis. We chose BM28 to evaluate the possibility of its being a false negative biomarker due to measurement error.
For the AMACR data, an average of Ki =5.5 (range: 2 to 12) tissue core specimens were taken from each tumor sample and put on a tissue array for immunohistochemical staining. After initial diagnostic evaluation of each core, an average of 29% (range: 0-86%) of the cores were excluded due to reasons discussed earlier, leading to 5.5% missing subjects. The BM28 data has a similar tissue array design. A quantitative imaging analysis of the staining intensity was obtained using the ACIS II (Chromavision, San Juan Capistrano, CA) system. The intensity level ranges from 0 to 255 chromogen intensity units, and is transformed using the natural logarithm (one unit added to avoid taking logarithm of 0) and normalized to have mean zero and standard deviation one. Disease recurrence is defined as a serum PSA increase >0.2ng/mL after radical prostatectomy. Censored observations are those free of the recurrence at the time of last follow-up.
6.2. Measurement error and regression attenuation in TMA data analysis
Figure 1 illustrates a considerable amount of measurement error in the core-level expression data from the two TMA experiments. In the AMACR data, methods-of-moments estimates of the variance components are and . In the BM28 data, the methods-of-moments estimates of the variance components are and . The within-subject variation is almost three times the between-subject variation.
In a Cox proportional hazards model context, we did a simple simulation where , β* = 1. A naive estimator —the average core-level expression for tumor i— is used as the surrogate expression to replace in (2.2). We simulate situations where measurement error is small , moderate , and large . Figure 2 shows various degrees of regression attenuation in the estimate of β* as a function of replicate number ri and the amount of error . When the parameter values are set to resemble the AMACR data, the naive estimate of β* is approximately 30% smaller than the true value. With the current TMA construction protocol specifying three cores per subject due to economic and tissue-preservation reasons, and a great amount of within-subject variability routinely observed in the core-level expression data, Figure 2 effectively conveys the importance of modeling measurement error in TMA data. In the following two sections, we implement the measurement error models to demonstrate how statistical inference differs from the previous results.
6.3. AMACR expression and biochemical recurrence in prostate cancer
In prostate cancer, Gleason score, pathologic stage and tumor size are among the most important clinical parameters. We include these as clinical covariates Zi to adjust in the measurement model (3.2), the replicate number model (3.4), and the survival outcome model (4.1). In the measurement model, with an associated standard error of 0.13, indicating a marginal association of tumor size with AMACR expression level. In the replicate number model, with an associated standard error of 0.16, which is consistent with our expectation that a larger tumor sample provides more abundant number of cores.
Table II lists the estimates and associated standard errors (posterior standard deviation) of β* in the outcome model. The measurement error adjustment has significantly improved upon the naive estimate. The error-adjusted is around 0.75 , approximately 31% larger in absolute value than the naive estimate which is 0.57 . The amount of attenuation in β* is quite consistent with what we conclude from the simulated datasets in the previous section. In this dataset, the two-stage methods (LEIeb, LEIfb, LEIvrn) perform equally well as the joint modeling approach. The simplicity and computational efficiency of LEI serves as a satisfactory core-level expression index for AMACR. However as mentioned earlier, the two-stage methods are based on a first-order approximation, the accuracy of which is largely driven by the size of β* and the ratio of within- and between-subject variation. As will be shown in the other data example, the two-stage methods will not be always a suitable approach.
Table II.
Weibull |
Proportional hazards |
||||
---|---|---|---|---|---|
AMACR(n=203) | Naive | -0.573 | 0.201 | -0.571 | 0.198 |
LEIeb | -0.761 | 0.266 | -0.751 | 0.266 | |
LEIfb† | -0.753 | 0.262 | -0.752 | 0.263 | |
LEIvrn | -0.735 | 0.260 | -0.737 | 0.258 | |
Joint modeling† | -0.742 | 0.268 | -0.745♭ | 0.268 |
Weibull |
Proportional hazards |
||||
---|---|---|---|---|---|
BM28 (n=52) | Naive | 0.828 | 0.414 | 0.900 | 0.399 |
LEIeb | 1.051 | 0.519 | 1.146 | 0.494 | |
LEIfb† | 1.457 | 0.564 | 1.493 | 0.504 | |
LEIvrn | 1.034 | 0.552 | 1.151 | 0.537 | |
Joint Modeling† | 1.753 | 0.766 | 1.592♭ | 0.600 |
- Estimation is based on MCMC methods, where we sampled 2 chains each with 10000 burn-in and 10000 updates.
We used J = 5 and J = 3 intervals for the piecewise exponential distribution in the AMACR and BM28 dataset respectively.
The estimate of the Weibull shape parameter is = 0.71, and = 0.97 from the joint modeling for the AMACR and BM28 dataset respectively.
Kaplan-Meier curves are useful as a graphical representation of the prognostic value of a biomarker. We examined these plots by dividing the subjects into different risk groups based on the values of AMACR expression estimates derived under each method. In Figure 3(a), subjects with AMACR high, median, and low expression groups based on LEIvrn and the joint model estimates (C and D respectively) are significantly better separated in terms of probability of recurrence-free survival, when compared to that using the naive mean estimates (A).
6.4. BM28 expression and biochemical recurrence in prostate cancer
The measurement model indicates a marginal association of pathologic stage of the tumor with BM28 expression: . The replicate number model again suggests a strong dependence on the size of the tumor: .
In Table II, the differences under various models are more discernable in this datasets. First, the Weibull and piecewise exponential model overall generate slightly different results given a small sample size (n=52). Second, the empirical Bayes estimate differs substantially from the full Bayes LEI estimate. It is likely due to the large uncertainties in the parameter estimates that determine LEIeb. Finally, the bias introduced by the first-order approximation is prominent here. Both the coefficient β* and the noise ratio in this dataset are much larger in magnitude compared to the AMACR data example. In this case, the two-stage methods alleviate regression attenuation, only to a limited extent. The joint model should be used for parameter estimation and associated inference.
Figure 3(b) plots the Kaplan-Meier curves using different expression estimates. The 5-year PSA recurrence-free survival probability is 0.95 versus 0.58 for low and high BM28 expression estimated by the joint model. Adjusting for measurement error in this dataset has made a dramatic change in the conclusion about the prognostic value of BM28, compared to the naive method.
6.5. Improved expression estimates
Figure 4 compares the naive and the joint model expression estimates, plotted against the survival time on the x-axis. The mean expression ± two standard deviations is plotted for each individual. Two improvements under the joint model are clear: 1) the noise, represented by the error bars, is greatly reduced via the joint modeling, and 2) the mean expression levels are distributed more tightly around a regression line, accentuating the relationship of the TMA expression data and survival time.
7. Discussion
In TMA data analysis, statistical methods often focus on downstream models in predicting disease outcome assuming is a suffcient expression summary measure [29, 30]. Relatively little attention has been given to the modeling of within-tumor variation in these TMA experiments. As we have shown in this paper with real data examples, analysis ignoring intra-tumor variation can lead to false negative results, causing a tremendous waste of valuable tissue resource and experimental costs. In this study, we propose and compare a two-stage approach and a joint model to analyze tissue microarray data in a measurement error framework. The two-stage approach computes a Latent Expression Index (LEI) in the first stage and then associate the LEI with patient survival information using a proportional hazards model in the second stage. We further adjust for clinical/pathological covariates Zi and the number of repeated measures (Ki, ri) per each tumor to improve the effciency of the expression estimates.
Although the two-stage approach is attractive for its simplicity, interpretability and straightforward computation, practitioners need to be aware of the associated statistical limitations. As we discussed earlier, the regression calibration approach is an approximation of the true model which reduces bias relative to the naive method but still leads to underestimated β*. In addition, the survival information is not used in the first stage to compute LEI, which can cause bias and effciency loss in estimating β* in the second stage. Finally, the uncertainty of estimating the LEI quantity is not assimilated in the second stage, leading to over-optimistic standard error estimates. For a review of this topic, see Tsiatis and Davidian [23]. In simulation studies, we show the two-stage methods reduce bias relative to the naive approach in estimating the Cox regression coefficient, but still lead to under-estimated β*. The joint analysis, however, outperforms the two-stage methods in terms of the bias and coverage property in various simulated scenarios. In case studies, we applied the error methods to prostate TMA data sets where the two-stage approach and the joint analysis generate 1) very similar results in the AMACR data set and 2) disparate results in the BM28 data set. While the first case study demonstrates a scenario where the two-stage approach is a good approximation to the true model, the second suggests one where such approximation is not so accurate (the higher order term can not be ignored when β* is a large value). Inference should be based on the joint analysis whenever results differ.
A final notion is regarding the impact of the technology advancement on TMA data analysis. Recent advances in quantitative assessment of the immunohistochemical staining provide precise, objective, and reproducible protein expression measurements. Compared to the conventional pathologist scoring on an ordinal scale, the Chromavision system used in our data examples enables quantification of the antigen level on a continuous scale, free of the subjectivity associated with pathologist-based visual scoring system. We should point out that the Chromavision system not only quantifies the amount of protein expression, but also provides a more accurate measure of the percentage of tissue that express the protein. A statistical problem of interest is to extend our model to incorporate the percentage information when summarizing the expression profile of a tumor. Another quantitative system, AQUA [28], which stands for Automated Quantitative Analysis, measures fluorescence signals, leading to higher sensitivity to very low antibody concentrations. In addition, it allows the separation of tumor from stromal elements and the sub-cellular localization of signals for a co-localization of the antigens in different cell compartments. As the technology is becoming more and more refined, statistical models underpinning both biological and experimental issues are in great need.
Acknowledgement
We thank Dr. Arul Chinnaiyan at the Pathology and Urology Department at the University of Michigan Medical School and Dr. Mark Rubin at the Pathology Department of Weill Medical College of Cornell University for making the prostate tissue microarray datasets available.
REFERENCES
- 1.Kononen J, Bubendorf L, Kallioniemi A, Barlund M, Schraml P, et al. Tissue microarrays for high-throughput molecular profiling of tumor specimens. Nature Medicine. 1998;4:844–7. doi: 10.1038/nm0798-844. [DOI] [PubMed] [Google Scholar]
- 2.Divito KA, Berger AJ, Camp RL, Dolled-Filhart M, Rimm DL, Kluger HM. Automated quantitative analysis of tissue microarrays reveals an association between high bcl-2 expression and improved outcome in melanoma. Cancer Research. 2004;64:8773–7. doi: 10.1158/0008-5472.CAN-04-1387. [DOI] [PubMed] [Google Scholar]
- 3.Seligson DB, Horvath S, Shi T, Yu H, Tze S, Grunstein M, Kurdistani SK. Global histone modification patterns predict risk of prostate cancer recurrence. Nature. 2005;435:1262–6. doi: 10.1038/nature03672. [DOI] [PubMed] [Google Scholar]
- 4.Rubin MA, Bismar TA, Andrén O, Mucci L, Kim R, Shen R, Ghosh D, Wei J, Chinnaiyan AM, Adami H, Kantoff PM, J. J. Decreased α-Methylacyl CoA racemase expression in localized prostate cancer is associated with an increased rate of biochemical recurrence and cancer-specific death. Cancer Epidemiology and Biomarkers Prevention. 2005;14:1424–31. doi: 10.1158/1055-9965.EPI-04-0801. [DOI] [PubMed] [Google Scholar]
- 5.Etzioni R, Hawley S, Billheimer D, True LD, Knudsen B. Analyzing patterns of staining in immunohistochemical studies: application to a study of prostate cancer recurrence. Cancer Epidemiology and Biomarkers Prevention. 2005;14:1040–6. doi: 10.1158/1055-9965.EPI-04-0584. [DOI] [PubMed] [Google Scholar]
- 6.Li C, Wang WH. Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proceedings of the National Academy of Sciences. 2001;98:31–6. doi: 10.1073/pnas.011404098. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP. Summaries of Affymetrix genechip probe level data. Nucleic Acids Research. 2003;31:1–8. doi: 10.1093/nar/gng015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Tsiatis AA, DeGruttola V, Wulfsohn MS. Modeling the relationship of survival to longitudinal data measured with error. applications to survival and CD4 counts in patients with AIDS. Journal of the American Statistical Association. 1995;90:27–37. [Google Scholar]
- 9.Wulfsohn MS, Tsiatis AA. A joint modeling for survival and longitudinal data measured with error. Biometrics. 1997;53:330–39. [PubMed] [Google Scholar]
- 10.Faucett CJ, Thomas DC. Simultaneously modeling censored survival data and repeatedly measured covariates: a Gibbs sampling approach. Statistics in Medicine. 1996;15:1663–85. doi: 10.1002/(SICI)1097-0258(19960815)15:15<1663::AID-SIM294>3.0.CO;2-1. [DOI] [PubMed] [Google Scholar]
- 11.Henderson R, Diggle P, Dobson A. Joint modeling of longitudinal measurements and event time data. Biostatistics. 2000;1:465–80. doi: 10.1093/biostatistics/1.4.465. [DOI] [PubMed] [Google Scholar]
- 12.Wang Y, Taylor JMG. Joint modeling longitudinal and event time data with application to acquired immunodeficiency syndrome. Journal of the American Statistical Association. 2001;96:895–905. [Google Scholar]
- 13.Xu J, Zeger SL. Joint analysis of longitudinal data comprising repeated measures and times to events. Applied Statistics. 2001;50:375–87. [Google Scholar]
- 14.Brown ER, Ibrahim JG. A Bayesian semiparametric joint hierarchical model for longitudinal and survival data. Biometrics. 2003;59:221–28. doi: 10.1111/1541-0420.00028. [DOI] [PubMed] [Google Scholar]
- 15.Guo X, Carlin B. Separate and joint modeling of longitudinal and event time data using standard computer packages. The American Statistician. 2004;58:1–9. [Google Scholar]
- 16.Tadesse MG, Ibrahim JG, Gentleman R, Chiaretti S, Ritz J, Foa R. Bayesian error-in-variable survival model for the analysis of genechip arrays. Biometrics. 2005;61:488–97. doi: 10.1111/j.1541-0420.2005.00313.x. [DOI] [PubMed] [Google Scholar]
- 17.Prentice RL. Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika. 1982;69:331–42. [Google Scholar]
- 18.Carroll RJ, Ruppert D, Stefanski LA. Measurement error in nonlinear models. Chapman and Hall; 1995. [Google Scholar]
- 19.Harville DA. Maximum likelihood approaches to variance component estimation and to related problems. Journal of the American Statistical Association. 1977;72:320–38. [Google Scholar]
- 20.Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–74. [PubMed] [Google Scholar]
- 21.Lindstrom MJ, Bates DM. Nonlinear mixed effects models for repeated measures data. Biometrics. 1990;46:673–87. [PubMed] [Google Scholar]
- 22.McCulloch CE. Maximum likelihood variance components estimation for binary data. Journal of the American Statistical Association. 1994;89:330–35. [Google Scholar]
- 23.Tsiatis AA, Davidian M. Joint modeling of longitudinal and time-to-event data: an overview. Statistica Sinica. 2004;14:809–34. [Google Scholar]
- 24.Spiegelhalter D, Thomas A, Best N, Lunn D. WinBUGS 1.4 Manual. 2003 Http://www.mrcbsu.cam.ac.uk/bugs/winbugs/contents.shtml. [Google Scholar]
- 25.Thomas A. BRugs: An R interface to OpenBUGS. Version 1.0 User's Manual. 2004 Http://mathstat.helsinki.fi/openbugs/ [Google Scholar]
- 26.Rhodes DR, Barrette TR, Rubin MA, Ghosh D, Chinnaiyan AM. Meta-analysis of microarrays: interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer. Cancer Research. 2002;62:4427C33. [PubMed] [Google Scholar]
- 27.Bismar T, Demichelis F, Riva A, Kim R, Varambally S, He L. Defining aggressive prostate cancer using a 12-gene model. Neoplasia. 2006;8:59–68. doi: 10.1593/neo.05664. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Camp RL, Chung GG, Rimm DL. Automated subcellular localization and quantification of protein expression in tissue microarrays. Nature Medicine. 2002;8:1323–28. doi: 10.1038/nm791. [DOI] [PubMed] [Google Scholar]
- 29.Horvath S, Liu X, Huang Y, Kim HL, Seligson DB. Forest-based methods for analyzing tissue microarray. ASA Proceedings of the Joint Statistical Meetings, 1863-1869 American Statistical Association (Alexandria, VA) 2003 [Google Scholar]
- 30.Liu X, Minin V, Huang Y, Seligson DB, Horvath S. Statistical methods for analyzing tissue microarray data. Journal of Biopharmaceutical Statistics. 2004;14:671–85. doi: 10.1081/BIP-200025657. [DOI] [PubMed] [Google Scholar]