Author manuscript; available in PMC: 2016 Aug 15.
Published in final edited form as: Stat Med. 2015 Apr 6;34(18):2662–2675. doi: 10.1002/sim.6500

A prediction model for colon cancer surveillance data

Norm M Good^a, Krithika Suresh^b, Graeme P Young^c, Trevor J Lockett^d, Finlay A Macrae^e, Jeremy MG Taylor^b,*
PMCID: PMC4494883  NIHMSID: NIHMS674634  PMID: 25851283

Abstract

Dynamic prediction models use patient-specific longitudinal data to update individualized survival probability predictions based on current and past information. Colonoscopy (COL) and fecal occult blood test (FOBT) results were collected from two Australian surveillance studies of individuals characterized as high risk on the basis of a personal or family history of colorectal cancer. Motivated by a Poisson process, this paper proposes a generalized non-linear model with a complementary log-log link as a dynamic prediction tool that produces individualized probabilities for the risk of developing advanced adenoma or colorectal cancer (AAC). This model allows predicted risk to depend on a patient’s baseline characteristics and time-dependent covariates. Information on the dates and results of COLs and FOBTs was incorporated using time-dependent covariates that contributed to patient risk of AAC for a specified period following the test result. These covariates serve to update a person’s risk as additional COL and FOBT test information becomes available. Model selection was conducted systematically by comparing Akaike information criterion (AIC) values. Goodness-of-fit was assessed with calibration plots comparing the predicted probability of event occurrence with the observed proportion of events. Abnormal COL results were found to significantly increase the risk of AAC for one year following the test. Positive FOBTs were found to significantly increase the risk of AAC for three months following the result. The covariates that incorporated the updated test results were of greater significance and had a larger effect on risk than the baseline variables.

Keywords: colonoscopy, cancer surveillance, interval censored, adenoma, complementary log-log link, Poisson process


There is a well-established model for colorectal cancer (CRC) development [1]. CRC begins as benign polyps that grow over time in the lining of the colon. Individuals are more susceptible to developing polyps if they are aged 50 or older, have a personal or family history of polyps, and/or have inherited gene mutations known to be associated with colon polyps. Most of these polyps, characterized as hyperplastic polyps, will remain benign for the lifetime of the individual, with a very low chance of becoming cancer. Some polyps, known as adenomas, may become cancerous if not removed. The risk of an adenoma developing into cancer increases with adenoma size and with the length of time it has been growing in the colon. A person is said to have advanced adenoma if an adenoma is large (≥ 10 mm), if an adenoma has certain characteristics (tubulovillous or villous histology, or high-grade dysplasia), or if the person has three or more adenomas. Advanced adenoma is a surrogate for present and future CRC risk [2]. Malignant adenomas that are not removed in the early stages can develop into invasive CRC. As the cancer grows, it may infiltrate neighboring structures and, if undetected and untreated, eventually cause symptoms. CRC is one of the most common cancers, and treatments are only moderately effective, particularly for more advanced disease. Because CRC survival rates are improved by early diagnosis, there is great interest in methods for early detection and prevention. There are a number of tests for detecting early signs of CRC, such as colonoscopy (COL), the fecal occult blood test (FOBT), sigmoidoscopy, many variations of these tests, and new biomarker-based tests. These tests differ in their properties, risks, and costs, so an important medical and public health problem is how to use them judiciously.

A colonoscopy is a procedure in which a tube is inserted into the colon so that adenomas and cancers can be looked for visually. As part of the procedure, any adenomas that can be resected are removed and subsequently examined to see whether they contain cancer. Thus, a colonoscopy is both a detection and a prevention method, because it removes adenomas that are or may become malignant. An FOBT tests whether blood is present in the stool. It is not specific to adenomas or cancers, but for mechanical reasons larger adenomas and cancers are more likely to bleed, and hence more likely to give rise to a positive FOBT. In practice, a positive FOBT is usually quickly followed up by a COL. When these tests are applied in the general low-risk population, this is called screening; when they are applied in a population considered at high risk for CRC, it is called surveillance. A number of guidelines have been published for the use of these tests in the screening and surveillance settings [3–8].

In this paper, we develop a dynamic individual prediction model from a data set of high-risk individuals who are in a surveillance program. The increasing availability of longitudinal patient information enables individualized dynamic predictions [9, 10]. For a given individual, dynamic prediction models predict the probability of an event occurring in an interval by conditioning on the individual’s longitudinal history of test results. In addition to varying between individuals, the risk of developing CRC changes over the course of an individual’s life. The benefit of these models is the ability to update patient risk at each colonoscopy and FOBT while still incorporating the effect of previous test results.

This paper presents an approach to analyzing data from studies of serial colonoscopies in individuals who are at high risk for colon cancer. The primary goal is to provide a method that gives individual predictions based on a person’s characteristics and the results from their sequence of FOBT and COL tests. Specifically, the aim is to estimate the distribution of times to development of disease for a person at the time of a colonoscopy or at any specified age. Since cancers are relatively rare, we characterize the disease, or event of interest, as the development of an advanced adenoma or cancer. The prediction, which will be in the form of a distribution function, may be a useful ingredient for individual recommendations or for an overall surveillance policy. For example, someone who is estimated to have a very low risk of developing a new advanced adenoma or cancer in the next 5 years would likely not be scheduled for a COL in the immediate future, whereas someone with a high risk should be scheduled for a COL much sooner. The prediction model we present will also use information on simpler covariates, such as gender and family history of CRC, as these are also likely to influence estimates of someone’s risk.

While the data are rich in information, they have complicating aspects that require careful and potentially non-standard statistical approaches. These data were collected over a number of years, and the guidelines for when COLs and FOBTs should be scheduled changed during that period. The number of colonoscopies per patient is quite variable, and the time interval between colonoscopies is heterogeneous both within and between people. The development of advanced adenoma or cancer happens in continuous time, but observations of that process are made only at the time of COLs, leading to interval-censored data. A person can have more than one observed occurrence of advanced adenoma or cancer during follow-up. The COL and FOBT results can be regarded as time-dependent covariates that provide information about an individual’s risk. There are more subtle issues too, such as the frequency of colonoscopies, which is itself a measure of risk and is affected by prior results. We do not attempt to address all of these issues in this paper; rather, we limit ourselves to providing a statistical framework that can be used for prediction purposes.

The statistical approach we develop regards the time of development of the event as arising in continuous time from a Poisson process, as has been used by others in colon cancer research [11]. Treating the event times as interval-censored, we formulate a likelihood function and proceed with maximum likelihood estimation. As a consequence of the Poisson process assumption, we show that our binary outcome data can be modeled using a generalized non-linear model with a complementary log-log link, which facilitates estimation using standard software.

The paper is organized as follows. We describe the data in section 1, and the model and the estimation method in section 2. The results including parameter estimates and goodness-of-fit are presented in section 3. An example of individualized predictions of risk is demonstrated in section 4. We finish with a discussion of model limitations and alternative strategies for future model adaptations.

1. Colorectal cancer surveillance data

1.1. Data

The data come from two surveillance programs in Adelaide (7670 participants) [12–14] and Melbourne (2829 participants) [15, 16], and are also described elsewhere [17]. The data were collected over the period January 1, 1976 to August 3, 2010. The total number of colonoscopies recorded during this period is 20,056. Patients with only one recorded colonoscopy, of whom there are 4550 (3818 from Adelaide and 732 from Melbourne), are not included in the data analysis because they have no follow-up. The two surveillance programs have different recruitment strategies. The Melbourne study recruits high-risk individuals based on a family history of CRC, and thus has a younger average age at first COL than Adelaide, which recruits both individuals with a family history and individuals with a personal history of CRC. The Melbourne study also has a larger average number of COLs per person. The eligibility criteria for the studies were that participants had to be at high risk for CRC based on a personal history of CRC or adenomas, or a family history of CRC. For the purpose of this paper, individuals younger than 40 years old at their first COL were excluded, and only 20 years of follow-up data were retained for each patient. Summary statistics for the cohort used in our analysis are given in Table 1.

Table 1.

Descriptive statistics for the study cohort used in analysis

| Variable | Adelaide | Melbourne | Combined |
|---|---|---|---|
| Patients | 3,499 | 1,059 | 4,558 |
| Average age at index, years (SD) | 63.7 (11.4) | 54.0 (8.8) | 61.4 (11.6) |
| Female, n (%) | 1,612 (46.1) | 683 (64.5) | 2,295 (50.4) |
| Family history, n (%) | | | |
| R0 | 2,430 (69.4) | 46 (4.3) | 2,476 (54.3) |
| R1 | 150 (4.3) | 108 (10.2) | 258 (5.7) |
| R2 | 224 (6.4) | 173 (16.3) | 397 (8.7) |
| R3 | 234 (6.7) | 107 (10.1) | 341 (7.5) |
| R4 | 407 (11.6) | 273 (25.8) | 680 (14.9) |
| CCC | 0 (0) | 83 (7.8) | 83 (1.8) |
| HNPCC suspected | 23 (0.7) | 126 (11.9) | 149 (3.3) |
| HNPCC definite | 31 (0.9) | 143 (13.5) | 174 (3.8) |
| Colonoscopies | 9,704 | 3,723 | 13,427 |
| Colonoscopies per patient, n (%) | | | |
| 2 | 1,961 (56.0) | 345 (32.6) | 2,306 (50.6) |
| 3 | 881 (25.2) | 268 (25.3) | 1,149 (25.2) |
| 4 | 365 (10.4) | 177 (16.7) | 542 (11.9) |
| 5+ | 292 (8.3) | 269 (25.4) | 561 (12.3) |
| Reason for colonoscopy, n (%) | | | |
| Surveillance (<9 months since previous COL) | 2,101 (21.7) | 525 (14.1) | 2,626 (19.6) |
| Surveillance (≥9 months since previous COL) | 4,120 (42.5) | 2,560 (68.8) | 6,680 (49.8) |
| Symptoms | 2,005 (20.7) | 50 (1.3) | 2,055 (15.3) |
| Positive FOBT | 1,255 (12.9) | 381 (10.2) | 1,636 (12.2) |
| Abnormal CT/Other | 223 (2.3) | 207 (5.6) | 430 (3.2) |
| Colonoscopy result, n (%) | | | |
| Normal | 4,160 (42.9) | 2,011 (54.0) | 6,171 (46.0) |
| Hyperplastic | 766 (7.9) | 369 (9.9) | 1,135 (8.5) |
| Adenoma | 1,795 (18.5) | 267 (7.2) | 2,062 (15.4) |
| Advanced adenoma | 1,497 (15.4) | 275 (7.4) | 1,772 (13.2) |
| CRC | 453 (4.7) | 24 (0.6) | 477 (3.6) |
| Other | 1,033 (10.6) | 777 (20.9) | 1,810 (13.5) |
| FOBTs | 10,944 | 8,999 | 19,943 |
| FOBT result, n (%) | | | |
| Positive | 1,036 (9.5) | 762 (8.5) | 1,798 (9.0) |

R0 - No family history but a personal history of adenoma or cancer

R1 - CRC both parents; CRC two siblings; CRC parent and sibling; CRC twins

R2 - CRC parent and grandparent; One first degree relative plus other family members with CRC; CRC parent and parent’s sibling

R3 - CRC first degree relative under 55 years of age

R4 - CRC first degree relative over 55 years of age; CRC second or third degree relative

CCC - Cluster of common colorectal cancers, i.e., three or more first degree relatives with CRC all over 50 years at diagnosis

HNPCC suspected - suspected hereditary non-polyposis colorectal cancer (HNPCC), i.e., meeting two Amsterdam criteria

HNPCC definite - definite HNPCC: individuals in families meeting the Amsterdam criteria, or carrying a mismatch repair mutation

The data consist of serial results of COLs and FOBTs, the dates on which these were performed, the reason for each test, and other information about the individual (age, gender, family history of colorectal cancer, date of death, etc.). The result of each COL is a categorical variable taking one of six values: Normal, Other, Hyperplastic, Adenoma, Advanced adenoma, or CRC. For many patients the scheduling of colonoscopies was dynamic, with the planned time of the next colonoscopy generally depending on the result of the current colonoscopy. For someone with a clear (normal) colonoscopy the typical interval to the next colonoscopy was 5 years; if a small lesion was found the typical interval was 3 years; and if a serious lesion was found the typical interval was 1 year. However, there are substantial departures from this pattern. The average time between successive COLs in the studied group is 3.0 years (SD = 1.8 years). The result of the FOBT is binary, either positive or negative. The timing of FOBTs was more variable: some patients had one every year, whereas others had them only rarely. A positive FOBT was nearly always followed by a COL within 3 months. For 101 colonoscopies (87 from Adelaide, 14 from Melbourne) recorded as being conducted because of a positive FOBT result, there was no prior positive FOBT result in the data set. For these cases, a positive FOBT was assumed to have occurred and was coded as occurring 1 month before the corresponding colonoscopy.
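A minimal Python sketch of the imputation rule just described. It assumes a hypothetical `Test` record (kind, age in years, result, reason) rather than the studies’ actual data layout. The look-back for an “earlier” positive FOBT is also an assumption: by default any earlier positive FOBT counts, and the optional `lookback_years` parameter is a hypothetical knob for restricting it.

```python
from dataclasses import dataclass


@dataclass
class Test:
    """Hypothetical record for one COL or FOBT in a patient's history."""
    kind: str         # "COL" or "FOBT"
    age: float        # age in years at the test
    result: str       # e.g. "Adenoma" for a COL, "Positive"/"Negative" for an FOBT
    reason: str = ""  # reason for a COL, e.g. "Positive FOBT" or "Symptoms"


def impute_missing_fobt(tests, lag_years=1 / 12, lookback_years=None):
    """Return the history with a positive FOBT added `lag_years` before every
    COL whose recorded reason is 'Positive FOBT' but which has no earlier
    positive FOBT (within `lookback_years`, if given) in the record."""
    out = list(tests)
    for t in tests:
        if t.kind != "COL" or t.reason != "Positive FOBT":
            continue
        has_prior = any(
            s.kind == "FOBT" and s.result == "Positive" and s.age < t.age
            and (lookback_years is None or t.age - s.age <= lookback_years)
            for s in tests
        )
        if not has_prior:
            out.append(Test("FOBT", t.age - lag_years, "Positive"))
    return sorted(out, key=lambda s: s.age)
```

For a history containing only a COL at age 55.0 whose recorded reason is “Positive FOBT”, the function adds a positive FOBT at age 55.0 − 1/12, roughly 54.92.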

2. Statistical Model and Estimation

2.1. Notation

Let $\tau_{ij}$ be the age of patient $i$ ($i = 1, \dots, n$) at their $j$th colonoscopy (COL) ($j = 1, \dots, m_i$), where patient $i$ has $m_i \ge 2$ COLs at ages $\tau_{i1}, \dots, \tau_{im_i}$. Let $\tau_{i0}$ be the youngest possible age for a COL, which we take as 40 years.

The primary outcome variable of interest is the occurrence of advanced adenoma or cancer, denoted by AAC. The results of the COLs for this primary endpoint are denoted by $Y_{i1}, \dots, Y_{im_i}$, where $Y_{ij} = 1$ indicates that AAC was found at age $\tau_{ij}$ and $Y_{ij} = 0$ indicates a finding of no AAC at age $\tau_{ij}$.

Let $X_i$ denote the baseline covariates associated with each individual (e.g., gender, family history, city) and $X_{ij}$ denote time-varying covariates associated with the COL at age $\tau_{ij}$ (e.g., a binary indicator that the reason for the COL is symptoms).

Let $Z_i(a)$ denote a vector of time-dependent covariates. Each component of $Z_i(a)$ is constructed to represent some aspect of the knowledge about individual $i$ at age $a$, e.g., a summary of FOBT history or a summary of past COL results. Each $Z_i(a)$ is defined at all ages throughout the individual’s follow-up. In the formulation described below we assume that every component of $Z_i$ is binary, but the methodology would also allow continuous time-dependent variables, e.g., the total number of adenomas found on all previous colonoscopies. There is a lot of flexibility in how the Z’s are constructed to match various hypotheses of interest. Specific choices for the Z’s are discussed later. See Figure 1 for an illustration of the described notation using the COL and FOBT history of a sample patient.

Figure 1.


Illustration of the described notation using the COL and FOBT history of a sample patient with $m_i = 2$ COLs (at $j = 1, 2$) and 4 FOBTs (at ↑s). COL results are indicated by A (adenoma) and AA (advanced adenoma). FOBT results are indicated by 0 (negative) and 1 (positive). $X_{ij}$ corresponds to the time-varying COL-specific covariate “binary indicator that the reason for the COL is symptoms”. $Y_{ij}$ corresponds to the binary outcome of AAC. Two possible $Z_i$’s are indicated: (1) a binary indicator of whether there was a positive FOBT in the past 3 months, and (2) a binary indicator of whether the patient ever had an abnormal COL. For (1), the corresponding Z jumps to 1 at age 54.7, corresponding to a positive FOBT, stays there for 3 months, and then returns to 0. For (2), Z jumps to 1 at age 55.0, corresponding to an abnormal COL, and stays there for the rest of the individual’s follow-up.

2.2. Statistical model

Assume that the hazard of an AAC event at age $a$ (in years) for patient $i$ is given by $\lambda_i(a)$. The specific time at which the AAC event occurs is the first age at which an advanced adenoma or cancer would be found if a colonoscopy were performed at that age. The occurrence of one AAC does not prevent the occurrence of later AAC events. We assume that the events, if observable, follow a Poisson process for each person with intensity $\lambda_i(a)$. This process describes when a new advanced adenoma or cancer would appear if monitoring were continuous; in this formulation, the event could occur many times. For each person, $\lambda_i(a)$ changes with age, and for a short interval of length $\Delta$, $\lambda_i(a)\Delta$ approximates the probability of an AAC event in $(a, a + \Delta)$. From any age onward, $\lambda_i(a)$ can be modified by new information learned about the person. For example, if a new test were done at age $\tau$ and the results suggested that this person is at high risk, we would expect $\lambda_i(a)$ to increase at $a = \tau$.

Monitoring for AAC is not continuous; data on AAC are collected only at the time of each COL. Under the model, the number of AAC occurrences in $(\tau_{i,j-1}, \tau_{ij})$ has a Poisson distribution with mean $\int_{\tau_{i,j-1}}^{\tau_{ij}} \lambda_i(a)\,da$, and hence the probability of at least one AAC in the interval (denoted by $Y_{ij} = 1$) is

$$P(Y_{ij} = 1) = P\{\text{event in interval } (\tau_{i,j-1}, \tau_{ij})\} = 1 - \exp\left\{-\int_{\tau_{i,j-1}}^{\tau_{ij}} \lambda_i(a)\,da\right\} \tag{1}$$

The rationale for integrating the hazard over the interval $(\tau_{i,j-1}, \tau_{ij})$ is to accumulate the risk of AAC over the interval between COLs, so that the probability of an event is higher after longer intervals. This reflects a feature of these data: a longer period between COLs is a longer time during which an advanced adenoma or cancer can develop, which increases the probability of an AAC finding at the end of that interval.
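To make equation 1 concrete, here is a small Python sketch (not the authors’ code). It computes the probability of at least one AAC between two COLs for an arbitrary intensity function, approximating the integral with the trapezoidal rule. The constant intensity of 0.05 events per year in the example is a hypothetical value, chosen only to show that longer intervals give higher probabilities.

```python
import math


def prob_event(hazard, start, end, n_steps=1000):
    """P(Y = 1) = 1 - exp(-integral of hazard over (start, end)), equation 1.
    The integral is approximated by the trapezoidal rule with n_steps panels."""
    h = (end - start) / n_steps
    integral = 0.5 * (hazard(start) + hazard(end))
    integral += sum(hazard(start + k * h) for k in range(1, n_steps))
    return 1.0 - math.exp(-integral * h)


def const_hazard(a):
    """Hypothetical constant intensity: 0.05 AAC events per year."""
    return 0.05


p_1yr = prob_event(const_hazard, 60.0, 61.0)  # about 0.049
p_5yr = prob_event(const_hazard, 60.0, 65.0)  # about 0.221
```

With a constant intensity, the probability is simply 1 − exp(−0.05 × interval length). An age-increasing intensity such as equation 3 would additionally weight the later part of each interval more heavily.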

A convenient form is to assume that $\lambda_i(a)$ has a proportional hazards structure given by

$$\lambda_i(a) = \lambda_0(a)\exp\{\beta X_i + \theta Z_i(a)\} \tag{2}$$

We expect $\lambda_0(a)$ to be some smooth increasing function of age, and we assume

$$\lambda_0(a) = \exp(\alpha_0)\,(a - 35)^{\alpha_1} \tag{3}$$

for ages greater than 40. We would expect $\alpha_1 > 0$ because adenoma rates tend to be higher in older people. Other parametric forms for $\lambda_0(a)$ are possible. For identifiability, $\beta X_i$ does not include an intercept term, since the intercept is absorbed into $\alpha_0$.
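A direct Python transcription of equations 2 and 3 may help when checking the shapes involved. It is a sketch with hypothetical parameter values. The guard against ages at or below 35 reflects that $(a - 35)^{\alpha_1}$ is meaningful only for $a > 35$; the model itself is used for $a > 40$.

```python
import math


def baseline_hazard(a, alpha0, alpha1):
    """Equation 3: lambda_0(a) = exp(alpha0) * (a - 35) ** alpha1."""
    if a <= 35:
        raise ValueError("lambda_0 is defined here only for ages above 35")
    return math.exp(alpha0) * (a - 35.0) ** alpha1


def hazard(a, x, z, alpha0, alpha1, beta, theta):
    """Equation 2: lambda_i(a) = lambda_0(a) * exp(beta . x + theta . z),
    where x are baseline covariates and z = Z_i(a) evaluated at age a."""
    linear = sum(b * xk for b, xk in zip(beta, x))
    linear += sum(t * zk for t, zk in zip(theta, z))
    return baseline_hazard(a, alpha0, alpha1) * math.exp(linear)


# Hypothetical values: with alpha1 = 1 the baseline grows linearly in (a - 35).
print(baseline_hazard(45.0, 0.0, 1.0))                             # 10.0
print(hazard(45.0, [1.0], [0.0], 0.0, 1.0, [math.log(2)], [0.5]))  # about 20.0
```

A covariate with coefficient log 2 doubles the intensity at every age, which is the proportional hazards assumption of equation 2.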

Combining equations 1 and 2 and letting $p_{ij} = P(Y_{ij} = 1)$, we obtain

$$\log\{-\log(1 - p_{ij})\} = \beta X_i + \log\left[\int_{\tau_{i,j-1}}^{\tau_{ij}} \lambda_0(a)\exp\{\theta Z_i(a)\}\,da\right]$$

which is a generalized non-linear model for binary responses with a complementary log-log link.
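The step from equations 1 and 2 to the complementary log-log form can be written out as follows. Beyond those two equations, it uses only two facts. First, $X_i$ does not depend on age, so $\exp(\beta X_i)$ can be taken outside the integral. Second, in the last line, $\lambda_0(a)$ has the form given in equation 3, which shows why $\alpha_0$ plays the role of the intercept.

```latex
\begin{aligned}
1 - p_{ij} &= \exp\left\{-\int_{\tau_{i,j-1}}^{\tau_{ij}} \lambda_0(a)\exp\{\beta X_i + \theta Z_i(a)\}\,da\right\} \\
-\log(1 - p_{ij}) &= \exp(\beta X_i)\int_{\tau_{i,j-1}}^{\tau_{ij}} \lambda_0(a)\exp\{\theta Z_i(a)\}\,da \\
\log\{-\log(1 - p_{ij})\} &= \beta X_i + \log\left[\int_{\tau_{i,j-1}}^{\tau_{ij}} \lambda_0(a)\exp\{\theta Z_i(a)\}\,da\right] \\
&= \alpha_0 + \beta X_i + \log\left[\int_{\tau_{i,j-1}}^{\tau_{ij}} (a - 35)^{\alpha_1}\exp\{\theta Z_i(a)\}\,da\right]
\end{aligned}
```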

The model is further extended to accommodate time-dependent covariates $X_{ij}$ that can vary between COLs:

$$\log\{-\log(1 - p_{ij})\} = \beta_1 X_i + \beta_2 X_{ij} + \log\left[\int_{\tau_{i,j-1}}^{\tau_{ij}} \lambda_0(a)\exp\{\theta Z_i(a)\}\,da\right] \tag{4}$$

This model allows the probability of the COL test being positive to be modified by COL specific covariates, such as the reason for the test being patient symptoms.

2.3. Maximum likelihood estimation

Each patient has $m_i - 1$ intervals, each of which contributes to the likelihood function. Assuming independence between intervals, the likelihood contribution for patient $i$ is

$$L_i(\beta,\theta,\alpha) = \prod_{j=2}^{m_i} \{P(Y_{ij} = 1)\}^{Y_{ij}}\,\{1 - P(Y_{ij} = 1)\}^{1 - Y_{ij}} \tag{5}$$

where $P(Y_{ij} = 1)$ is given by equation 4, $\beta$ (comprising $\beta_1$ and $\beta_2$) and $\theta$ are regression parameters, and $\alpha = (\alpha_0, \alpha_1)$ are the parameters of $\lambda_0(a)$. Note that $j = 1$ is not included in the product, so the result of each person’s first COL is not used as an outcome variable.

The observed data likelihood is then

$$L(\beta,\theta,\alpha) = \prod_{i=1}^{n} L_i(\beta,\theta,\alpha) \tag{6}$$

The form of the likelihood in equations 5 and 6 is identical to that of a binary response model with $\sum_{i=1}^{n}(m_i - 1)$ observations that assumes an independent Bernoulli distribution for each observation. Thus, software for generalized non-linear models can be used to obtain the parameter estimates.
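A minimal Python sketch of the log of equations 5 and 6, assuming each interval has already been discretized into months (as described in section 2.4). Each interval is stored as a hypothetical tuple `(y, x, ages, zs)` holding four things: the AAC outcome; the covariate values, with baseline $X_i$ and COL-specific $X_{ij}$ concatenated so that `beta` holds both $\beta_1$ and $\beta_2$; the age in years at each month; and the Z vector for each month. This is not the authors’ R/‘gnm’ implementation. Maximizing it over $(\alpha_0, \alpha_1, \beta, \theta)$ with a general-purpose optimizer would be one alternative route to maximum likelihood estimates.

```python
import math


def interval_log_lik(y, x, ages, zs, alpha0, alpha1, beta, theta):
    """Bernoulli log-likelihood of one between-COL interval under equation 4.

    The integral is replaced by a monthly sum: each month contributes
    (a - 35) ** alpha1 * exp(theta . z) / 12, and exp(alpha0) is pulled out
    of the integral as the intercept alpha0."""
    integral = sum(
        (a - 35.0) ** alpha1
        * math.exp(sum(t * zk for t, zk in zip(theta, z))) / 12.0
        for a, z in zip(ages, zs)
    )
    eta = alpha0 + sum(b * xk for b, xk in zip(beta, x)) + math.log(integral)
    p = 1.0 - math.exp(-math.exp(eta))  # inverse complementary log-log link
    return math.log(p) if y == 1 else math.log1p(-p)


def log_likelihood(intervals, alpha0, alpha1, beta, theta):
    """Log of equations 5 and 6: every interval (j >= 2) of every patient
    is treated as an independent Bernoulli observation."""
    return sum(
        interval_log_lik(*iv, alpha0, alpha1, beta, theta) for iv in intervals
    )


# Toy check: one 24-month interval with alpha1 = 0 and Z = 0, so the integral is 2 years.
ages = [60 + k / 12 for k in range(24)]
zs = [[0]] * 24
ll_event = log_likelihood([(1, [], ages, zs)], math.log(0.1), 0.0, [], [0.5])
ll_none = log_likelihood([(0, [], ages, zs)], math.log(0.1), 0.0, [], [0.5])
```

In the toy check, the cumulative intensity is 0.1 × 2 = 0.2. The two calls therefore return log(1 − e^−0.2) and −0.2, matching equation 1.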

Note that the parameters $\beta$ always enter the model linearly, whereas $\theta$ and the shape parameter $\alpha_1$ of $\lambda_0(a)$ enter non-linearly ($\alpha_0$ factors out of the integral and acts as the intercept). For a variable Z that does not change within an interval between COLs, there is some simplification: the factor $\exp(\theta Z)$ can be taken outside the integral, so the term enters linearly as $\theta Z$. However, if Z can change during the interval between two COLs, its coefficient remains in the non-linear part of the model.
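For example, suppose $Z_i(a)$ equals a constant $z_{ij}$ throughout $(\tau_{i,j-1}, \tau_{ij})$. Then, with equation 3, the integral has a closed form (for $\alpha_1 \neq -1$), and only $\alpha_1$ remains non-linear:

```latex
\log\left[\int_{\tau_{i,j-1}}^{\tau_{ij}} \lambda_0(a)\,e^{\theta z_{ij}}\,da\right]
= \alpha_0 + \theta z_{ij}
+ \log\left[\frac{(\tau_{ij} - 35)^{\alpha_1 + 1} - (\tau_{i,j-1} - 35)^{\alpha_1 + 1}}{\alpha_1 + 1}\right]
```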

In this approach, the unit of analysis is the interval between COLs, rather than the person. Thus, for the purposes of obtaining point estimates, each interval is regarded as conditionally independent. However, because of the potential correlation between intervals for the same person, the usual information matrix is not expected to provide valid standard errors. Thus, we will use the bootstrap to obtain standard errors, where patients are resampled with replacement.

2.4. Discrete parametrization

Because of the large number of time-dependent covariates, and to allow the integrals in equation 4 to be computed with available software, we discretized time into one-month units. The integrals in the model and likelihood are thus approximated by sums, and the baseline hazard becomes a step function with a jump discontinuity at each month. The ages at which individuals had test results were rounded to the nearest twelfth of a year. The covariate $Z_i(a)$ then represents the value of covariate Z at age $a$ for individual $i$, where $a$ is a multiple of 1/12. Every Z is represented by a matrix in which each column corresponds to a month and each row corresponds to an interval between colonoscopies. The discretized integral for each colonoscopy interval is computed by summing over the columns of the Z matrices. All statistical analyses and model building were conducted using R 3.0.2. Specifically, the ‘gnm’ package in R can be used to fit the described generalized non-linear model. However, depending on the complexity of the functions $\lambda_0(a)$ and $Z_i(a)$, the integral on the right-hand side of equation 4 may not have a closed-form expression, so constructing the model covariates for input into the ‘gnm’ function was challenging. An outline of the R code used is presented in the Supplemental Materials.
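The monthly bookkeeping can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors’ R code. Ages are converted to whole months, and each row covers the months from one COL up to (but not including) the next. Rows are also stored as ragged lists rather than as a padded matrix. The example uses a positive FOBT at age 54.7 with a 3-month window, as in Figure 1; the COL ages 54.0 and 55.0 are hypothetical.

```python
def to_month(age_years):
    """Round an age in years to the nearest whole month (1/12 year)."""
    return round(12 * age_years)


def interval_z_rows(col_ages, event_ages, window_months):
    """One row per interval between consecutive COLs and one entry per month
    in that interval. An entry is 1 if an event (e.g. a positive FOBT) fell in
    that month or in the preceding window_months - 1 months, and 0 otherwise."""
    cols = [to_month(a) for a in col_ages]
    events = [to_month(a) for a in event_ages]
    return [
        [int(any(e <= m < e + window_months for e in events))
         for m in range(start, end)]
        for start, end in zip(cols[:-1], cols[1:])
    ]


rows = interval_z_rows([54.0, 55.0], [54.7], window_months=3)
print(rows)  # [[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0]]
```

Summing a function of each row’s entries (weighted by the monthly baseline intensity) gives the discretized integral for that interval.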

2.5. Incorporation of someone’s historical and prior information into the model

For construction of the likelihood and estimation of the parameters in the model, information about the person that does not change over time, such as gender and family history, would be included as one of the fixed covariates Xi.

To capture the differential risk associated with a “for reason” colonoscopy compared with a colonoscopy scheduled for surveillance, the model includes a colonoscopy-varying indicator of whether the reason for the colonoscopy was “symptoms”. Individuals with a colonoscopy conducted because of symptoms are expected to have an increased probability of an AAC result. This covariate enters the model as a covariate $X_{ij}$ that varies across colonoscopy intervals rather than only between individuals.

The information about a person’s history of COLs and FOBTs and their results can be captured by $Z_i(a)$. We call this their historical record, which includes everything that is known about a person’s COL/FOBT testing both during and before the study. A sensible model will probably require a vector of time-dependent covariates, i.e., $Z_{1i}(a), Z_{2i}(a), \dots, Z_{pi}(a)$, to allow different aspects of their history to be important. There is a lot of flexibility in this approach, and there are many different choices that can be made for the Z’s.

In the analysis of the data, we were particularly interested in incorporating prior COL and FOBT results and their impact on risk, and allowing for differences in the impact on risk if the prior COL/FOBT result was in the recent past compared to in the distant past. The specific choices for Zi(a) we considered were of the form “has the patient had a test with result M in the last P years”. The set of M’s we considered were hyperplastic/other on a COL, adenoma on a COL, advanced adenoma on a COL, cancer on a COL, and a positive FOBT. The time frames for P for the COL results we considered were the last 1 year, the last 1.5 years, the last 5 years and ever during the study period. The time frames considered for P for a positive FOBT were the last 3 months, the last 6 months, the last 9 months, and the last 1 year. With these definitions each Z is a time-dependent binary variable. For example, the Z corresponding to having an advanced adenoma ever during the study period would be coded as zero up to the age of the first finding of an advanced adenoma on a COL, and then as one at all later ages. The Z corresponding to having a positive FOBT within the last 3 months is illustrated in Figure 1 and would be coded as zero up to the time of the first positive FOBT, would jump to one at that age and remain at one for the next 3 months, then would change to zero at the next age for which there was not a positive FOBT within the last 3 months. It could then subsequently jump up to one again if the patient had a later positive FOBT.
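Below is a sketch of the “result M in the last P years” indicator as a Python function of age. The `event_ages` argument (the ages at which result M was observed) and the convention that the window includes the event age are assumptions made for illustration. The example reproduces covariate (1) from Figure 1: a positive FOBT at age 54.7 with a 3-month window.

```python
def z_recent(a, event_ages, window=None):
    """Binary time-dependent covariate Z(a): 1 if result M was observed at an
    age e with e <= a and a - e < window (in years); window=None means 'ever
    during the study period'."""
    return int(any(
        e <= a and (window is None or a - e < window)
        for e in event_ages
    ))


fobt_positive = [54.7]
print([z_recent(a, fobt_positive, 0.25) for a in (54.6, 54.7, 54.9, 55.0)])
# [0, 1, 1, 0]
print(z_recent(70.0, [55.0]))  # 1: an 'ever' covariate stays on after age 55.0
```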

Another intricacy in this setting is the overlap of the effects of the Z covariates corresponding to abnormal COL results. Because these Z covariates are binary and have positive coefficients, an individual with several abnormal results of different types in a short period (e.g., an adenoma at the first colonoscopy followed by an advanced adenoma within 6 months) would have a higher predicted risk than someone with the same number of colonoscopies but equally or more severe abnormal results at every colonoscopy (e.g., an advanced adenoma at the first colonoscopy followed by another advanced adenoma within 6 months). This happens because two different Z’s would be switched on for the first person but only one for the second. In such a situation, we would instead expect consecutive advanced adenoma results to increase an individual’s risk of an AAC finding more than an adenoma followed by an advanced adenoma. Thus, the effect of each of these Z covariates was modified to last for the minimum of its chosen time frame P and the time until the individual’s next colonoscopy result.
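The truncation rule can be sketched by ending each result’s effect at the earlier of two ages: the end of its window, or the first COL after the result. In this hypothetical Python helper, an adenoma found at age 55.0 with a 1-year window stops contributing once the next COL, at age 55.5, is performed.

```python
def z_recent_truncated(a, event_ages, col_ages, window=None):
    """Binary covariate for result M whose effect lasts for the minimum of the
    window P (years; None = no limit) and the time until the next COL after
    the result."""
    for e in event_ages:
        later_cols = [c for c in col_ages if c > e]
        end = min(later_cols) if later_cols else float("inf")
        if window is not None:
            end = min(end, e + window)
        if e <= a < end:
            return 1
    return 0


cols = [55.0, 55.5]   # hypothetical COL ages
adenoma = [55.0]      # adenoma found at the first COL
print(z_recent_truncated(55.3, adenoma, cols, window=1.0))  # 1
print(z_recent_truncated(55.7, adenoma, cols, window=1.0))  # 0 (the window alone would give 1)
```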

The Zi(a)’s described above capture many aspects of a person’s current and past results of FOBT and COL tests. The dynamic nature of a person’s sequence of tests is very complex, and it would certainly be possible to capture more aspects of the timing and results by defining additional Z’s. For example, if the relevance of a test result differed with age, that could be captured by an interaction of the test result with a discrete version of age. The set of possible Z’s one could consider is large.

Model selection was based on minimizing the Akaike information criterion (AIC), which balances goodness of fit against model complexity and is defined as AIC = 2 × (number of parameters in the model) − 2 × (maximized log-likelihood). All models considered included the fixed effects of location, gender, and risk group, and the colonoscopy-varying covariate for the test reason being symptoms. The Z’s of interest were those corresponding to a colonoscopy result of hyperplastic/other, adenoma, advanced adenoma, or cancer within the past 1, 1.5, or 5 years, or ever while under surveillance. A Z corresponding to ever having any abnormal colonoscopy result during the study period, defined here as adenoma, advanced adenoma, or cancer, was also included. To account for the effect of FOBTs, we considered Z’s corresponding to a positive FOBT within the last 3, 6, or 9 months, or 1 year. Including multiple Z’s corresponding to the same abnormal colonoscopy result in the same model (e.g., adenoma within the past 1 year and adenoma within the past 1.5 years) produced a negative coefficient for one of the corresponding covariates, because several highly overlapping covariates were trying to describe the same effect. Thus, Z’s were selected by varying the time frame P for a single abnormal result in each model and then choosing the combination of time frames for the abnormal COL results that gave the model with the lowest AIC.
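The AIC comparison can be sketched in a few lines of Python. The log-likelihoods and parameter counts below are toy values, not results from this study. In practice each entry would come from fitting one candidate set of Z’s to the same data.

```python
def aic(n_params, max_log_lik):
    """AIC = 2 * (number of parameters) - 2 * (maximized log-likelihood)."""
    return 2 * n_params - 2 * max_log_lik


def select_by_aic(fits):
    """fits maps a model label to (n_params, max_log_lik); returns the label
    of the model with the smallest AIC."""
    return min(fits, key=lambda label: aic(*fits[label]))


toy_fits = {                     # toy numbers for illustration only
    "model_A": (18, -4210.0),
    "model_B": (18, -4215.5),
    "model_C": (19, -4209.8),
}
print({label: aic(*fit) for label, fit in toy_fits.items()})
print(select_by_aic(toy_fits))   # model_A
```

Here model_C has a slightly higher log-likelihood than model_A. However, its extra parameter costs 2 AIC units, so model_A is selected.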

2.6. Interpretation of parameters

The exponentiated coefficients, $\exp(\beta)$, are hazard ratios associated with a unit change in the baseline covariates $X_i$ or the reason-for-test covariate $X_{ij}$; likewise, $\exp(\theta)$ are the hazard ratios associated with unit changes in Z. Thus, for example, if one of the Z’s was set up to represent any prior positive FOBT, then a large positive θ would suggest that people who had a positive FOBT at any time in the past are at increased risk of AAC. Similarly, if another Z was set up to jump from zero to one when a person had an adenoma COL result, then an estimate of θ near zero for this Z would suggest that an adenoma result on a COL does not change the assessment of that person’s subsequent risk of AAC. If we had set up Z to represent at least two prior normal COLs, this might indicate that the person was at lower than normal risk, in which case we would expect the estimate of θ to be negative.

The above discussion illustrates the subtleties in interpreting the θ’s, and the risk of over-interpreting them. A unique feature of a COL is that it is also an intervention: the procedure attempts to remove any adenomas that are found, thus decreasing the risk of an adenoma developing into AAC. The described model specifies the risk of an event as a function of Z(t), where Z(t) represents the results of past testing. If Z(t) captures colonoscopy testing and results, a person’s actual risk may change as a result of having a COL. However, even though the COL procedure may reduce that person’s risk by removing an adenoma, we might still expect a positive estimate of θ, since this person is now considered at higher long-term risk because of what has been learned about them. In contrast, an FOBT is purely a detection tool: a positive FOBT indicates that the person is in a subset of the population with possibly increased risk of AAC, but because it is just a test, it does not change that person’s probability of developing an advanced adenoma or cancer. Thus, if Z(t) represents FOBTs and their results, the person’s hazard of the event does not change as a result of the test; yet we would expect the corresponding θ to be positive, since this person is now considered at higher risk because of what has been learned about them. In both cases the result of the test provides information about the person’s risk, this information is represented by θ, and it is useful for making accurate predictions for the individual. Since our main purpose is prediction, the above difficulties in interpreting the θ’s are not an overriding limitation.

The parameters $\alpha_0$ and $\alpha_1$ define how the intensity of experiencing an AAC changes with age. The specific shape of $\lambda_0(a)$ should not be over-interpreted, since it is simply a convenient functional form. Algebraically, it is the intensity for people who are in the reference category of every baseline covariate and who have either had no tests or had tests whose results were all normal or negative. However, since the programs enrolled high-risk people who had previous abnormal findings or were tested during the study, there are very few such people. The expected increasing shape of $\lambda_0(a)$ is simply a way to capture the following idea. If the interval between COLs is long and the person has an AAC finding at the COL, the AAC is more likely to have occurred towards the latter part of the interval, since adenomas and cancers are more common in older than in younger people.

2.7. Standard errors

Standard error estimates were obtained using bootstrap methods. Three hundred bootstrap samples were generated by randomly selecting n (the total number of patients) individuals with replacement from the original data set. When an individual is selected for a bootstrap sample, all of the individual’s COL and FOBT data are selected. Bootstrap standard errors account for model overdispersion and for some of the correlation between the observations on the same individual.
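The patient-level (cluster) bootstrap described above can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the `fit` callable is a hypothetical stand-in for the actual model fit, and the row layout is an assumption. What it shows is that resampling is done over patient IDs, so all of a drawn patient's COL and FOBT rows enter the bootstrap sample together.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence

# Hypothetical layout: patient ID -> list of that patient's COL-interval rows.
Data = Dict[str, List[dict]]


def patient_bootstrap_se(
    data: Data,
    fit: Callable[[List[dict]], Sequence[float]],
    n_boot: int = 300,
    seed: int = 0,
) -> List[float]:
    """Bootstrap standard errors from resampling whole patients with replacement.

    `fit` stands in for the model-fitting routine: it takes the pooled rows of
    one bootstrap sample and returns a vector of parameter estimates.
    """
    rng = random.Random(seed)
    ids = list(data)
    replicates = []
    for _ in range(n_boot):
        # Draw n = len(ids) patients with replacement and keep every row of each
        # drawn patient, so within-patient correlation is retained.
        drawn = rng.choices(ids, k=len(ids))
        rows = [row for pid in drawn for row in data[pid]]
        replicates.append(list(fit(rows)))
    # The SE of each parameter is the standard deviation across replicates.
    return [statistics.stdev(column) for column in zip(*replicates)]


if __name__ == "__main__":
    # Toy data and a toy "model" (the pooled AAC rate), for illustration only.
    toy = {"p1": [{"aac": 1}, {"aac": 0}], "p2": [{"aac": 0}], "p3": [{"aac": 1}]}
    aac_rate = lambda rows: [sum(r["aac"] for r in rows) / len(rows)]
    print(patient_bootstrap_se(toy, aac_rate))
```

A patient drawn twice contributes their rows twice, which is what makes this a cluster bootstrap rather than a resampling of individual COL intervals.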

3. Results

3.1. Parameter estimates

The results are given in Table 2. The negative estimate for females indicates that the risk of AAC is lower for females than for males, although this difference is not statistically significant (p = 0.08). There is no significant difference in the risk of AAC between the Melbourne and Adelaide groups, which is notable given that the two studies had very different recruitment strategies. The estimate of α1 is greater than 0, indicating that risk increases significantly with age. Using the estimate of α1, we compute that a 70-year-old has 2.5 times the hazard of AAC of a 50-year-old with otherwise identical risk factors. Compared with those with an R0 family history, those in the HNPCC definite group have a significantly increased risk of AAC. The R3, CCC, and HNPCC suspected groups also have increased risk, but it is not significantly different from that of the R0 group. The R1 group has the lowest risk, but it is not significantly different from that of the R0 group. A COL performed because the patient had symptoms is associated with a significantly increased risk of AAC. For all abnormal colonoscopy results, the best fit was obtained when the effect was assumed to last 1 year past the observed result. A prior colonoscopy result of advanced adenoma produced the greatest increase in risk, followed by cancer, adenoma, and hyperplastic/other. The hyperplastic and other results were grouped together because they exhibited a similar pattern of AAC event rates when event rates were examined by previous COL result and time until the next COL. The estimates for all prior colonoscopy results are positive. Thus, the long-term risk associated with an abnormal COL result outweighs the benefit of having adenomas removed during the COL procedure. A positive FOBT significantly affected risk when its effect was assumed to last 3 months past the positive result; the positive coefficient indicates a significantly increased risk for the 3 months following a positive FOBT.

Table 2.

Parameter Estimates

Variable                                     Estimate   Model SE   Bootstrap SE   p-value        Hazard ratio
Age
  Intercept (α0)                               −7.93      0.38        0.44
  α1                                            1.10      0.11        0.13        < 0.001 ***
β
  Female                                       −0.13      0.07        0.07          0.08           0.88
  Melbourne                                    −0.10      0.11        0.13          0.45           0.90
  Symptoms                                      0.21      0.10        0.10          0.04 *         1.23
  Family history
    R1                                         −0.26      0.18        0.21          0.23           0.77
    R2                                         −0.10      0.17        0.18          0.55           0.90
    R3                                          0.27      0.15        0.17          0.10           1.31
    R4                                         −0.09      0.12        0.13          0.49           0.91
    CCC                                         0.09      0.27        0.27          0.73           1.10
    HNPCC suspected                             0.42      0.19        0.26          0.11           1.52
    HNPCC definite                              0.53      0.18        0.20        < 0.01 **        1.70
θ
  Adenoma < 1 yr                                1.59      0.16        0.15        < 0.001 ***      4.90
  Advanced adenoma < 1 yr                       2.46      0.12        0.13        < 0.001 ***     11.8
  Cancer < 1 yr                                 1.73      0.18        0.18        < 0.001 ***      5.63
  Hyperplastic/Other < 1 yr                     1.42      0.14        0.15        < 0.001 ***      4.15
  Abnormal COL ever during the study period     0.35      0.11        0.11        < 0.01 **        1.41
  Positive FOBT < 3 months                      1.85      0.19        0.21        < 0.001 ***      6.37

Notes: p-values are from Wald tests using the bootstrap SEs. * 0.01 < p ≤ 0.05; ** 0.001 < p ≤ 0.01; *** p ≤ 0.001. Family history covariates are relative to R0.
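The p-value and hazard ratio columns of Table 2 follow from the estimate and bootstrap SE by a standard two-sided Wald test and HR = exp(estimate). The sketch below recomputes them for a few rows; because it starts from the rounded table entries, it matches the reported values only up to rounding (for example, it gives a hazard ratio of 6.36 rather than 6.37 for a positive FOBT).

```python
import math


def wald_summary(estimate: float, se: float) -> dict:
    """Two-sided Wald test of H0: coefficient = 0, with the implied hazard ratio."""
    z = estimate / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return {"z": z, "p": p, "hazard_ratio": math.exp(estimate)}


def stars(p: float) -> str:
    """Significance codes used in the Table 2 notes."""
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return ""


# Rounded (estimate, bootstrap SE) pairs copied from Table 2.
TABLE2 = {
    "Symptoms": (0.21, 0.10),
    "Melbourne": (-0.10, 0.13),
    "HNPCC definite": (0.53, 0.20),
    "Positive FOBT < 3 months": (1.85, 0.21),
}

if __name__ == "__main__":
    for name, (est, se) in TABLE2.items():
        s = wald_summary(est, se)
        print(f"{name:26s} p = {s['p']:.3f} {stars(s['p']):3s} HR = {s['hazard_ratio']:.2f}")
```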

Notably, the hazard ratios and significance levels of the updated COL and FOBT results are much larger than those of the baseline variables. Thus, recent test results are a much more important determinant of a person's risk than family history and gender.

The standard deviations of the bootstrap estimates were similar to, although sometimes slightly larger than, the model-based standard errors from the model fitted to all patients. This suggests that overdispersion and within-subject correlation have little impact in these data.

3.2. Goodness-of-Fit

Calibration plots were used to compare the observed proportion of events against the probability of AAC predicted by the model. The estimated probabilities that a COL would show AAC (i.e., the estimates of pij from equation 4) were categorized into bins, and within each bin the observed proportion of events was compared with the median predicted probability. Model fit was judged by the agreement between the two. Calibration plots were also created for subsets of the data (e.g., by interval length, patient's age at observation, and surveillance status) to examine whether disagreement between predicted and observed proportions was associated with a particular covariate. Figure 2 shows the overall calibration plot, which indicates generally good agreement between the predicted probabilities and the observed proportions. The final dot represents fewer COL intervals spanning a wider range of predicted probabilities, so its median predicted probability of AAC appears lower than the observed proportion of AAC events for those intervals. In addition, the model tends to slightly underestimate the risk when the estimated risk of AAC is low. This pattern is more apparent in the calibration plots by interval length shown in Figure 3. For intervals between COLs of less than one year, the observed proportion of AAC events is noticeably higher than the predicted probability. The selected model does not fully capture the clinical relevance of short intervals between COLs. There are many reasons why a COL may be performed within one year of a previous COL; some of these are included in the model, but others may not be. For example, clinical judgment or factors not in the data set may suggest that the patient is at higher risk and should return soon for another COL. This would likely lead to a higher probability of a positive COL finding than the model predicts. Another reason is that the first COL may have been incomplete or may have given ambiguous or suspicious results, requiring a "follow-up" colonoscopy; this would also lead to a higher probability of a positive finding than the model predicts. However, considerable effort was made when constructing the data set to collapse such "follow-up" COLs into a single COL, so this is probably a minor reason for the higher observed rate.
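The binning behind a calibration plot can be sketched as below. This assumes equal-count bins formed after ordering intervals by predicted probability, which is consistent with the Figure 2 caption (500 intervals per dot, with a smaller last bin) but is our reading rather than a stated detail; the inputs would be the fitted pij values and the 0/1 AAC outcomes.

```python
import statistics
from typing import List, Sequence, Tuple


def calibration_points(
    predicted: Sequence[float], observed: Sequence[int], bin_size: int = 500
) -> List[Tuple[float, float, int]]:
    """(median predicted probability, observed AAC proportion, bin count) per bin.

    Intervals are ordered by predicted probability and cut into consecutive bins
    of `bin_size`; the final bin keeps whatever is left over.
    """
    ranked = sorted(zip(predicted, observed))
    points = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        preds = [p for p, _ in chunk]
        events = [y for _, y in chunk]
        points.append((statistics.median(preds), sum(events) / len(chunk), len(chunk)))
    return points
```

Plotting the first element of each tuple against the second, with a 45-degree reference line, gives a plot of the kind shown in Figure 2; calling the function on a subset (e.g., intervals shorter than one year) gives the subgroup plots.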

Figure 2.


Overall calibration plot to compare the model’s predicted probability with the proportion of observed events. Each dot represents 500 COL intervals, except for the last bin, which represents 369 intervals. Dots are centered at the median predicted probability for the corresponding bin.

Figure 3.


Calibration plot by interval length to compare the model’s predicted probability with the proportion of observed events. (Left) Each dot represents 200 COL intervals, except for the last bin, which represents 130 intervals. (Right) Each dot represents 500 COL intervals, except for the last bin, which represents 51 intervals. Dots are centered at the median predicted probability for the corresponding bin.

4. Individualized predicted survival distribution of time to a future AAC event

The model can be used to give the individualized survival distribution for Ti, the age at which the next AAC event occurs for patient i, given all that is known about that patient up to age A. Specifically, if the prediction is made for person i at age A, the probability of no new AAC within the next t years is

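$$S(t) = P\{T_i > A + t \mid \text{age} = A,\ X_i,\ Z_i(A)\} \tag{7}$$

$$\phantom{S(t)} = \exp\left[-\int_A^{A+t} \lambda_i(a)\,da\right] \tag{8}$$

which is estimated by

$$\hat{S}(t) = \exp\left[-\exp(\hat{\beta}' X_i)\int_A^{A+t} \hat{\lambda}_0(a)\exp\{\hat{\theta}' Z_i(a)\}\,da\right] \tag{9}$$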
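Equation 9 generally has no closed form once Zi(a) switches covariates on and off, so a numerical integral is the natural way to evaluate it. The sketch below uses the composite midpoint rule. It is illustrative only: the form of λ̂0(a) is defined earlier in the paper and not reproduced here, so `baseline` is a placeholder callable, and the demo's constant baseline of 0.02 per year is hypothetical and does not reproduce Figure 4.

```python
import math
from typing import Callable, Sequence


def predicted_failure_prob(
    A: float,
    t: float,
    beta_x: float,
    baseline: Callable[[float], float],
    theta: Sequence[float],
    z_of_age: Callable[[float], Sequence[float]],
    n_steps: int = 2000,
) -> float:
    """Estimate 1 - S_hat(t) from Eq. (9) with the composite midpoint rule.

    beta_x   -- the linear predictor beta_hat' X_i (a plain number here)
    baseline -- a placeholder for lambda_hat_0(a)
    z_of_age -- Z_i(a) for ages in [A, A + t], assuming no new tests
    """
    h = t / n_steps
    cumulative = 0.0  # integral of lambda_0(a) * exp(theta' Z_i(a)) over [A, A + t]
    for k in range(n_steps):
        a = A + (k + 0.5) * h  # midpoint of the k-th sub-interval
        lin = sum(th * z for th, z in zip(theta, z_of_age(a)))
        cumulative += baseline(a) * math.exp(lin) * h
    return 1.0 - math.exp(-math.exp(beta_x) * cumulative)


if __name__ == "__main__":
    # Hypothetical inputs, NOT the fitted model: constant baseline of 0.02/year and
    # a positive FOBT at age A whose theta (1.85, Table 2) acts for 3 months.
    A = 55.58
    fobt_window = lambda a: [1.0 if a < A + 0.25 else 0.0]
    for t in (0.5, 1.0, 3.0):
        print(t, round(predicted_failure_prob(A, t, 0.0, lambda a: 0.02, [1.85], fobt_window), 4))
```

The midpoint rule is a simple choice here; because the Zi(a) indicators are step functions, aligning the grid with the window boundaries (or integrating piece by piece between them) removes most of the discretization error.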

This calculation requires specification of Zi(a) from the current age A to age A + t. Thus, the predictions assume that the person will not have any COLs or other tests between ages A and A + t. Also, because the integral runs from A to A + t, the values of Zi(a) for ages before A do not enter the calculation directly; however, a person's record of COLs and FOBTs before age A may still be captured, depending on how Zi(a) is defined. Hazard and survival curves can be generated for individuals following their last follow-up using the model estimates. Individuals are considered to be at increased risk if they have had a positive colonoscopy result within the last year or ever during the study period, depending on the covariate corresponding to that positive result in the model. Subsequent colonoscopies are assumed not to be prompted by symptoms. To represent the risk graphically, we plot one minus the survival distribution versus time from the current age. The risk (p) of an AAC can be read from the graph at any desired follow-up time (t). For an individual, this risk can be interpreted as follows: if there were 1000 people like you, with exactly the same age, family history, and series of test results, and each of them had their next colonoscopy t years from now with no intermediate tests, then 1000 × p of them would be expected to have a finding of AAC at that colonoscopy.

As an illustration, Table 3 describes the colonoscopy and FOBT follow-up data for four hypothetical patients of varying risk. Patients 1 and 2 are male Melbourne patients with R1 family risk who, based on their test results, can be categorized as low-risk. Patients 3 and 4 are male Melbourne patients with the higher-risk HNPCC definite family history. Patient 1 has a different test sequence from the others, and all four patients have differing test results. Figure 4 shows the individualized predicted event probability curves for these four patients, assuming no symptoms and no additional FOBTs during the prediction period. Reading from the graphs, the three-year risks for Patients 1, 2, 3, and 4 are 0.02, 0.06, 0.20, and 0.44, respectively. These curves were produced using the parameter estimates in Table 2 and equation 9. The rate of the continuous increase in predicted event probability is determined by the effect of age and the fixed effects of gender, location, and family risk. Under the final selected model, a positive FOBT has an effect only for the subsequent 3 months, and a particular abnormal COL has an effect only for the following year. In addition, Patients 2, 3, and 4 have increased risk because of the positive "abnormal COL ever during the study period" coefficient. Because the prediction model is dynamic, Patients 2 and 3 have different predicted risk trajectories that reflect the differences in their histories of test results. This would not be reflected by standard prediction tools that incorporate only the effect of the most recently observed event.

Table 3.

Example of patient historical records for low-risk male Melbourne patients with R1 family risk (Patients 1 and 2) and high-risk male Melbourne patients with HNPCC definite family risk (Patients 3 and 4)

                Test result
Age     Test    Patient 1   Patient 2          Patient 3          Patient 4
52.65   FOBT    Negative    Negative           Negative           Negative
53.55   FOBT    Negative    Negative           Negative           Negative
54.72   FOBT    Positive    Positive           Positive           Positive
55.04   COL     Normal      Advanced adenoma   Adenoma            Adenoma
55.29   COL     -           Adenoma            Advanced adenoma   Advanced adenoma
55.58   FOBT    -           Negative           Negative           Positive
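To make the covariate windows concrete, the sketch below builds Zi(a) for the Table 3 histories at a given age a, assuming no further tests. It assumes, hypothetically, that each θ covariate in Table 2 is a 0/1 indicator, that any non-normal COL result counts toward "abnormal COL ever during the study period", and that the result labels match Table 3; the paper's exact coding of repeated results is not given in this section.

```python
from typing import List, Tuple

Test = Tuple[float, str, str]  # (age, test type, result), as in Table 3

# Assumed 0/1 coding of the Table 2 theta covariates, in table order.
COL_LEVELS = ["Adenoma", "Advanced adenoma", "Cancer", "Hyperplastic/Other"]
COL_WINDOW = 1.0    # abnormal COL results act for 1 year
FOBT_WINDOW = 0.25  # a positive FOBT acts for 3 months


def z_at_age(history: List[Test], a: float) -> List[int]:
    """Z_i(a): [4 COL-result indicators, abnormal COL ever, positive FOBT < 3 months]."""
    z = [0] * (len(COL_LEVELS) + 2)
    for age, test, result in history:
        if age > a:
            continue  # only tests done by age a are known
        elapsed = a - age
        if test == "COL" and result in COL_LEVELS:
            if elapsed < COL_WINDOW:
                z[COL_LEVELS.index(result)] = 1
            z[len(COL_LEVELS)] = 1  # abnormal COL ever during the study period
        elif test == "FOBT" and result == "Positive" and elapsed < FOBT_WINDOW:
            z[-1] = 1
    return z


FOBTS = [(52.65, "FOBT", "Negative"), (53.55, "FOBT", "Negative"), (54.72, "FOBT", "Positive")]
PATIENTS = {
    1: FOBTS + [(55.04, "COL", "Normal")],
    2: FOBTS + [(55.04, "COL", "Advanced adenoma"), (55.29, "COL", "Adenoma"), (55.58, "FOBT", "Negative")],
    3: FOBTS + [(55.04, "COL", "Adenoma"), (55.29, "COL", "Advanced adenoma"), (55.58, "FOBT", "Negative")],
    4: FOBTS + [(55.04, "COL", "Adenoma"), (55.29, "COL", "Advanced adenoma"), (55.58, "FOBT", "Positive")],
}

if __name__ == "__main__":
    for a in (55.58, 55.90, 56.10, 56.40):
        print(a, {pid: z_at_age(h, a) for pid, h in PATIENTS.items()})
```

Under these assumptions, Patients 2 and 3 have the same Zi(a) at age 55.58 but diverge after age 56.04, when Patient 2's advanced-adenoma window closes while Patient 3's stays open until 56.29.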

Figure 4.


Predicted failure probability (1 − Ŝ(t)) for the next 3 years past the current age of 55.58 for four individuals with varying risk, whose historical records of COLs and FOBTs are given in Table 3.

In the long run, one could imagine a website where a person specifies a time horizon t and enters their current age A, their baseline variables Xi, and their history of COL and FOBT results. The website would then calculate and output 1 − Ŝ(t), the chance of an AAC within the next t years. This could be a useful aid in deciding when to schedule the next COL or FOBT.

5. Discussion

The proposed model produces dynamic predictions of risk. As new information about a person's COL and FOBT results becomes available, the model incorporates it and updates the predicted risk of developing the outcome of interest. This type of model is particularly useful for the surveillance data studied in this paper because it incorporates the longitudinal COL and FOBT results from the frequent testing of this high-risk group. A variety of covariates could have been chosen to capture the effect of updated test information. Abnormal COL results were found to significantly increase AAC risk for one year after the abnormal finding, and a positive FOBT was found to significantly increase risk for 3 months after the test. The covariates that incorporated updated test information were of greater significance than the baseline variables.

The timing of the COLs may itself carry some information about disease development. While many of the reasons a person comes in for a COL are included in the model, such as a positive FOBT, age, symptoms, and regular surveillance, other reasons, such as the type of symptoms, other laboratory tests, or a change in family history, were not. These might provide information about the level of risk, suggesting that modeling the process that determines the timing of COLs could have yielded additional information about a person's risk. Developing such a model would require more information than is available in this data set.

The estimation method presented here does not use any of the information that may be known about a person after their last COL. Specifically, the date of death may be known for many people. Many people also have one or more FOBTs after their last COL, all with negative results. For such people, updated predictions like those shown in Figure 4 could be made after the latest FOBT, as illustrated for Patients 2, 3, and 4 in that figure.

The initial formulation of the model was based on a Poisson process, which led naturally to a model for the observed binary response data. The binary response model, together with an assumption of independence, gave a likelihood from which point estimates were obtained. Measures of uncertainty were obtained using the bootstrap. Thus we do not rely on the validity of every aspect of the Poisson process assumptions; we regard the Poisson process simply as a good starting point for developing a model.
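The step from a Poisson process to a binary-response model with a complementary log-log link can be checked numerically. If the number of AAC events in a COL interval is Poisson with mean Λ (the intensity integrated over the interval), the probability of at least one event is 1 − exp(−Λ), and the complementary log-log transform of that probability is exactly log Λ. The sketch assumes, for illustration, Λ = exp(β′X)·I, where I stands for the integral of λ0(a)exp{θ′Z(a)} over the interval; the paper's own equation 4 is not reproduced here.

```python
import math


def prob_aac_in_interval(cum_intensity: float) -> float:
    """P(at least one event) when the event count is Poisson with mean cum_intensity."""
    return 1.0 - math.exp(-cum_intensity)


def cloglog(p: float) -> float:
    """Complementary log-log link, log(-log(1 - p))."""
    return math.log(-math.log1p(-p))


if __name__ == "__main__":
    # Hypothetical linear predictor and integrated baseline, for illustration only.
    beta_x, integral = 0.53, 0.06
    p = prob_aac_in_interval(math.exp(beta_x) * integral)
    # On the cloglog scale the model is beta' X + log(integral): linear in beta,
    # but non-linear in the age and theta parameters that sit inside the integral.
    print(p, cloglog(p), beta_x + math.log(integral))
```

This also shows why interval width enters naturally: with a constant intensity, doubling the interval doubles Λ and shifts the cloglog-scale predictor by exactly log 2.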

Some sensitivity analyses were undertaken. First, the analysis was run separately for the Adelaide and Melbourne cohorts. The parameter estimates were quite close to each other, and the location-specific 95% confidence intervals overlapped considerably for nearly all the parameters. This, together with the small estimated location effect in Table 2, justifies combining the data sets in one model. Second, time was discretized into two-month rather than one-month intervals. This gave estimates very similar to those shown in Table 2 and Figure 4.

For the calibration plots in Figures 2 and 3, the predictions are for the same patients whose data were used to derive the parameter estimates. For a more honest assessment, leave-one-out cross-validation was conducted: for each patient, the model was fitted to the data set excluding that patient's data, and the estimated coefficients were applied to the left-out patient to give predicted probabilities of AAC for that patient's intervals. Calibration plots created from the resulting predicted probabilities were very close to those produced by the fitted model. A different form of validation is external validation, in which predictions are applied to an external data set. We mimicked this by fitting the model to the Melbourne data and using the estimated coefficients to compute predicted probabilities of AAC for the Adelaide data. Calibration plots for the Adelaide data were similar to those in Figures 2 and 3, which provides evidence that the fitted model generalizes to an independent data set.
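Both validation schemes reduce to "fit on some patients, predict others". The sketch below shows leave-one-patient-out and the Melbourne-to-Adelaide style split with generic `fit` and `predict` callables; these, the row layout, and the toy pooled-rate "model" in the demo are hypothetical placeholders rather than the authors' implementation. The output pairs are what a calibration plot is built from.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[float, int]  # hypothetical row: (covariate value, AAC indicator)


def leave_one_patient_out(
    data: Dict[str, List[Row]],
    fit: Callable[[List[Row]], object],
    predict: Callable[[object, Row], float],
) -> List[Tuple[float, int]]:
    """Out-of-sample (predicted probability, observed AAC) pairs, one per interval."""
    pairs = []
    for pid, held_out in data.items():
        # Refit without any of this patient's intervals, then predict them.
        train = [row for other, rows in data.items() if other != pid for row in rows]
        model = fit(train)
        pairs.extend((predict(model, row), row[1]) for row in held_out)
    return pairs


def external_validation(
    train_rows: List[Row],
    test_rows: List[Row],
    fit: Callable[[List[Row]], object],
    predict: Callable[[object, Row], float],
) -> List[Tuple[float, int]]:
    """Fit on one cohort (e.g. Melbourne) and predict another (e.g. Adelaide)."""
    model = fit(train_rows)
    return [(predict(model, row), row[1]) for row in test_rows]


if __name__ == "__main__":
    # Toy stand-in model: the pooled AAC rate, ignoring covariates.
    rate_fit = lambda rows: sum(y for _, y in rows) / len(rows)
    rate_predict = lambda model, row: model
    toy = {"a": [(0.0, 1)], "b": [(0.0, 0)], "c": [(0.0, 0), (0.0, 1)]}
    print(leave_one_patient_out(toy, rate_fit, rate_predict))
```

Holding out whole patients, rather than single intervals, keeps a patient's correlated intervals from appearing on both sides of the split.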

There are many ways to adapt the presented model or to develop alternative approaches. For example, including a subject-specific frailty to better accommodate within- and between-person heterogeneity would be a natural consideration. Such a formulation would include only frailty terms in the hazard model, not the Z's, but would then require models in which the Z's are outcome variables that depend on the frailty. Such a model could be challenging to formulate because it would require modeling, over time, the development of all 5 levels of the COL results and the FOBT results. In the model we have presented, the Zi terms can be regarded as a substitute for a frailty term: the Zi's, as they are observed and measured, provide information about the frailty. This is the rationale for not including a frailty in the model.

An alternative method for developing dynamic predictions is landmarking [18]. In this approach, a regression model for the residual time to the next AAC event is developed at each age. The covariates in this model would need to be chosen to represent the current and past history of the person's test results and their baseline characteristics. Developing such covariates and implementing this method using available dynamic model software would be challenging. The regression coefficients for each covariate themselves depend on age, and these would need to be linked. Overall, this would be a viable, although complex, approach, and it would be interesting to compare its predictions with those from the stochastic process model developed in this paper. Since the outcome being modeled is binary, a simpler linear logistic model could also be explored. The challenge with such an approach would be choosing the covariates. The length of time since the last COL could be included to capture the increased risk of AAC for longer intervals between COLs, which arises because there is more time for an advanced adenoma or cancer to develop. However, we think this is less satisfactory than capturing the effect of the interval width by integrating the intensity function over the time interval, as in equations 1 and 4.

Aspects of the Zi’s can be thought of as “internal” covariates in the Cox model jargon, i.e., they themselves reflect disease progression. There is a reasonably large literature on joint modeling of longitudinally measured covariates together with event times [19, 20]. There are potential biases in estimates of hazard model parameters if one does not take into account the longitudinal data. This could require a complex model for development and progression of adenomas and how they relate to what one can actually measure. Because the Z’s are internal variables which can be regarded as outcome variables, interpretation of the θ’s may be tricky; however, for the purpose of future prediction this is less of a concern.

The individualized predictions presented in this paper are intended as an aid to patients and physicians when scheduling the next colonoscopy or FOBT. Current guidelines are designed to be followed at the population level; the work presented here would facilitate a more personalized approach. For example, a COL might be scheduled at the time at which the risk reaches 10% or at 5 years, whichever is sooner, and an FOBT might be scheduled at the time at which the risk reaches 5%.
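Given an individual's predicted risk curve 1 − Ŝ(t), a threshold rule like the one in the example above can be applied by searching for the first time the curve reaches the threshold. The sketch below uses bisection, which relies on 1 − Ŝ(t) being non-decreasing in t. The 10%, 5-year, and 5% values are the illustrative ones from the text rather than recommendations; the 10-year search horizon for the FOBT and the constant-hazard risk curve in the demo are hypothetical.

```python
import math
from typing import Callable


def time_to_threshold(
    risk: Callable[[float], float], threshold: float, cap: float, tol: float = 1e-6
) -> float:
    """Earliest t in [0, cap] with risk(t) >= threshold; returns cap if never reached.

    Assumes risk(t) = 1 - S_hat(t) is non-decreasing in t, so bisection is valid.
    """
    if risk(cap) < threshold:
        return cap
    lo, hi = 0.0, cap
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if risk(mid) >= threshold:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Hypothetical risk curve (constant hazard 0.05/year), not a fitted one.
    risk = lambda t: 1.0 - math.exp(-0.05 * t)
    print("next COL in", round(time_to_threshold(risk, 0.10, cap=5.0), 2), "years")
    print("next FOBT in", round(time_to_threshold(risk, 0.05, cap=10.0), 2), "years")
```

In practice `risk` would be the individual's predicted curve from equation 9, recomputed whenever a new COL or FOBT result arrives.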

Supplementary Material

Supp Material

References

  • 1. Day DW, Morson BC. The adenoma-carcinoma sequence. Major Problems in Pathology. 1977;10:58–71.
  • 2. Winawer SJ, Zauber AG. The advanced adenoma as the primary target of screening. Gastrointestinal Endoscopy Clinics of North America. 2002;12(1):1–9. doi: 10.1016/S1052-5157(03)00053-9.
  • 3. Zauber AG, Lansdorp-Vogelaar I, Knudsen AB, Wilschut J, van Ballegooijen M, Kuntz KM. Evaluating test strategies for colorectal cancer screening: a decision analysis for the US Preventive Services Task Force. Annals of Internal Medicine. 2008;149(9):659–669. doi: 10.7326/0003-4819-149-9-200811040-00244.
  • 4. Frazier AL, Colditz GA, Fuchs CS, Kuntz KM. Cost-effectiveness of screening for colorectal cancer in the general population. Journal of the American Medical Association. 2000;284(15):1954–1961. doi: 10.1001/jama.284.15.1954.
  • 5. Lieberman DA. Cost-effectiveness model for colon cancer screening. Gastroenterology. 1995;109(6):1781–1790. doi: 10.1016/0016-5085(95)90744-0.
  • 6. O’Leary BA, Olynyk JK, Neville AM, Platell CF. Cost-effectiveness of colorectal cancer screening: comparison of community-based flexible sigmoidoscopy with fecal occult blood testing and colonoscopy. Journal of Gastroenterology and Hepatology. 2004;19(1):38–47. doi: 10.1111/j.1440-1746.2004.03177.x.
  • 7. Allison JE, Tekawa IS, Ransom LJ, Adrain AL. A comparison of fecal occult-blood tests for colorectal-cancer screening. New England Journal of Medicine. 1996;334(3):155–160. doi: 10.1056/NEJM199601183340304.
  • 8. Levin B, Lieberman DA, McFarland B, Smith RA, Brooks D, Andrews KS, Dash C, Giardiello FM, Glick S, Levin TR, et al. Screening and surveillance for the early detection of colorectal cancer and adenomatous polyps, 2008: a joint guideline from the American Cancer Society, the US Multi-Society Task Force on Colorectal Cancer, and the American College of Radiology. CA: A Cancer Journal for Clinicians. 2008;58(3):130–160. doi: 10.3322/CA.2007.0018.
  • 9. Rizopoulos D. Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data. Biometrics. 2011;67(3):819–829. doi: 10.1111/j.1541-0420.2010.01546.x.
  • 10. Taylor JMG, Park Y, Ankerst DP, Proust-Lima C, Williams S, Kestin L, Bae K, Pickles T, Sandler H. Real-time individual predictions of prostate cancer recurrence using joint models. Biometrics. 2013;69(1):206–213. doi: 10.1111/j.1541-0420.2012.01823.x.
  • 11. Rutter CM, Yu O, Miglioretti DL. A hierarchical non-homogenous Poisson model for meta-analysis of adenoma counts. Statistics in Medicine. 2007;26(1):98–109. doi: 10.1002/sim.2460.
  • 12. Bampton PA, Sandford JJ, Cole SR, Smith A, Morcom J, Cadd B, Young GP. Interval faecal occult blood testing in a colonoscopy based screening programme detects additional pathology. Gut. 2005;54(6):803–806. doi: 10.1136/gut.2004.043786.
  • 13. Bampton PA, Sandford JJ, Young GP. Achieving long-term compliance with colonoscopic surveillance guidelines for patients at increased risk of colorectal cancer in Australia. International Journal of Clinical Practice. 2007;61(3):510–513. doi: 10.1111/j.1742-1241.2006.01158.x.
  • 14. Lane JM, Chow E, Young GP, Good N, Smith A, Bull J, Sandford J, Morcom J, Bampton PA, Cole SR. Interval fecal immunochemical testing in a colonoscopic surveillance program speeds detection of colorectal neoplasia. Gastroenterology. 2010;139(6):1918–1926. doi: 10.1053/j.gastro.2010.08.005.
  • 15. Dowling DJ, St John DJB, Macrae FA, Hopper JL. Yield from colonoscopic screening in people with a strong family history of common colorectal cancer. Journal of Gastroenterology and Hepatology. 2000;15(8):939–944. doi: 10.1046/j.1440-1746.2000.02254.x.
  • 16. Brown GJE, St John DJB, Macrae FA, Aittomäki K. Cancer risk in young women at risk of hereditary nonpolyposis colorectal cancer: implications for gynecologic surveillance. Gynecologic Oncology. 2001;80(3):346–349. doi: 10.1006/gyno.2000.6065.
  • 17. Good N, Macrae F, Young G, ODywer J, Slattery M, Venables W, Lockett T, ODywer M. Ideal colonoscopic surveillance intervals to reduce incidence of advanced adenoma and colorectal cancer. Journal of Gastroenterology and Hepatology. 2015. In press. doi: 10.1111/jgh.12904.
  • 18. van Houwelingen H, Putter H. Dynamic prediction in clinical survival analysis. CRC Press, Inc.; 2011.
  • 19. Yu M, Law NJ, Taylor JMG, Sandler HM. Joint longitudinal-survival-cure models and their application to prostate cancer. Statistica Sinica. 2004;14(3):835–862.
  • 20. Wang Y, Taylor JMG. Jointly modeling longitudinal and event time data with application to acquired immunodeficiency syndrome. Journal of the American Statistical Association. 2001;96(455):895–905. doi: 10.1198/016214501753209031; doi: 10.1198/016214501753208591.
