American Journal of Epidemiology. 2017 May 18;186(7):876–884. doi: 10.1093/aje/kwx129

Follow-up of a Large Prospective Cohort in the United States Using Linkage With Multiple State Cancer Registries

Eric J Jacobs *, Peter J Briggs, Anusila Deka, Christina C Newton, Kevin C Ward, Betsy A Kohler, Susan M Gapstur, Alpa V Patel
PMCID: PMC5860149  PMID: 28520845

Abstract

All states in the United States now have a well-established cancer registry. Linkage with these registries may be a cost-effective method of follow-up for cancer incidence in multistate cohort studies. However, the sensitivity of linkage with the current network of state registries for detecting incident cancer diagnoses within cohort studies is not well-documented. We examined the sensitivity of registry linkage among 39,368 men and women from 23 states who enrolled in the Cancer Prevention Study–3 cohort during 2006–2009 and had the opportunity to self-report cancer diagnoses on a questionnaire in 2011. All participants provided name and birthdate, and 94% provided a complete social security number. Of 378 cancer diagnoses between enrollment and 2010 identified through self-report and verified with medical records, 338 were also detected by linkage with the 23 state cancer registries (sensitivity of 89%, 95% confidence interval (CI): 86, 92). Sensitivity was lower for hematologic cancers (69%, 95% CI: 41, 89) and melanoma (70%, 95% CI: 57, 81). After excluding hematologic cancers and melanoma, sensitivity was 94% (95% CI: 91, 97). Our results indicate that linkage with multiple cancer registries can be a sensitive method for ascertaining incident cancers, other than hematologic cancers and melanoma, in multistate cohort studies.

Keywords: cancer registries, cohort studies, linkage


Large cohort studies of cancer require valid and cost-effective methods for ascertaining cancer diagnoses over many years of follow-up. Two methods have been used in large cohort studies in the United States to identify new cancer diagnoses among participants: self-report by participants followed by verification with medical records and computerized linkage with state cancer registries.

Self-report, followed by verification with medical records, has been used by several multistate cohorts that began enrollment before or during the early 1990s, including the Nurses’ Health Study (1); the Health Professionals Follow-up Study (2); the Cancer Prevention Study–II Nutrition Cohort (3, 4); the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (5); and the Women's Health Initiative (6). Each of these studies collected self-reports of cancer on surveys mailed either annually (5, 6) or biennially (1–4). Using self-report to ascertain cancer diagnoses can be challenging and costly, especially when follow-up must be maintained for decades. In particular, high response rates to follow-up inquiries are needed to ensure study validity, but maintaining high response rates may be increasingly difficult in contemporary general populations (7, 8). In addition, obtaining participant medical records for verification requires participant consent and can be further complicated by the provisions of the Health Insurance Portability and Accountability Act.

Linkage with well-established state cancer registries has long been used as a method of ascertaining incident cancer diagnoses in cohort studies that enrolled participants in only 1 or 2 selected US states, such as the Iowa Women's Health Study (9), the California Teachers Study (10), and others (11, 12). The absence of a nationwide network of high-quality registries until the 2000s may have discouraged the use of registry linkage in multistate cohorts.

Linkage with multiple state cancer registries may now be a cost-effective method of ascertaining cancer diagnoses in large multistate cohorts, due to the development of a comprehensive network of high-quality state cancer registries. Following the Cancer Registries Amendment Act of 1992 (13), the National Program of Cancer Registries of the Centers for Disease Control and Prevention was established to support the development and funding of cancer registries in all states not already covered by the National Cancer Institute's Surveillance, Epidemiology, and End Results Program. In 1998, only 14 states had statewide cancer registries that met the North American Association of Central Cancer Registries (NAACCR) certification requirements (14). By 2005, this number had risen to 41. By 2014, 47 statewide cancer registries, as well as the registry in the District of Columbia, met NAACCR certification requirements. Of these, 39 states and the District of Columbia also met NAACCR requirements for “gold” certification, including at least 95% complete case ascertainment. Large multistate cohort studies have begun to make use of this expanded infrastructure of high-quality registries. For example, both the National Institutes of Health–AARP (NIH-AARP) Study (15) and the Southern Community Cohort Study (16, 17) ascertain cancer diagnoses through linkage with multiple state cancer registries.

The sensitivity of linkage between multistate cohorts and the current nationwide network of state cancer registries is not well-documented. High sensitivity, defined here as detection of a high proportion of all true incident cancer diagnoses, is important to ensure that the results of epidemiologic studies are valid. To our knowledge, the sensitivity of linkage with multiple state registries has been reported in only one cohort study, the NIH-AARP cohort (18), which enrolled participants in 6 states (California, Florida, Pennsylvania, New Jersey, North Carolina, and Louisiana) and 2 metropolitan areas (Atlanta and Detroit). In a pilot study in a subset of the NIH-AARP cohort, registry linkage detected 89% of an estimated 268 true self-reported cancers diagnosed between enrollment in 1995 and 1998 (15). However, medical records for verification were sought only from individuals who self-reported a cancer that was not detected by registry linkage, and medical records could not be obtained from more than half of these individuals. While sensitivity appeared relatively high in the NIH-AARP pilot validation study, the 8 geographic areas included in that study were selected in part because they had high-quality cancer registries (15). In addition, sensitivity was not examined by cancer site. It remains unclear whether linkage with a wider range of contemporary state cancer registries is as sensitive and whether the sensitivity of registry linkage varies by cancer site.

To validate ascertainment of cancer diagnoses through linkage with multiple state cancer registries, we conducted an analysis of a subgroup of 39,368 participants in the Cancer Prevention Study–3 (CPS-3), a nationwide prospective study recently established by the American Cancer Society (19). In this subgroup, cancer incidence was independently ascertained by: 1) self-report, followed by medical record verification, and 2) standardized computerized linkage to 23 state cancer registries. The main purpose of this report is to describe the methods used for this linkage and to estimate its sensitivity for cancer overall and for specific cancer sites.

METHODS

Overall study population

The CPS-3 cohort includes over 300,000 US men and women enrolled between 2006 and 2013 (19). CPS-3 enrollment events took place in 37 geographic areas (35 states, the District of Columbia, and Puerto Rico). As of 2015, over 99% of CPS-3 participants were residents of one of these areas. We plan to conduct follow-up for cancer incidence in CPS-3 primarily through linkage with cancer registries in these 37 areas. At enrollment, all CPS-3 participants completed an informed consent document, approved by the Emory University Institutional Review Board, that provides for linkage with registries to identify or verify cancer diagnoses.

Pilot study subgroup

To estimate the sensitivity of registry linkage, we needed to independently identify true cancer diagnoses that could be used as a “gold standard.” We therefore mailed all participants enrolled during 2006–2009 (n = 52,328) a follow-up survey in January 2011. This survey included questions about cancer diagnoses and was completed by 44,633 participants. Participants who self-reported a potentially new cancer diagnosis were mailed a consent form for acquisition of medical records concerning their diagnosis. Medical records from consenting participants were obtained and reviewed by a certified tumor registrar from the American Cancer Society.

We selected 24 states for registry linkage, which included the vast majority of CPS-3 participants enrolled during 2006–2009. Between July 2013 and May 2016, we completed linkages with 23 of these registries (Arizona, California, Colorado, Connecticut, Georgia, Illinois, Indiana, Massachusetts, Maryland, Michigan, Minnesota, Missouri, North Carolina, New Jersey, New York, Ohio, Oregon, Pennsylvania, South Carolina, Texas, Virginia, Washington, and Wisconsin). Figure 1 maps which states were linked during this period. Linkage with the remaining state selected (Florida) has not yet been completed. We did not pursue linkage with the other 11 states, the District of Columbia, or Puerto Rico, because few participants in these areas had been enrolled by the end of 2009, the cutoff date for receiving the 2011 follow-up questionnaire on which cancer diagnoses were self-reported.

Figure 1.


Map of the United States indicating the states that were included in the Cancer Prevention Study–3 (CPS-3) pilot registry-linkage analysis. Participants in the CPS-3 pilot registry-linkage analysis were enrolled during 2006–2009. White denotes states that did not contain CPS-3 enrollment sites. Light gray denotes states that contained CPS-3 enrollment sites but were not included in the CPS-3 pilot registry-linkage analysis. Dark gray denotes states that contained CPS-3 enrollment sites and were included in the pilot registry-linkage analysis.

From the 44,633 participants who completed the 2011 follow-up questionnaire, we excluded those who were not residents of the 23 linked states at enrollment (n = 4,711) or had moved outside of the 23-state “catchment” area during follow-up (n = 554). After these exclusions, 39,368 participants remained in the validation analysis. Six participants had more than one cancer diagnosis after enrollment, according to registry linkage and/or medical record review; only information on the first diagnosis ascertained by each method was included in analyses.

Linkage protocol

We sent staff at each participating state cancer registry a data file that included identifiers from all CPS-3 participants in this analysis. Because CPS-3 participants could have moved between states, we requested linkage of all participants with the files of each state cancer registry rather than linking only with participants we believed to be living in that state. Identifiers provided by participants included first, middle, and last names as well as social security number (SSN), date of birth, sex, race/ethnicity, current address, and phone number (specific variables and percentage completeness are shown in Web Table 1 (available at https://academic.oup.com/aje), both for participants included in this validation analysis and for all participants in CPS-3). Among participants included in the validation analysis, 94% provided a complete SSN, and an additional 4% of participants provided the last 4 digits. We requested linkage of participants against all cancer diagnoses from January 1, 2006, through December 31, 2010.

Cancer registry staff were asked to follow a standardized linkage protocol using Registry Plus Link Plus software, a probabilistic data linkage program, downloadable from the Centers for Disease Control and Prevention website (20). We provided a detailed instruction manual for this protocol. A Link Plus configuration file tailored for our linkage was also provided. The file listed all variables for blocking (SSN, birth date, first name, and last name) and matching (SSN, birth date, first name, middle name, last name, race/ethnicity, and sex). It also included the matching algorithm for matching variables from the CPS-3 participant file to the state cancer registry file. Linkage between the files generated a score for potential matches, with higher scores indicating a higher likelihood of a true match. Registry staff were asked to manually review all matches with scores equal to or lower than a prespecified minimum score. This minimum score was designed to identify potential true matches without requiring excessive manual review, based on preliminary analyses conducted using the Georgia Cancer Registry. Registry staff compared the matching variables above, as well as address, phone number, and maiden name (when available), to assist them during the manual review process.
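The blocking-and-scoring workflow described above can be sketched in code. The following is a minimal illustration of Fellegi–Sunter-style probabilistic linkage of the kind Link Plus performs, not the actual CPS-3/Link Plus configuration: the field names, m- and u-probabilities, and score thresholds are all hypothetical assumptions chosen for the example.

```python
# Illustrative sketch of probabilistic record linkage with blocking,
# agreement scoring, and a manual-review band. All probabilities and
# thresholds below are hypothetical, not the CPS-3 configuration.
from math import log2

# Assumed m-probabilities (P(fields agree | true match)) and
# u-probabilities (P(fields agree | non-match)) per matching variable.
WEIGHTS = {
    "ssn":        (0.95, 0.0001),
    "birth_date": (0.97, 0.004),
    "first_name": (0.90, 0.01),
    "last_name":  (0.92, 0.005),
    "sex":        (0.98, 0.5),
}

def agreement_score(rec_a, rec_b):
    """Sum Fellegi-Sunter log-odds weights over the matching variables."""
    score = 0.0
    for field, (m, u) in WEIGHTS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue                          # missing values contribute nothing
        if a == b:
            score += log2(m / u)              # agreement weight (positive)
        else:
            score += log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return score

def block_key(rec):
    """Cheap blocking key: only compare records sharing birth year + last-name initial."""
    return (rec["birth_date"][:4], rec["last_name"][:1])

def link(cohort, registry, auto_accept=20.0, review_floor=5.0):
    """Return (accepted, needs_manual_review) candidate pairs of record ids."""
    accepted, review = [], []
    registry_by_block = {}
    for r in registry:
        registry_by_block.setdefault(block_key(r), []).append(r)
    for c in cohort:
        for r in registry_by_block.get(block_key(c), []):
            s = agreement_score(c, r)
            if s >= auto_accept:
                accepted.append((c["id"], r["id"], s))
            elif s >= review_floor:           # sub-threshold scores go to manual review
                review.append((c["id"], r["id"], s))
    return accepted, review
```

The manual-review band mirrors the protocol above: pairs scoring between the review floor and the auto-accept cutoff are flagged for registry staff, who can compare additional variables such as address and phone number before accepting a match.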

RESULTS

Table 1 shows demographic characteristics of participants in this validation analysis. Nearly all participants were between the ages of 30 and 65 at enrollment, 84% were non-Hispanic white, and 76% were female. Relatively few (5%) were current smokers. The mean follow-up time between enrollment and December 31, 2010, was 2.6 years, although this varied considerably by state due to differing enrollment schedules (Web Table 2). Participants were from every region of the United States, with 26% residing in the Northeast (Connecticut, Massachusetts, Maryland, New Jersey, New York, or Pennsylvania), 30% in the Midwest (Illinois, Indiana, Michigan, Minnesota, Missouri, Ohio, or Wisconsin), 18% in the South (Georgia, North Carolina, South Carolina, Texas, or Virginia), and 27% in the West (Arizona, California, Colorado, Oregon, or Washington). All but 10 of the 573 cancer cases identified through registry linkage were found in the cancer registry of the state in which the participant originally enrolled.

Table 1.

Demographic Characteristics of Participants Included in Registry-Validation Analyses, Cancer Prevention Study–3, United States, 2006–2010a

Characteristic | All (n = 39,368): No. of Participants, % | With a Cancer Diagnosisb (n = 573): No. of Participants, %
Age at enrollment, years
 20–29 337 0.9 6 1.0
 30–39 10,255 26.0 155 27.1
 40–49 13,458 34.2 176 30.7
 50–59 11,766 29.9 176 30.7
 60–65 3,502 8.9 60 10.5
 >65 50 0.1 0 0.0
Sex
 Female 29,900 76.0 443 77.3
 Male 9,468 24.0 130 22.7
Race/ethnicity
 Non-Hispanic white 32,901 83.6 468 81.7
 Hispanic 3,404 8.6 55 9.6
 African American 1,199 3.0 21 3.7
 Other/unknown 1,864 4.7 29 5.1
Education
 Less than high school 285 0.7 0 0.0
 High school graduate 4,251 10.8 52 9.1
 Some college 13,108 33.3 189 33.0
 College graduate 12,044 30.6 166 29.0
 Graduate school 9,566 24.3 158 27.6
 Unknown 114 0.3 6 1.0
Cigarette-smoking status
 Never 26,875 68.3 358 62.5
 Former 10,505 26.7 189 33.0
 Current 1,935 4.9 26 4.5
 Unknown 53 0.1 0 0.0
State of residence at enrollment
 Arizona 911 2.3 12 2.1
 California 7,321 18.6 109 19.0
 Colorado 840 2.1 7 1.2
 Connecticut 1,833 4.7 33 5.8
 Georgia 2,233 5.7 45 7.9
 Illinois 2,023 5.1 25 4.4
 Indiana 1,668 4.2 12 2.1
 Massachusetts 2,022 5.1 29 5.1
 Maryland 733 1.9 c c
 Michigan 3,106 7.9 39 6.8
 Minnesota 844 2.1 16 2.8
 Missouri 1,913 4.9 27 4.7
 North Carolina 991 2.5 15 2.6
 New Jersey 1,154 2.9 22 3.8
 New York 3,287 8.3 65 11.3
 Ohio 1,555 3.9 14 2.4
 Oregon 102 0.3 c c
 Pennsylvania 1,024 2.6 13 2.3
 South Carolina 395 1.0 12 2.1
 Texas 3,141 8.0 34 5.9
 Virginia 298 0.8 c c
 Washington 1,273 3.2 17 3.0
 Wisconsin 701 1.8 15 2.6

a Includes Cancer Prevention Study–3 participants from selected states enrolled during 2006–2009.

b Identified through linkage with state cancer registries.

c All state-specific cell counts of 5 or fewer were suppressed to comply with the confidentiality guidelines of certain state cancer registries.

Table 2 compares results from ascertainment using linkage with cancer registries to results from ascertainment using self-report followed by verification with medical records. The first row shows participants who did not self-report a new cancer diagnosis on their 2011 questionnaire (n = 38,111). A total of 32 of these participants (0.08%) were diagnosed with cancer according to linkage with cancer registries. Cancer sites diagnosed in more than one of these participants included melanoma (9 cases), colorectal cancer (6 cases), breast cancer (5 cases), head and neck cancer (3 cases), hematologic cancer (3 cases), and vulvar cancer (3 cases).

Table 2.

Agreement of Cancer Diagnoses From State Cancer Registry Linkage and Self-Report, Cancer Prevention Study–3, United States, 2006–2010

Cancer Diagnosis According to Self-Report and Medical Record Verificationa | Registry-Detected Cancer Diagnosisb: No, Yes | Total No. of Participants | Sensitivity, %c
No self-report of cancer 38,079 32 38,111
Self-report of cancer not verified by medical records, or cancer not diagnosed within study time-framed 676 203 879
Self-report of cancer verified by medical recordse 40f 338 378 89.4
Total 38,795 573 39,368

a Cancer diagnoses self-reported on a questionnaire mailed in January 2011.

b Cancer diagnoses between study enrollment date and December 2010 according to linkage with data from 23 state cancer registries.

c Sensitivity of cancer registry linkage based on cancer diagnoses independently verified by medical records (338 divided by 378).

d Includes participants whose initial self-report of a cancer diagnosis was likely erroneous, who did not provide consent for medical record acquisition, or who were diagnosed with cancer before their study enrollment date or after December 2010 (see text and Web Table 3 for details).

e Cancer diagnoses between enrollment date and December 2010 according to medical record review.

f Includes 1 cancer diagnosis classified as endometrial cancer by medical record review but as cervical cancer by registry linkage.

The second row of Table 2 shows participants (n = 879) who self-reported a potentially new cancer diagnosis on the 2011 questionnaire that could not be verified by medical record review or that was diagnosed outside the time frame of this analysis (enrollment date through December 31, 2010). Results from these participants are not informative about the sensitivity of registry linkage because most of them are unlikely to have had a true cancer diagnosis during the time frame covered by our registry linkage. However, Web Table 3 details the reasons that self-reports of cancer from these participants could not always be verified. The 2 most common reasons were that the participant did not return the consent form for obtaining medical records (n = 343) or had made an error in the initial self-report. Specifically, 250 participants who reported a cancer on the 2011 questionnaire clarified that they had not been diagnosed with cancer (other than nonmelanoma skin cancer) when they were later sent a consent form for medical record review. Many of these participants had checked boxes on the 2011 questionnaire indicating “melanoma of the skin,” thyroid cancer, or cancer of the uterus or endometrium, and they may have been reporting nonmelanoma skin cancer or diagnoses of thyroid or uterine conditions other than cancer.

The third row of Table 2 shows participants (n = 378) who self-reported a cancer diagnosis on the 2011 survey that we then verified through medical record review. Of the 378 participants with verified cancer diagnoses, 338 (89%) were also detected by registry linkage. Considering these 378 cancer diagnoses to be the “gold standard,” the estimated overall sensitivity for incident cancer diagnoses was 89% (95% confidence interval (CI): 86, 92). The κ statistic (21) for agreement between self-report followed by medical record verification and registry linkage (excluding the second row where cancer status could not be determined by medical record verification) was 0.90 (95% CI: 0.88, 0.93).
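The agreement statistics above can be reproduced directly from the Table 2 counts. The sketch below assumes a Wilson score interval for the sensitivity confidence interval (the paper does not state which interval method was used) and computes Cohen's kappa from the 2 × 2 agreement table.

```python
# Recomputing the Table 2 agreement statistics. The Wilson interval is an
# assumption here; the paper does not name its CI method.
from math import sqrt

both_yes = 338       # verified by medical records AND detected by registry linkage
self_only = 40       # verified by medical records, missed by linkage
registry_only = 32   # detected by linkage, no self-report of cancer
both_no = 38_079     # no self-report, no registry match

sensitivity = both_yes / (both_yes + self_only)    # 338 / 378

def wilson_ci(k, n, z=1.96):
    """95% Wilson score interval for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 agreement table:
    a = both methods positive, b = method 1 only, c = method 2 only, d = both negative."""
    n = a + b + c + d
    p_observed = (a + d) / n
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)
```

With these counts, the sensitivity works out to roughly 0.89 with a CI of about (0.86, 0.92), and kappa to about 0.90, matching the reported values.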

Table 3 shows the same 378 “gold standard” cancer diagnoses shown in Table 2, but examines sensitivity of registry linkage by cancer site. Of the 40 cancer diagnoses missed by registry linkage, nearly half (n = 18) were melanoma. Sensitivity was statistically significantly lower for hematologic cancers (69%) and melanoma (70%) than for cancer overall. Excluding hematologic cancers and melanoma, overall sensitivity was 94%.

Table 3.

Sensitivity of State Cancer Registry Linkage According to Cancer Site, Cancer Prevention Study–3, United States, 2006–2010

Cancer Site According to Medical Record Review | Registry-Detected Cancer Diagnosis: No, Yes | Total No. of Cases | Sensitivitya: %, 95% CI
Hematologic 5 11 16 69b 41, 89
Melanoma 18 42 60 70b 57, 81
Less-common cancersc 5 36 41 88 74, 96
Prostate 5 41 46 89 76, 96
Endometrial 2d 17 19 90 67, 99
Colorectal 1 19 20 95 75, 100
Breast 4 150 154 97 94, 99
Thyroid 0 22 22 100 85, 100
All sites 40 338 378 89 86, 92
All sites excluding melanoma and hematologic cancers 17 285 302 94 91, 97

Abbreviation: CI, confidence interval.

a Sensitivity was calculated as the proportion of cases verified by medical records that were also detected by registry linkage.

b Sensitivity for cancers at these sites was significantly lower than sensitivity for cancer overall (P < 0.0001 for melanoma; P = 0.026 for hematologic cancers).

c Cancer sites with fewer than 10 total cases (9 kidney, 5 lung, 5 ovarian, 4 bladder, 4 cervical, 4 head and neck, 2 brain, 2 small intestine, 1 laryngeal, 1 connective/soft tissue, 1 stomach, 1 testicular, 1 thymic, and 1 vulvar).

d One case was identified as endometrial cancer by medical record review but as cervical cancer by registry linkage.
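The per-site confidence intervals in Table 3 are consistent with exact (Clopper–Pearson) binomial limits. A sketch, assuming the exact method (the paper does not name its CI procedure), implemented with only the standard library by bisection on the binomial tail:

```python
# Exact (Clopper-Pearson) binomial confidence limits via bisection.
# Assumes the paper used exact intervals; this is an inference, not stated.
from math import comb

def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def _invert(f, target):
    """Find p in [0, 1] with f(p) == target, for f increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):                 # bisection to ample float precision
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact 100*(1-alpha)% CI for a binomial proportion k/n."""
    # Lower limit: p such that P(X >= k | p) = alpha/2 (tail increases with p).
    lower = 0.0 if k == 0 else _invert(lambda p: binom_tail_ge(k, n, p), alpha / 2)
    # Upper limit: p such that P(X <= k | p) = alpha/2; negate to make it increasing.
    upper = 1.0 if k == n else _invert(
        lambda p: -(1 - binom_tail_ge(k + 1, n, p)), -alpha / 2)
    return lower, upper
```

For melanoma (42 of 60 detected), this yields limits of roughly 57% and 81%, in line with the Table 3 entry.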

Sensitivity appeared similar among non-Hispanic white people (89%, 95% CI: 85, 92), Hispanics (92%, 95% CI: 79, 98), and African Americans (88%, 95% CI: 62, 98), although numbers were limited for groups other than non-Hispanic white people. In states with 10 or more cases of cancer verified by medical records, sensitivity ranged from 79% to 100%. In analyses excluding melanoma and hematologic cancers, sensitivity appeared similar for cancers categorized according to Surveillance, Epidemiology, and End Results summary stage as in situ or localized (93%, 95% CI: 90, 97) compared with those categorized as regional or distant (98%, 95% CI: 91, 100).

Because there are sometimes delays before cancer diagnoses can be entered into cancer registries, we examined sensitivity by time between diagnosis (as recorded in the medical record) and the date of registry linkage. Sensitivity appeared similar for cancers diagnosed less than 4 years from the date of registry linkage (89% of 162 diagnoses) and for cancers diagnosed 4 or more years from the date of registry linkage (90% of 216 diagnoses). We also specifically examined sensitivity by time for hematologic cancers and melanoma, the 2 cancers with the lowest overall sensitivity. We found no statistically significant differences in sensitivity for hematologic cancers and melanomas diagnosed within 4 years of linkage (3 of 4 hematologic cancers and 18 of 29 melanomas detected) compared with those diagnosed 4 or more years before linkage (8 of 12 hematologic cancers and 24 of 31 melanomas).

Table 4 examines the potential for obtaining false-positive matches from registry linkage—that is, matching a participant with a cancer diagnosis they did not actually have. As shown in Table 4, when registry linkage identified a match, agreement between participant records and registry information was generally good with respect to SSN, date of birth, and first and last name. In addition, the great majority of cancer diagnoses identified by registry linkage were also self-reported by participants. Of the 32 registry-detected cases that were not self-reported by participants, all but 11 were exact matches on full SSN as well as first, middle, and last names. In addition, although these 32 participants did not self-report cancer on their 2011 questionnaire, 23 of them completed a follow-up questionnaire mailed in 2015, on which 8 reported the same cancer site identified through registry linkage and 8 more reported a related diagnosis (e.g., colorectal polyp instead of colorectal cancer).

Table 4.

Agreement of Matching Variables According to Self-Report Status for Registry-Detected Cancer Diagnoses, Cancer Prevention Study–3, United States, 2006–2010a

Cancer Diagnosis by Self-Report | Exact Match on Social Security Number: All Variables Matchedb, 1 Variable Did Not Matchb, ≥2 Variables Did Not Matchb | Social Security Number Missing or Not Exact Match: All Variables Matchedb, 1 Variable Did Not Matchb, ≥2 Variables Did Not Matchb (each cell: No. of Cases, %)
No cancer self-reported on the 2011 survey 21 66 2 7 0 0 6 19 2 6 1 3
Cancer self-reported on the 2011 survey 420 78 40 6 1 0 67 12 9 2 0 0

a Analysis excluded 4 registry-detected cancer diagnoses missing information on agreement of matching variables.

b Matching defined based on exact match. Matching variables included first name, last name, and birthdate.

DISCUSSION

In our contemporary US cohort, linkage with 23 state cancer registries identified 89% of all participants with a new cancer diagnosis and 94% of all participants with a new diagnosis of cancers other than hematologic cancers and melanoma. Our results also suggest that this registry linkage generated few false-positive cancer diagnoses.

The sensitivity of registry linkage in our study differed by cancer site. Sensitivity was significantly lower for hematologic cancers (69%) and melanoma (70%) than for cancer overall. The lower sensitivity for hematologic cancers and melanoma is likely due to underreporting of these cancers to registries. Unlike most cancers, hematologic cancers and melanoma are often diagnosed and treated entirely outside of the hospital setting and therefore cannot be completely captured by hospital surveillance systems. Underreporting of myeloid cancers and melanoma has previously been described (22–24). In addition, in an analysis of a subset of participants in a US cohort study, melanoma and hematologic cancers accounted for a substantial proportion of all cancer cases missed by linkage with cancer registries (4). Prostate cancers may also be diagnosed and treated outside of a hospital setting or not treated at all. However, in our study, sensitivity for identifying prostate cancer diagnoses was relatively high (89%).

To our knowledge, no previous analysis has explicitly reported the sensitivity of multistate registry linkage by cancer site. Our results suggest that studies of hematologic cancers and melanoma in the United States that require a high ascertainment rate should not rely solely on identification of cases through linkage to cancer registries. However, this could change in the future as electronic reporting from free-standing laboratories increases in the United States as part of the growing cancer surveillance infrastructure.

The generalizability of sensitivity estimates from this analysis to other US study populations needs to be considered. Our analysis was based on linkage with 23 state registries representing all regions of the United States. If the cancer registries in our analysis had more complete ascertainment of cancer cases or more complete identifying information on patients than other state registries, then the sensitivity observed in our analysis would be higher than in an analysis including all US registries. However, the state cancer registries included in our analysis were selected based on the timing of CPS-3 enrollment in various states, not on registry quality. Completeness of case ascertainment is likely to be similar for registries in our analysis and other state registries. Based on 2010 cancer diagnoses, 17 of the 23 state registries in our analysis met NAACCR “gold” certification requirements, which include at least 95% complete case ascertainment, compared with 23 of the 29 registries not included in our analysis. Therefore, the relatively high sensitivity we observed in our analysis appears likely to be broadly generalizable to nationwide study populations in the United States that have similarly complete information on SSN and other identifiers used for linkage.

The sensitivity of registry linkage can be reduced by out-migration of study participants to states where registry linkage is not performed. In our analysis, we excluded participants known to have moved outside of the 23 states we linked with, thereby minimizing reductions in sensitivity due to out-migration. This exclusion was appropriate for estimating the sensitivity of registry linkage for future follow-up of the full CPS-3 cohort, because out-migration to states where registry linkage is not performed will be uncommon. Future CPS-3 linkages will include all 37 geographic areas where CPS-3 enrollment events occurred. These areas included over 99% of CPS-3 participants in 2015. CPS-3 participants who move away from their original enrollment state are likely to remain within these 37 geographic areas because they include more than 93% of the overall US population (25). Long-term cohort studies that enroll participants from a smaller number of states may benefit from linking to registries outside of their original enrollment area in order to avoid missing cases due to out-migration.

The focus of this report is the sensitivity of cancer registry linkage when used as the primary method of cancer follow-up. However, in CPS-3, we plan to supplement registry linkage with other follow-up methods. Too few CPS-3 participants live outside the 37 enrollment areas to make linkage with registries outside of these areas cost-effective under current practices. Therefore, we plan to follow these participants for cancer incidence using self-reports from periodic follow-up questionnaires followed by verification with medical records. In addition, all participants, regardless of residence, who self-report certain cancers—including breast, colorectal, hematologic, prostate, and ovarian cancer—will be asked for consent to obtain hospital medical records so that tumor samples can be collected, as we do in the Cancer Prevention Study–II Nutrition Cohort (26). All CPS-3 participants will also be periodically linked to the National Death Index, as in Cancer Prevention Study–II (27), providing information on vital status and cause of death, including death from causes other than cancer.

Linkage with cancer registries needs to be highly sensitive for identifying true cancer diagnoses but also needs to avoid generating false-positive cancer diagnoses (identifying cancer diagnoses in participants who did not actually have one). It is not possible to determine the exact proportion of registry matches in our analysis that were false positives, but it appears likely to be very low. More than 90% of the cancer diagnoses detected by registry linkage were also self-reported, and therefore unlikely to be false positives. Most registry-detected cancers that were not self-reported were exact matches on SSN, date of birth, and first and last name. Furthermore, our linkage protocol required that no potential match with a questionable matching score be accepted until after passing a manual review by registry staff that included the ability to compare additional matching variables, including address and phone number. The specificity of registry linkage (the proportion of participants not diagnosed with cancer who were correctly classified by registry linkage as not diagnosed with cancer) can be assumed to be high, but cannot be precisely calculated because we cannot independently determine with complete certainty which participants were not actually diagnosed with cancer.

Our results demonstrate that registry linkage can be a highly sensitive method for identifying incident cancers in cohort studies. Given that participants in contemporary cohorts may often fail to respond to requests for consent to obtain medical records, registry linkage, with appropriate informed consent, offers an important opportunity to capture critically important information about confirmed cancer diagnoses. However, completing registry linkages with multiple states requires considerable administrative effort and resources. For the 23 linkages included in this analysis, we completed a separate written application for each state registry. Many state registries also required a separate institutional review board application. Moving these applications through each state's administrative processes was often time-consuming. Other researchers have described the substantial administrative time and effort currently required for multistate linkage (27).

NAACCR is currently developing an innovative method to facilitate simultaneous linkage of cohorts with multiple population-based cancer registries. Through a single application process to NAACCR, researchers will submit a single cohort data file to NAACCR's data fiduciary for simultaneous distribution to each participating NAACCR member registry. Matching will occur behind each registry's firewall using a standardized protocol that safeguards the integrity and confidentiality of the registry's data. The number of matches in each state will be provided to the researchers, who may then seek information on cancer diagnoses in individual cohort members from only those registries with matches, thereby potentially saving time, money, and resources. This method will soon be demonstrated, as part of a National Cancer Institute–funded project, by matching 2 large national cohorts with cancer registries covering at least 67% of the US population. NAACCR is also developing a standardized institutional review board application template and a centralized institutional review board approval process, designed to facilitate the approvals process. While the extent to which individual state cancer registries adopt more standardized institutional review board procedures cannot be guaranteed, we anticipate that these advances will reduce barriers to linking cohorts with multiple population-based cancer registries and make such linkages faster and more cost-effective.
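The distributed linkage flow described above can be sketched schematically: each registry matches the cohort file behind its own firewall and returns only an aggregate match count, and record-level requests then go only to registries reporting matches. This is purely an illustrative simulation, not the actual NAACCR protocol; all identifiers, records, and the simple deterministic matching rule are invented for the example.

```python
def match_behind_firewall(cohort, registry_records):
    """Simulate matching inside one registry's firewall.

    Uses a simple deterministic match on (SSN, date of birth) and
    returns only an aggregate count; no record-level data leave.
    """
    registry_keys = {(r["ssn"], r["dob"]) for r in registry_records}
    return sum((p["ssn"], p["dob"]) in registry_keys for p in cohort)

# Hypothetical cohort file submitted once, then distributed to all registries
cohort = [
    {"id": 1, "ssn": "123-45-6789", "dob": "1960-04-02"},
    {"id": 2, "ssn": "987-65-4321", "dob": "1955-11-17"},
]

# Hypothetical per-state registry data, never pooled centrally
registries = {
    "GA": [{"ssn": "123-45-6789", "dob": "1960-04-02", "site": "breast"}],
    "TX": [],
}

counts = {state: match_behind_firewall(cohort, recs)
          for state, recs in registries.items()}

# Researchers request diagnosis details only from states with matches
follow_up = [state for state, n in counts.items() if n > 0]
print(counts)     # -> {'GA': 1, 'TX': 0}
print(follow_up)  # -> ['GA']
```

In practice, production linkage software such as Link Plus uses probabilistic matching with manual review of questionable scores, as described earlier in this article, rather than the exact-match rule shown here.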

In summary, results from this pilot study indicate that linkage with multiple state cancer registries can be a sensitive method for ascertaining diagnoses of most types of cancer in multistate epidemiologic studies in the United States. The US public has made a substantial investment in state cancer registries; using these registries to support multistate linkages for epidemiologic research provides an important additional return on this investment. More efficient administrative processes and standardized matching protocols applied across multistate linkages could help maximize the contributions of state cancer registries to cancer research by cohort studies.

Supplementary Material

Web Material

ACKNOWLEDGMENTS

Author affiliations: Epidemiology Research Program, American Cancer Society, Atlanta, Georgia (Eric J. Jacobs, Peter J. Briggs, Anusila Deka, Christina C. Newton, Susan M. Gapstur, Alpa V. Patel); Georgia Center for Cancer Statistics, Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, Georgia (Kevin C. Ward); and North American Association of Central Cancer Registries, Springfield, Illinois (Betsy A. Kohler).

This work was supported by the American Cancer Society, which funds the Cancer Prevention Study–3 cohort.

We thank, for their contributions to this study, the central cancer registries supported through the Centers for Disease Control and Prevention National Program of Cancer Registries, as well as cancer registries supported by the National Cancer Institute Surveillance, Epidemiology, and End Results program, including registries for the states of Arizona, California, Colorado, Connecticut, Georgia, Illinois, Indiana, Massachusetts, Maryland, Michigan, Minnesota, Missouri, North Carolina, New Jersey, New York, Ohio, Oregon, Pennsylvania, South Carolina, Texas, Virginia, Washington, and Wisconsin. We also thank Kimberly D. Miller of the American Cancer Society for her skilled work in preparing the map shown in Figure 1.

Conflict of interest: none declared.

REFERENCES

  1. Colditz GA, Manson JE, Hankinson SE. The Nurses’ Health Study: 20-year contribution to the understanding of health among women. J Womens Health. 1997;6(1):49–62.
  2. Giovannucci E, Ascherio A, Rimm EB, et al. A prospective cohort study of vasectomy and prostate cancer in US men. JAMA. 1993;269(7):873–877.
  3. Calle EE, Rodriguez C, Jacobs EJ, et al. The American Cancer Society Cancer Prevention Study II Nutrition Cohort: rationale, study design, and baseline characteristics. Cancer. 2002;94(9):2490–2501.
  4. Bergmann MM, Calle EE, Mervis CA, et al. Validity of self-reported cancers in a prospective cohort study in comparison with data from state cancer registries. Am J Epidemiol. 1998;147(6):556–562.
  5. Prorok PC, Andriole GL, Bresalier RS, et al. Design of the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. Control Clin Trials. 2000;21(6 suppl):273S–309S.
  6. Curb JD, McTiernan A, Heckbert SR, et al. Outcomes ascertainment and adjudication methods in the Women's Health Initiative. Ann Epidemiol. 2003;13(9 suppl):S122–S128.
  7. Morton LM, Cahill J, Hartge P. Reporting participation in epidemiologic studies: a survey of practice. Am J Epidemiol. 2006;163(3):197–203.
  8. Galea S, Tracy M. Participation rates in epidemiologic studies. Ann Epidemiol. 2007;17(9):643–653.
  9. Bisgard KM, Folsom AR, Hong CP, et al. Mortality and cancer rates in nonrespondents to a prospective study of older women: 5-year follow-up. Am J Epidemiol. 1994;139(10):990–1000.
  10. Bernstein L, Allen M, Anton-Culver H, et al. High breast cancer incidence rates among California teachers: results from the California Teachers Study (United States). Cancer Causes Control. 2002;13(7):625–635.
  11. Kolonel LN, Henderson BE, Hankin JH, et al. A multiethnic cohort in Hawaii and Los Angeles: baseline characteristics. Am J Epidemiol. 2000;151(4):346–357.
  12. White E, Patterson RE, Kristal AR, et al. VITamins And Lifestyle cohort study: study design and characteristics of supplement users. Am J Epidemiol. 2004;159(1):83–93.
  13. Cancer Registries Amendment Act of 1992, 42 USC §280e. https://www.cdc.gov/cancer/npcr/pdf/publaw.pdf.
  14. North American Association of Central Cancer Registries. Who Is Certified. https://20tqtx36s1la18rvn82wcmpn-wpengine.netdna-ssl.com/wp-content/uploads/2016/11/1998-Certified-Registries-1995-Incidence-Data.pdf. Accessed November 23, 2015.
  15. Michaud DS, Midthune D, Hermansen S, et al. Comparison of cancer registry case ascertainment with SEER estimates and self-reporting in a subset of the NIH-AARP Diet and Health Study. J Registry Manag. 2005;32(2):70–75.
  16. Signorello LB, Hargreaves MK, Steinwandel MD, et al. Southern Community Cohort Study: establishing a cohort to investigate health disparities. J Natl Med Assoc. 2005;97(7):972–979.
  17. Blot WJ, Cohen SS, Aldrich M, et al. Lung cancer risk among smokers of menthol cigarettes. J Natl Cancer Inst. 2011;103(10):810–816.
  18. Schatzkin A, Subar AF, Thompson FE, et al. Design and serendipity in establishing a large cohort with wide dietary intake distributions: the National Institutes of Health–American Association of Retired Persons Diet and Health Study. Am J Epidemiol. 2001;154(12):1119–1125.
  19. The American Cancer Society Cancer Prevention Study-3 (CPS-3). http://www.cancer.org/research/researchtopreventcancer/currentcancerpreventionstudies/cancer-prevention-study-3. Accessed October 15, 2015.
  20. Centers for Disease Control and Prevention, National Program of Cancer Registries (NPCR). Registry Plus Link Plus. http://www.cdc.gov/cancer/npcr/tools/registryplus/lp.htm. Updated January 13, 2015. Accessed October 15, 2015.
  21. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–174.
  22. Cockburn M, Swetter SM, Peng D, et al. Melanoma underreporting: why does it happen, how big is the problem, and how do we fix it. J Am Acad Dermatol. 2008;59(6):1081–1085.
  23. Cartee TV, Kini SP, Chen SC. Melanoma reporting to central cancer registries by US dermatologists: an analysis of the persistent knowledge and practice gap. J Am Acad Dermatol. 2011;65(5 suppl 1):S124–S132.
  24. Craig BM, Rollison DE, List AF, et al. Underreporting of myeloid malignancies by United States cancer registries. Cancer Epidemiol Biomarkers Prev. 2012;21(3):474–481.
  25. Bureau of the Census, US Department of Commerce. Census 2010 Resident Population Data: Population Density. Washington, DC: Bureau of the Census. http://www.census.gov/2010census/data/apportionment-dens-text.php. Accessed October 19, 2015.
  26. Campbell PT, Deka A, Briggs P, et al. Establishment of the Cancer Prevention Study II Nutrition Cohort colorectal tissue repository. Cancer Epidemiol Biomarkers Prev. 2014;23(12):2694–2702.
  27. Calle EE, Terrell DD. Utility of the National Death Index for ascertainment of mortality among Cancer Prevention Study II participants. Am J Epidemiol. 1993;137(2):235–241.


Articles from American Journal of Epidemiology are provided here courtesy of Oxford University Press
