Abstract
Background
Precision medicine has become a mainstay of cancer care in recent years. The National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) Program has been an authoritative source of cancer statistics and data since 1973. However, tumor genomic information has not been adequately captured in the cancer surveillance data, which impedes population-based research on molecular subtypes. To address this, the SEER Program has developed and implemented a centralized process to link SEER registries’ tumor cases with genomic test results that are provided by molecular laboratories to the registries.
Methods
Data linkages were carried out following operating procedures for centralized linkages established by the SEER Program. The linkages used Match*Pro, a probabilistic linkage software application, and were facilitated by the registries’ trusted third party (an honest broker). The SEER registries provide to NCI limited datasets that undergo preliminary evaluation prior to their release to the research community.
Results
Recently conducted genomic linkages included OncotypeDX Breast Recurrence Score, OncotypeDX Breast Ductal Carcinoma in Situ, OncotypeDX Genomic Prostate Score, Decipher Prostate Genomic Classifier, DecisionDX Uveal Melanoma, DecisionDX Preferentially Expressed Antigen in Melanoma, DecisionDX Melanoma, and germline test results in Georgia and California SEER registries.
Conclusions
The linkages of cancer cases from SEER registries with genomic test results obtained from molecular laboratories offer an effective approach for data collection in cancer surveillance. By providing de-identified data to the research community, the NCI’s SEER Program enables scientists to investigate numerous research inquiries.
Precision medicine has become a mainstay of cancer care in recent years. The genomic landscape of tumors as well as the individual germline sequence are essential for cancer diagnosis, treatment, and prognosis. The number of recognized clinically important genomic alterations is constantly increasing, with new biomarkers and targeted therapies becoming the standard of care at a rapid rate (1-8).
The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute (NCI) is an authoritative source for population-based cancer statistics and data (https://seer.cancer.gov/). Comprehensive data have been collected since 1973; however, it has been challenging for the SEER registries to keep up with the paradigm shift in oncology practice and to provide clinically relevant precision oncology data for cancer statistics and research. With the aging of the population, cancer registries are under constant pressure to ascertain more tumor cases and to collect increasingly large volumes of data through the traditional processes of medical chart review and abstraction. Although the number of biomarkers collected through traditional methods is slowly increasing, it is not adequate to provide a deep understanding of incidence, trends, and survival by molecular subtypes or to assess treatment patterns and response by genomic alterations. Inclusion of new biomarkers in the North American Association of Central Cancer Registries standards (https://www.naaccr.org/) can be a protracted process. Together with the 2-year lag between the diagnosis year and data release, this can delay the availability of data on the molecular characterization of tumors in clinical practice for population-based research. In addition, it is an ongoing challenge to keep cancer registrars fully proficient in reading, interpreting, and recording this ever-increasing volume of genomic data.
Historically, a few individual biomarkers have been assessed and frequently included in pathology reports, which SEER registries have been receiving in near real time for most tumor cases. In more recent years, replacing single gene assessment with multigene panels has brought more complexity to data collection. Although the College of American Pathologists has developed and constantly updates biomarker protocols and requires that biomarkers be included in pathology reports even if they are assessed at outside molecular laboratories, this guidance is not followed consistently. Moreover, tumor genomic and genetic tests can be ordered in many different settings (eg, community oncology practices) and therefore may not be available to cancer registrars who are traditionally hospital based.
To address the above challenges, NCI’s Surveillance Research Program (SRP), in collaboration with the SEER registries and NCI’s biomedical computing support contractor (Information Management Services [IMS]), has been developing new methods for the collection of genomic data for cancer cases ascertained by the SEER registries as part of public health surveillance. This paper will discuss linkages of SEER registries’ cancer cases with genomic test results provided by molecular laboratories, as well as describe the specialized databases that will be made available to the research community.
Methods
Data linkages were conducted according to SEER registries’ operating procedures for centralized linkages. SEER registries link their cancer cases with a variety of data sources using personally identifying information, which they collect, maintain, and update regularly. The linkages that registries conduct can be performed at each registry or centrally using a third party. SEER registries are central cancer registries that operate under state-specific laws and are exempt from the Health Insurance Portability and Accountability Act (45 CFR 164.512[b]). Most linkages were conducted as part of cancer surveillance activities with molecular laboratories serving as reporting entities (health-care providers). Some linkages were initiated as demonstration pilot projects, approved by an institutional review board (IRB), that aimed to assess feasibility before implementation on a larger scale. These linkages were all performed by IMS, functioning as the SEER registries’ trusted third party and honest broker. IMS has executed agreements with each registry to maintain that registry’s data management system; these agreements include clauses about linkages.
All linkages used Match*Pro, a probabilistic record linkage application developed by IMS and funded by NCI, which is widely used and freely available (https://seer.cancer.gov/tools/matchpro/). Match*Pro identifies records across multiple distinct data sources that refer to the same entity. The algorithms used by Match*Pro are based on a model developed in 1969 by Fellegi and Sunter (9). Their model outlined a method for computing a weight for each identifier based on its ability to correctly identify a match. Those weights are then used to calculate the probability that 2 records refer to the same entity. Match*Pro can be configured to use different data items, weighting, and acceptable combined weight cutoffs so that each linkage algorithm is appropriate for the personally identifying information data items available from the specific molecular laboratory.
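The Fellegi-Sunter weighting described above can be illustrated with a minimal sketch. The field names, the m- and u-probabilities, and the prior are hypothetical values chosen for illustration; they are not Match*Pro settings, and the functions below are not part of the Match*Pro application.

```python
import math

# Hypothetical per-identifier probabilities (illustrative only, not Match*Pro settings).
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
FIELD_PARAMS = {
    "last_name": {"m": 0.95, "u": 0.01},
    "first_name": {"m": 0.90, "u": 0.05},
    "birth_date": {"m": 0.97, "u": 0.003},
    "zip_code": {"m": 0.85, "u": 0.02},
}


def field_weight(m: float, u: float, agrees: bool) -> float:
    """Log2 likelihood ratio contributed by a single identifier."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1.0 - m) / (1.0 - u))


def composite_weight(rec_a: dict, rec_b: dict) -> float:
    """Sum agreement and disagreement weights over all compared identifiers."""
    total = 0.0
    for field, p in FIELD_PARAMS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        total += field_weight(p["m"], p["u"], a == b)
    return total


def match_probability(weight: float, prior: float) -> float:
    """Convert a composite weight into a posterior match probability (Bayes' rule)."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * (2.0 ** weight)
    return posterior_odds / (1.0 + posterior_odds)


registry_rec = {"last_name": "DOE", "first_name": "JANE",
                "birth_date": "1950-02-11", "zip_code": "30322"}
lab_rec = {"last_name": "DOE", "first_name": "JANE",
           "birth_date": "1950-02-11", "zip_code": "30033"}

w = composite_weight(registry_rec, lab_rec)
print(f"composite weight = {w:.2f}, P(match) = {match_probability(w, prior=1e-4):.4f}")
```

Identifiers that rarely agree by chance (a low u-probability, such as date of birth) contribute large positive weights when they agree, which is why a pair can still score highly when a less stable identifier such as ZIP code differs.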
All registries that were core SEER registries (https://seer.cancer.gov/registries/) at the time of each linkage participated, except for the Alaska Native Tumor Registry, which did not participate because of small numbers and increased risk of re-identifiability. After the necessary legal agreements with the participating parties are signed, the registries and molecular laboratory send files of personally identifying information with their respective unique record identifier (ID) (eg, registry record ID and lab record ID) to IMS. IMS links the personally identifying information files using Match*Pro to obtain the best possible match for each record and to categorize each best possible match as definitely a match, not a match, or a match that needs to be reviewed. IMS provides individual record pairs that need review to the submitting registry to confirm match status. Registries may use additional data from their data management system and/or external data sources such as motor vehicle registration files, LexisNexis, or a registry database to confirm a match. After the registry review is complete, IMS creates a crosswalk file of the matching record IDs from the registry and molecular laboratory.
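The three-way categorization and the resulting crosswalk can be sketched as follows. This is a simplified illustration under stated assumptions: the cutoffs, record IDs, and weights are hypothetical, and "best possible match" is reduced here to the highest-weight candidate per lab record.

```python
from dataclasses import dataclass

# Hypothetical cutoffs on the combined weight; real cutoffs are configured per linkage.
UPPER_CUTOFF = 20.0  # at or above: accepted as a match
LOWER_CUTOFF = 8.0   # below: rejected as a non-match


@dataclass(frozen=True)
class CandidatePair:
    registry_id: str
    lab_id: str
    weight: float


def classify(pair: CandidatePair) -> str:
    """Assign a pair to match, non-match, or manual review."""
    if pair.weight >= UPPER_CUTOFF:
        return "match"
    if pair.weight < LOWER_CUTOFF:
        return "non-match"
    return "review"


def best_pairs(pairs):
    """Keep only the highest-weight candidate for each lab record."""
    best = {}
    for p in pairs:
        current = best.get(p.lab_id)
        if current is None or p.weight > current.weight:
            best[p.lab_id] = p
    return list(best.values())


def build_crosswalk(pairs, review_decisions):
    """Return sorted (registry_id, lab_id) rows for accepted pairs.

    review_decisions maps (registry_id, lab_id) to True/False, standing in
    for the registry's manual review of uncertain pairs.
    """
    rows = []
    for p in best_pairs(pairs):
        status = classify(p)
        confirmed = review_decisions.get((p.registry_id, p.lab_id), False)
        if status == "match" or (status == "review" and confirmed):
            rows.append((p.registry_id, p.lab_id))
    return sorted(rows)


candidates = [
    CandidatePair("R1", "L1", 25.0),  # clear match
    CandidatePair("R2", "L1", 12.0),  # weaker candidate for the same lab record
    CandidatePair("R3", "L2", 10.0),  # needs review
    CandidatePair("R4", "L3", 3.0),   # non-match
]
print(build_crosswalk(candidates, {("R3", "L2"): True}))
```

Raising the upper cutoff sends more pairs to manual review and lowers the chance of a false-positive match, at the cost of more reviewer time.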
Masked IDs are generated for each matching record pair, and the ID lists (registry masked ID, lab masked ID, and lab record ID) are provided to the molecular laboratory. The personally identifying information files are deleted from the IMS servers unless specific provision for their retention for use as a case-finding source is specified in the legal agreement. The molecular laboratory extracts the genomic test results using its lab record ID for the patients on the list and provides these data to IMS using only the masked IDs. IMS then replaces the masked IDs with the registry record IDs (using the crosswalk file) and provides the respective data to each registry for incorporation into its database. For all linkages, once the data have been distributed, IMS deletes the genomic test results and registries’ data. IMS is the custodian of the crosswalk files and retains them for future needs per legal agreements. Registries submit limited linked datasets to NCI, which initially are used to evaluate the linked data before releasing them to the research community.
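The exchange of masked IDs described above can be sketched as three small steps. This is an illustrative sketch only: the function names, field names, and the use of random hexadecimal tokens as masked IDs are assumptions, not the procedure IMS actually implements.

```python
import secrets


def make_masked_ids(crosswalk):
    """Honest broker: assign random masked IDs to each (registry_id, lab_id) pair.

    Returns the broker's private table and the list sent to the laboratory.
    """
    broker_table = []
    for registry_id, lab_id in crosswalk:
        broker_table.append({
            "registry_id": registry_id,
            "lab_id": lab_id,
            "registry_masked_id": secrets.token_hex(8),
            "lab_masked_id": secrets.token_hex(8),
        })
    # The laboratory receives masked IDs and its own record ID, never the registry ID.
    lab_list = [
        {k: row[k] for k in ("registry_masked_id", "lab_masked_id", "lab_id")}
        for row in broker_table
    ]
    return broker_table, lab_list


def lab_extract(lab_list, lab_results):
    """Laboratory: look up results by lab record ID, return them under masked IDs only."""
    out = []
    for row in lab_list:
        for result in lab_results.get(row["lab_id"], []):
            out.append({
                "registry_masked_id": row["registry_masked_id"],
                "lab_masked_id": row["lab_masked_id"],
                **result,
            })
    return out


def unmask_for_registry(masked_results, broker_table):
    """Honest broker: replace masked IDs with registry record IDs before distribution."""
    lookup = {row["registry_masked_id"]: row["registry_id"] for row in broker_table}
    return [
        {"registry_id": lookup[r["registry_masked_id"]],
         **{k: v for k, v in r.items() if not k.endswith("_masked_id")}}
        for r in masked_results
    ]


broker_table, lab_list = make_masked_ids([("R1", "L1"), ("R3", "L2")])
masked = lab_extract(lab_list, {"L1": [{"score": 17}], "L2": [{"score": 31}]})
print(unmask_for_registry(masked, broker_table))
```

In this arrangement the test results never travel together with personally identifying information, and only the holder of the crosswalk can reconnect a result to a registry record.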
We report descriptive statistics for each genomic linkage. For the first linkage of OncotypeDX breast recurrence score, which included cases diagnosed from 2004 to 2012, we compared recurrence score data collected in the traditional way with linked data. This was possible because OncotypeDX recurrence score was collected by the registries under Collaborative Stage 2.0 (Site Specific Factors 22 and 23) since 2010, allowing us to compare 3 years of data.
Results
Several linkages of SEER registry data with genomic test results were conducted in recent years and are described below. A data dictionary with variables and values for each of these genomic tests is provided in Table 1.
Table 1.
Specialized databases with linked genomic test results
| Specialized databases | Variable name | Values | Definition and notes |
|---|---|---|---|
| 1. SEER Research Plus^a with OncotypeDX recurrence score fields for invasive breast cancer 2004 and after diagnosis years | OncotypeDX RS recurrence score | 0-100 | OncotypeDX Breast recurrence score reported on a scale 0-100 |
| | OncotypeDX Legacy recurrence score risk cat_hx | | Risk categories used historically |
| | OncotypeDX TAILORx recurrence score recode | | Recurrence score recoded per TAILORx trial categories |
| | OncotypeDX recurrence score reason no score | | Test ordered; results may be available but not provided to registries. |
| | OncotypeDX recurrence score test year | 2004 and after | Year in which the test was performed |
| | OncotypeDX recurrence score months since diagnosis | ≥1 | Number of months between diagnosis and test |
| 2. SEER Research Plus with OncotypeDX DCIS fields for in situ breast cancer 2011 and after diagnosis years | OncotypeDX DCIS score | 0-100 | OncotypeDX Breast DCIS Score reported on a scale 0-100 |
| | OncotypeDX DCIS risk group | | Categorization of risk groups |
| | OncotypeDX DCIS reason no score | | See notes for OncotypeDX recurrence score |
| | OncotypeDX DCIS test year | 2011 and after | Year in which the test was performed |
| | OncotypeDX DCIS months since diagnosis | ≥1 | Number of months between diagnosis and test |
| 3. SEER Research Plus with OncotypeDX GPS fields for prostate cancer 2013 and after | OncotypeDX GPS | 0-100 | OncotypeDX GPS reported on a scale 0-100 |
| | OncotypeDX GPS plus NCCN risk group (2013 to May 2017) | | Overall likelihood of adverse pathology or risk of metastasis within 10 years, after combining the GPS result with clinical features |
| | OncotypeDX GPS plus NCCN risk group (May 2017 and after) | | Overall likelihood of adverse pathology or risk of metastasis within 10 years, after combining the GPS result with clinical features. |
| | OncotypeDX GPS reason no score | | Risk group categories updated in May 2017 to align with revised NCCN guidelines. |
| | OncotypeDX GPS test year | 2013 and after | Year in which the test was performed |
| | OncotypeDX GPS months since diagnosis | ≥1 | Number of months between diagnosis and test |
| 4. SEER Research Plus with Decipher prostate biopsy (2016 and after) and/or Decipher RP (2013 and after) | Decipher type of test | 1. Decipher prostate bx 2. Decipher RP | Indicate the source of specimen (biopsy or radical prostatectomy) |
| | Decipher PC score | 0-1 | Decipher Prostate Genomic Classifier Score reported on a scale 0-1 |
| | Decipher PC risk group | | Risk categories |
| | Decipher PC test year | 2013 and after | Year in which the test was performed |
| | Decipher PC months since diagnosis | ≥1 | Number of months between diagnosis and test |
| 5. SEER Research Plus with DecisionDX UM (2010 and after) and PRAME (2016 and after) | DecisionDX UM risk class | | Risk class: 1A = low; 1B = intermediate; 2 = high |
| | DecisionDX UM test year | 2010 and after | Year in which the test was performed |
| | DecisionDX UM months since diagnosis | ≥1 | Number of months between diagnosis and test |
| | PRAME | | Result reported as positive or negative |
| | PRAME test year | 2016 and after | Year in which the test was performed |
| | PRAME months since diagnosis | ≥1 | Number of months between diagnosis and test |
| 6. SEER Research Plus with DecisionDX CM (2013 and after) | DecisionDX CM risk class | | Categorization of risk class |
| | DecisionDX CM risk subclass | | Categorization of risk subclass |
| | DecisionDX CM test year | 2013 and after | Year in which the test was performed |
| | DecisionDX CM months since diagnosis | ≥1 | Number of months between diagnosis and test |
| 7. Georgia–California genetic linkage (2013 and after) | Gene name/symbol | Gene name | Gene included if tested at least by 2 of the 4 molecular laboratories |
| | Gene status | | Categorization of gene status as normal, pathogenic variant, and variant of unknown significance |
| | Genetic test year | 2013 and after | Year in which the test was performed |
| | Genetic test months since diagnosis | ≥1 | Number of months between diagnosis and test |

^a SEER Research Plus for diagnosis year 2000 and after are used to create the Specialized Database. CM = cutaneous melanoma; DCIS = ductal carcinoma in situ; DX = diagnosis; GPS = genomic prostate score; NCCN = National Comprehensive Cancer Network; PC = prostate cancer; PRAME = Preferentially Expressed Antigen in Melanoma; SEER = Surveillance, Epidemiology, and End Results; UM = uveal melanoma.
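As an illustration of how the legacy and TAILORx recodes in Table 1 relate to the 0-100 recurrence score, the sketch below applies widely published cutoffs (legacy: below 18, 18-30, 31 and above; TAILORx: 0-10, 11-25, 26 and above). These cutoffs and the category labels are not stated in this paper and are assumptions here; the coding of the released SEER variables may differ, so the data dictionary of the specialized database should be consulted.

```python
def _check_score(rs: int) -> None:
    """Recurrence scores are reported on a 0-100 scale."""
    if not 0 <= rs <= 100:
        raise ValueError(f"recurrence score out of range: {rs}")


def legacy_risk_category(rs: int) -> str:
    """Historical risk categories (assumed cutoffs: <18, 18-30, >=31)."""
    _check_score(rs)
    if rs < 18:
        return "low"
    if rs <= 30:
        return "intermediate"
    return "high"


def tailorx_risk_category(rs: int) -> str:
    """TAILORx trial categories (assumed cutoffs: 0-10, 11-25, >=26)."""
    _check_score(rs)
    if rs <= 10:
        return "low"
    if rs <= 25:
        return "intermediate"
    return "high"


for score in (8, 17, 26, 31):
    print(score, legacy_risk_category(score), tailorx_risk_category(score))
```

The same score can fall into different categories under the two schemes (for example, a score of 26 in this sketch), which is one reason both recodes are provided alongside the raw score.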
OncotypeDX breast recurrence score
OncotypeDX breast recurrence score (Exact Sciences Corp) is a 21-gene assay that stratifies the risk of distant recurrence and the benefit of chemotherapy in hormone receptor–positive, HER2-negative invasive breast cancer (10,11). It was introduced in 2004 and was incorporated in the National Comprehensive Cancer Network (NCCN) guidelines for early-stage, lymph node–negative invasive breast cancer in 2008 and for invasive breast cancer with 1-3 positive lymph nodes in 2015 (https://www.nccn.org).
Periodic linkages have been conducted since 2015. The most recent linkage included all invasive breast cancer cases diagnosed during 2004-2019 (n = 1 620 174) of which 364 292 linked to OncotypeDX recurrence score tests. Figure 1 shows the distribution of OncotypeDX recurrence score risk categories by race.
Figure 1.
Distribution of OncotypeDX recurrence score risk categories by race in invasive breast cancer: SEER 17 registries (excluding Alaska Native Tumor Registry) 2004-2017 diagnosis years. DX = diagnosis; RS = recurrence score; SEER = Surveillance, Epidemiology, and End Results.
The comparison of the first linkage of OncotypeDX recurrence score with registry-collected OncotypeDX recurrence score showed that 40% of tests reported by the molecular laboratory and identified via linkage were not captured by manual abstraction from medical records for invasive breast cancer diagnosed during 2010-2012. At the same time, 7% of cases reported to have had OncotypeDX recurrence score in registry data were not linked to data from the molecular laboratory. The agreement of registry-collected and linkage-provided recurrence score was 94%, and the risk group misclassification was less than 2%.
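The kind of comparison reported above can be sketched with toy data. The denominators used here (linked tests for the first proportion, registry-reported tests for the second, and cases present in both sources for agreement) are an assumed reading of the text, and the case IDs and scores are invented for illustration.

```python
def _ratio(numerator: int, denominator: int) -> float:
    """Proportion that returns NaN rather than failing on an empty denominator."""
    return numerator / denominator if denominator else float("nan")


def compare_sources(registry_scores: dict, linked_scores: dict) -> dict:
    """Compare abstracted and linked recurrence scores.

    Both arguments map case ID -> recurrence score; cases without a
    recorded test are simply absent from the mapping.
    """
    registry_ids = set(registry_scores)
    linked_ids = set(linked_scores)
    both = registry_ids & linked_ids
    agree = sum(1 for c in both if registry_scores[c] == linked_scores[c])
    return {
        # linked tests that manual abstraction did not capture
        "linked_not_abstracted": _ratio(len(linked_ids - registry_ids), len(linked_ids)),
        # registry-reported tests with no linked laboratory record
        "abstracted_not_linked": _ratio(len(registry_ids - linked_ids), len(registry_ids)),
        # exact score agreement among cases present in both sources
        "score_agreement": _ratio(agree, len(both)),
    }


abstracted = {"A": 12, "B": 25, "C": 40, "D": 18}
linked = {"A": 12, "B": 26, "C": 40, "E": 7, "F": 33}
print(compare_sources(abstracted, linked))
```

In this toy example, 2 of 5 linked tests are missing from the abstracted data (0.40), 1 of 4 abstracted tests has no linked record (0.25), and 2 of the 3 cases present in both sources have identical scores.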
OncotypeDX breast ductal carcinoma in situ (DCIS)
The OncotypeDX breast DCIS test (Exact Sciences Corp) is a 12-gene assay indicated for hormone receptor–positive DCIS. It stratifies the risk of local recurrence and the benefit of radiation therapy. The test was introduced in 2012 and is not currently recommended by clinical guidelines (12,13). The latest linkage included all 224 149 breast cancer in situ cases diagnosed during 2011-2019, which linked to 9028 test results. Figure 2 shows the OncotypeDX DCIS risk group distribution by race.
Figure 2.
Distribution of OncotypeDX DCIS risk categories by race in in situ breast cancer: SEER 17 registries (excluding Alaska Native Tumor Registry) 2011-2017 diagnosis years. “Other” category includes American Indian and Alaska Native and Asian and Pacific Islander. DCIS = ductal carcinoma in situ; DX = diagnosis; RS = recurrence score; SEER = Surveillance, Epidemiology, and End Results.
OncotypeDX genomic prostate score
The OncotypeDX genomic prostate score is a 17-gene assay indicated for localized prostate cancer with Gleason score 3 + 3 or 3 + 4 (14,15). Results are provided as a score that ranges from 0 to 100. This score predicts 10-year prostate cancer–specific mortality and 10-year risk of distant metastasis. Risk categories are based on genomic prostate score and NCCN risk categories.
The latest linkage completed in 2022 linked all 647 080 prostate cancer cases diagnosed during 2013-2019 to genomic prostate score tests, resulting in 31 053 matches. Table 1 lists variables provided through the linkage. Of note, the risk group assignment changed during 2017 with the change in NCCN prostate cancer risk categorization. Figure 3 shows the distribution of OncotypeDX genomic prostate score risk by race.
Figure 3.
Distribution of OncotypeDX GPS risk by race in prostate cancer: SEER 17 registries (excluding Alaska Native Tumor Registry) 2013-2017 diagnosis years. “Other” category includes American Indian and Alaska Native and Asian and Pacific Islander. DX = diagnosis; GPS = genomic prostate score; SEER = Surveillance, Epidemiology, and End Results.
Decipher prostate genomic classifier
Two genomic tests (Veracyte), Decipher Prostate Biopsy (Decipher Bx) and Decipher Prostate Radical Prostatectomy (Decipher RP), were introduced in 2016 and 2014, respectively, and were included in NCCN guidelines in 2017. These tests measure the expression of 22 RNAs covering 7 cancer pathways (16-18). Decipher Bx is used in localized or regional prostate cancer without prior radiation therapy or androgen deprivation therapy. Decipher RP is indicated in all prostate cancer cases that had radical prostatectomy without prior radiation therapy or androgen deprivation therapy. The tests predict probabilities of high-grade disease (Gleason 4 or 5) on radical prostatectomy, 5-year probability of clinical metastasis, and 10-year probability of prostate cancer–specific mortality. Test results are reported as a score (range = 0-1) and risk groups.
The linkage included all prostate cancer cases diagnosed during 2010-2018 (n = 561 351) of which 15 309 linked to Decipher test results. Figure 4 shows the distribution of Decipher Bx and RP risk groups by race.
Figure 4.
Distribution of Decipher Prostate Biopsy (A) and Radical Prostatectomy (B) Genomic Risk Classifier risk groups by race in prostate cancer: SEER 22 registries (excluding Alaska Native Tumor Registry), 2010-2018 diagnosis years. “Other” category includes American Indian and Alaska Native and Asian and Pacific Islander. SEER = Surveillance, Epidemiology, and End Results.
DecisionDX Uveal Melanoma (UM) and DecisionDX Preferentially Expressed Antigen in Melanoma (PRAME)
DecisionDX UM (Castle Biosciences) is a 15-gene expression assay indicated in nonmetastatic uveal melanoma (19,20). The test stratifies the 5-year risk of metastasis as low (class 1A: 2% probability), intermediate (class 1B: 21% probability), and high (class 2: 72% probability). The test was introduced in 2009 and is recommended by NCCN guidelines (19-21).
Introduced in 2016, the PRAME test (Castle Biosciences) is indicated in low-risk (DecisionDX UM class 1) uveal melanoma to further stratify the risk of metastasis. The test results are reported as positive or negative (22).
A total of 8694 uveal melanoma cases (International Classification of Diseases for Oncology, Third Edition, topography codes C693-694 and morphology codes M8720-8790) diagnosed during 2010-2019 were linked to both DecisionDX UM (3186 linked tests) and PRAME (1987 linked tests) test results. Figure 5 shows the distribution of DecisionDX UM risk classes by sex.
Figure 5.
Distribution of DecisionDX Uveal Melanoma risk class by sex: SEER 22 registries (excluding Alaska Native Tumor Registry), 2010-2019 diagnosis years. DX = diagnosis; SEER = Surveillance, Epidemiology, and End Results.
DecisionDX Melanoma
The DecisionDX Melanoma test (Castle Biosciences) is a 31-gene expression assay developed for cutaneous melanoma (23,24). The test is reported to be prognostic for the risk of recurrence, distant metastasis, and death in American Joint Committee on Cancer (eighth edition) stage I-III disease and to predict the probability of a positive sentinel lymph node biopsy. It stratifies risk as low (class 1A), intermediate (class 1B and class 2A), and high (class 2B). The test was introduced in 2013 and is not currently recommended by clinical guidelines but is increasingly used by community dermatologists.
The latest linkage included 562 408 cutaneous melanoma cases (International Classification of Diseases for Oncology, Third Edition, morphology codes 8720-8790) covering 2010-2019 diagnosis years, of which 16 518 linked to DecisionDX Melanoma test results. Figure 6 shows the distribution of DecisionDX cutaneous melanoma risk categories by sex.
Figure 6.
Distribution of DecisionDX Skin Melanoma risk class by sex: SEER 22 registries (excluding Alaska Native Tumor Registry), 2010-2019 diagnosis years. DX = diagnosis; SEER = Surveillance, Epidemiology, and End Results.
Georgia–California SEER registries genetic tests linkage
Germline genetic testing for cancer risk has expanded rapidly over the last decade and is a key component of patient care for many cancer types, with relevance not only for the patient but also for their family. Pathogenic variants in selected genes inform clinical strategies for prevention, early detection, and treatment.
As part of a pilot project and under an IRB-approved protocol, data from the Georgia and California SEER registries were linked with germline test results from the 4 molecular laboratories (Ambry Genetics, GeneDx, Invitae, and Myriad Genetics) that provide most testing in these 2 states. The most recent linkage included patients of all ages and all cancer types diagnosed in Georgia or California between 2013 and 2019 linked to genetic test results from 2012 through midyear 2021. Test results in the linkage dataset were combined across the molecular laboratories for all genes tested by at least 2 molecular laboratories (to ensure laboratory anonymity), and individual gene results were categorized as pathogenic, variant of uncertain significance, or normal. Of the 1 584 923 cancer patients in the registry cohort for these 2 SEER states, 9.1% linked to genetic test results, with considerable variation by cancer type as expected.
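For orientation, the case and linked-test counts reported in the Results can be turned into crude linkage proportions. The short sketch below does only that arithmetic; the dictionary labels are shorthand introduced here. These proportions are not estimates of test utilization, because the denominators include all registry cases of each cancer type regardless of whether the test was indicated.

```python
# (linked tests, registry cases) as reported in the Results section.
LINKAGES = {
    "OncotypeDX breast recurrence score, 2004-2019": (364_292, 1_620_174),
    "OncotypeDX breast DCIS, 2011-2019": (9_028, 224_149),
    "OncotypeDX genomic prostate score, 2013-2019": (31_053, 647_080),
    "Decipher prostate, 2010-2018": (15_309, 561_351),
    "DecisionDX UM, 2010-2019": (3_186, 8_694),
    "PRAME, 2010-2019": (1_987, 8_694),
    "DecisionDX Melanoma, 2010-2019": (16_518, 562_408),
}


def linked_percent(linked: int, cases: int) -> float:
    """Crude percentage of registry cases with a linked test result."""
    return 100.0 * linked / cases


for name, (linked, cases) in LINKAGES.items():
    print(f"{name}: {linked_percent(linked, cases):.1f}% of {cases:,} cases")
```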
Discussion
Reporting genomic test results to cancer registries via linkages is an efficient way to collect this information for cancer surveillance and research. It is fast; relatively inexpensive; and leads to more complete, consistent, and accurate data compared with the traditional methods of medical chart review and data abstraction. Our evaluation of the first OncotypeDX recurrence score linkage showed that a substantial portion (40%) of tests reported by molecular laboratories were not captured by the manual abstraction. This can be expected in the setting of the very dispersed and disconnected health-care system in the United States. Registrars do not generally have access to all health-care facilities where cancer care is provided. Although health-care providers are required by state laws to report information on demographics, diagnosis, tumor characteristics, treatments, and outcomes for all cancer cases to the respective central cancer registry, compliance can vary greatly, especially for outpatient facilities. OncotypeDX recurrence score was frequently ordered by community oncology practices, and the report was sent to the ordering provider and not to the pathology laboratory providing the specimen, which explains the large amount of missing test data collected by registrars. The accuracy of the registry-collected OncotypeDX recurrence score was satisfactory but not ideal.
The centralized approach to undertaking these genomic linkages conferred a number of clear benefits. Firstly, having the registries’ trusted third party (honest broker) conduct the linkages and distribute data to individual registries assured consistency and efficiency. Secondly, molecular laboratories are more likely to engage in reporting genomic test data to cancer registries via centralized linkages because they only need to collaborate with 1 entity (honest broker) as opposed to multiple individual registries. In addition, informatics and staff resources at individual registries can vary, which further impedes engaging in multiple linkages.
Despite the highly efficient data collection and very high data accuracy of these linkages, they have some limitations. Linkage success depends on the quality and completeness of personally identifying information. During adjudication of data from multiple reporting sources (including linkages), cancer registries strive to minimize false-positive matches (linking a test to the wrong patient case), which may lead to an increase in false-negatives. Thus, it is expected that a variable proportion of cases that had the test and are in the registry will not link (false-negatives) if the personally identifying information variables are missing or discrepant. Further, a small proportion of registry cases are not permitted to be included in genomic linkages because of various restrictions. These cases include patients who received their entire cancer care at Veterans Administration or Department of Defense health-care facilities. Some SEER registries have additional restrictions for including cases they receive through interstate data exchange in linkages. For most cancer sites, this represents about 1% of cases. Finally, the linking of a test to a specific cancer is based on the proximity of the test date to the diagnosis date and may not always be precise when an individual has multiple cancers of the same site (eg, 2 or more breast cancers or cutaneous melanomas). Researchers using these data in their studies need to be aware of the above limitations when interpreting the results of their analysis.
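The date-proximity limitation can be made concrete with a small sketch. The rule shown (assign the test to the tumor whose diagnosis date is closest to the test date) is one simple interpretation of "proximity" and is an assumption; the registries' actual assignment rules are not described in this paper. The tumor IDs and dates are invented.

```python
from datetime import date


def months_between(diagnosis: date, test: date) -> int:
    """Whole calendar months from diagnosis to test (month and year only)."""
    return (test.year - diagnosis.year) * 12 + (test.month - diagnosis.month)


def assign_test_to_tumor(test_date: date, tumors):
    """Pick the tumor whose diagnosis date is closest to the test date.

    tumors is a list of (tumor_id, diagnosis_date) tuples for one patient
    and one cancer site. Raises ValueError if the list is empty.
    """
    return min(tumors, key=lambda t: abs((test_date - t[1]).days))[0]


# Hypothetical patient with 2 breast primaries diagnosed 4.5 years apart.
patient_tumors = [("T1", date(2012, 3, 1)), ("T2", date(2016, 9, 1))]
test_date = date(2016, 11, 15)

tumor_id = assign_test_to_tumor(test_date, patient_tumors)
print(tumor_id, months_between(dict(patient_tumors)[tumor_id], test_date))
```

When 2 primaries of the same site are diagnosed close together, the distances to the test date become similar and a rule of this kind can attach the result to the wrong tumor, which is the imprecision noted above.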
NCI SRP has implemented a tier-based approach for release of the SEER data, including data obtained through linkages. The higher the tier, the more rigorous the required process for data release. Tiers 1 and 2 include the standard SEER data that have been released since 1975. Tier 1 (SEER Research) contains de-identified data that exclude any geographic information and dates. Release of tier 1 data requires an application, a valid email address, and a data use agreement. Tier 2 (SEER Research Plus) contains geographic information and some dates (month and year). In addition to tier 1 terms, a data requestor needs to go through an authentication process that requires an institutional eRA Commons account and institutional signing official name. Tier 3 (SEER specialized databases) contains data that may be more sensitive (eg, census tract–based socioeconomic index, genomic data). Each specialized database contains SEER Research Plus 2000+ (“2000+” means diagnosis year 2000 and after) and a set of additional data fields (eg, OncotypeDX Breast recurrence score fields for invasive breast cancer). For release of tier 3 data, all tier 2 requirements must be met, and a brief proposal (research question and analytical plan) must be approved by the NCI SRP specialized databases group. Tier 4 contains limited datasets with multiple longitudinal treatment dates or other more sensitive information (eg, county, genomic data). Tier 4 data release requires all tier 3 terms and IRB approval. Specialized databases containing data from genomic test linkages are categorized as tier 3 or 4. NCI SRP plans to make OncotypeDX recurrence score, OncotypeDX genomic prostate score, Decipher prostate genomic classifier, DecisionDX cutaneous melanoma, and the genetic linkage data available for research in the first half of 2024 and OncotypeDX DCIS and DecisionDX UM and PRAME in 2025. 
For specific specialized database requirements, please refer to the SEER website (https://seer.cancer.gov/data-software/specialized.html).
Linkages between SEER registries’ cancer cases and genomic test results provided by molecular laboratories represent an efficient method of data collection for cancer surveillance and research. The de-identified data provided to NCI’s SEER Program will allow scientists to explore a multitude of research questions and have already enabled several original research articles to date (25-33).
Acknowledgments
The authors would like to thank SEER registries for participating in the linkages and the clinical laboratories for providing data to the SEER registries: Ambry Genetics, Castle Biosciences, GeneDX/Bioreference, Exact Sciences, Invitae, Myriad Genetics, and Veracyte.
Contributor Information
Valentina I Petkov, Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD, USA.
Jung S Byun, Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD, USA.
Kevin C Ward, Emory University, Atlanta, GA, USA.
Nicola C Schussler, Information Management Services, Inc, Calverton, MD, USA.
Natalie P Archer, Cancer Epidemiology and Surveillance Branch, Texas Department of State Health Services, Austin, TX, USA.
Suzanne Bentler, Iowa Cancer Registry, The University of Iowa, Iowa City, IA, USA.
Jennifer A Doherty, Huntsman Cancer Institute, University of Utah, Salt Lake City, UT, USA; Department of Population Health Sciences, University of Utah, Salt Lake City, UT, USA.
Eric B Durbin, Cancer Research Informatics Shared Resource Facility, Markey Cancer Center, Kentucky Cancer Registry, University of Kentucky, Lexington, KY, USA.
Susan T Gershman, Massachusetts Cancer Registry, Boston, MA, USA.
Iona Cheng, Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA, USA.
Tabassum Insaf, New York State Department of Health, New York State Cancer Registry, Albany, NY, USA.
Lou Gonsalves, Connecticut Department of Public Health, Connecticut Tumor Registry, Hartford, CT, USA.
Brenda Y Hernandez, University of Hawaii Cancer Center, Honolulu, HI, USA.
Lori Koch, Illinois State Cancer Registry, Springfield, IL, USA.
Lihua Liu, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
Alain Monnereau, Public Health Institute, Cancer Registry of Greater California, Sacramento, CA, USA.
Bozena M Morawski, Cancer Data Registry of Idaho, Boise, ID, USA.
Stephen M Schwartz, Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA.
Antoinette Stroup, New Jersey State Cancer Registry, Trenton, NJ, USA.
Charles Wiggins, Department of Internal Medicine, University of New Mexico, Albuquerque, NM, USA.
Xiao-Cheng Wu, School of Medicine, Louisiana State University, New Orleans, LA, USA.
Sarah Bonds, Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD, USA.
Serban Negoita, Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD, USA.
Lynne Penberthy, Surveillance Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD, USA.
Related manuscripts or abstracts: Several papers and abstracts based on linkages of SEER registry cases with genomic test results have been published. Of note, OncotypeDX RS data for IBC were previously released to more than 100 requestors, which most likely resulted in additional publications. We have cited several, but perhaps not all, of those publications.
Data availability
Data from linkages of SEER cancer registries’ cases with genomic test results provided by molecular laboratories will be made available to the research community as SEER Specialized Databases in 2023 and 2024 (https://seer.cancer.gov/data-software/specialized.html).
Author contributions
Valentina I Petkov, MD, MPH (Conceptualization; Methodology; Writing—original draft), Sarah Bonds, MS (Project administration), Xiao-Cheng Wu, PhD, MD (Writing—review & editing), Charles Wiggins, PhD (Writing—review & editing), Antoinette Stroup, MD, PhD (Writing—review & editing), Stephen M. Schwartz, PhD, MPH (Writing—review & editing), Bożena M. Morawski, PhD, MPH (Writing—review & editing), Alain Monnereau, MD, PhD (Writing—review & editing), Lihua Liu, PhD (Writing—review & editing), Lori Koch, PhD (Writing—review & editing), Brenda Y. Hernandez, PhD (Writing—review & editing), Lou Gonsalves, PhD (Writing—review & editing), Tabassum Insaf, PhD, MPH (Writing—review & editing), Iona Cheng, PhD, MPH (Writing—review & editing), Susan T. Gershman, PhD, MPH (Writing—review & editing), Eric B. Durbin, PhD, MS, MD (Writing—review & editing), Jennifer A. Doherty, PhD, MS (Writing—review & editing), Suzanne Bentler, PhD (Writing—review & editing), Natalie P. Archer, PhD, MS (Writing—review & editing), Nicola C. Schussler, BS (Data curation; Formal analysis), Kevin C. Ward, PhD, MPH (Conceptualization; Writing—original draft), Jung S. Byun, PhD, MPH (Formal analysis; Visualization; Writing—original draft), Serban Negoita, PhD, MD (Conceptualization; Supervision), and Lynne Penberthy, PhD, MD (Conceptualization; Supervision).
Funding
This work was supported by:
KCW, Emory University, Atlanta, GA, through National Cancer Institute, Surveillance, Epidemiology and End Results Program Contract Award HHSN261201800003I and Centers for Disease Control and Prevention 6NU58DP006352-05-01.
NPA, Texas Cancer Registry, through National Cancer Institute, Surveillance, Epidemiology and End Results Program Contract Award 75N91021D00011 and Centers for Disease Control and Prevention 1NU58DP007140.
SB, Iowa Cancer Registry, through National Cancer Institute, Surveillance, Epidemiology and End Results Program Contract Award HHSN261201800012I_HHSN26100001.
JAD, Utah Cancer Registry, through National Cancer Institute, Surveillance, Epidemiology and End Results Program Contract Award HHSN261201800016I.
EBD, Kentucky Cancer Registry, through National Cancer Institute, Surveillance, Epidemiology and End Results Program Contract Award HHSN261201800013I.
STG, Massachusetts Cancer Registry, MA, through National Cancer Institute, Surveillance, Epidemiology and End Results Program Contract Award HHSN26120180008I (HHSN26100001).
IC, University of California, San Francisco, CA, through National Cancer Institute, Surveillance, Epidemiology and End Results Program Contract Award HHSN261201800032I.
TI, New York State Cancer Registry through National Cancer Institute, Surveillance, Epidemiology, and End Results Program Contract Award HHSN261201800005I (UPIID: 75N91018D00005).
LG, Connecticut Tumor Registry, through Federal funds from the National Cancer Institute, National Institutes of Health, Department of Health and Human Services, under Contract no. HHSN261201800002I.
BYH, University of Hawaii Cancer Center, HI, through National Cancer Institute, Surveillance, Epidemiology and End Results Program Contract Award HHSN261201300009I.
LK, Illinois State Cancer Registry, IL, through National Cancer Institute, Surveillance, Epidemiology, and End Results Program Contract Award 75N91021D00006.
LL, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, through National Cancer Institute, Surveillance, Epidemiology, and End Results Program Contract Award HHSN261201800032I.
AM, Cancer Registry of Greater California through National Cancer Institute, Surveillance, Epidemiology, and End Results Program Contract Award HHSN261201800009I.
BMM, Cancer Data Registry of Idaho, Idaho Hospital Association, Boise, ID, through National Cancer Institute, Surveillance, Epidemiology, and End Results Program Contract Award HHSN261201800006I and Centers for Disease Control and Prevention 1NU58DP006270.
SMS, Division of Public Health Sciences, Fred Hutchinson Cancer Center, through National Cancer Institute, Surveillance, Epidemiology, and End Results Program Contract Award HHSN2612018000041.
AS, New Jersey State Cancer Registry, NJ, through National Cancer Institute, Surveillance, Epidemiology, and End Results Program Contract Award 75N91021D00009.
CW, University of New Mexico, Albuquerque, NM, through National Cancer Institute, Surveillance, Epidemiology, and End Results Program Contract Award HHSN261201800014I.
XW, School of Public Health, Louisiana State University Health New Orleans, through National Cancer Institute, Surveillance, Epidemiology, and End Results Program Contract Award HHSN261201800007I/HHSN26100002.
Monograph sponsorship
This article appears as part of the monograph “50th Anniversary Issue of the National Cancer Institute’s SEER Program: A Half-Century of Turning Cancer Data into Discovery,” sponsored by the National Cancer Institute.
Conflicts of interest
The authors declare no conflicts of interest.
References
- 1. Mateo J, Steuten L, Aftimos P, et al. Delivering precision oncology to patients with cancer. Nat Med. 2022;28(4):658-665. doi: 10.1038/s41591-022-01717-2
- 2. Kawaji H, Kubo M, Yamashita N, et al. Comprehensive molecular profiling broadens treatment options for breast cancer patients. Cancer Med. 2021;10(2):529-539.
- 3. Yadav S, Couch FJ. Germline genetic testing for breast cancer risk: the past, present, and future. Am Soc Clin Oncol Educ Book. 2019;39:61-74.
- 4. Roy-Chowdhuri S. Molecular pathology of lung cancer. Surg Pathol Clin. 2021;14(3):369-377.
- 5. Beech C, Hechtman JF. Molecular approach to colorectal carcinoma: current evidence and clinical application. Surg Pathol Clin. 2021;14(3):429-441.
- 6. Kulac I, Roudier MP, Haffner MC. Molecular pathology of prostate cancer. Surg Pathol Clin. 2021;14(3):387-401.
- 7. Leão R, Ahmad AE, Hamilton RJ. Testicular cancer biomarkers: a role for precision medicine in testicular cancer. Clin Genitourin Cancer. 2019;17(1):e176-e183.
- 8. Slack JC, Church AJ. Molecular alterations in pediatric solid tumors. Surg Pathol Clin. 2021;14(3):473-492.
- 9. Fellegi IP, Sunter AB. A theory for record linkage. J Am Stat Assoc. 1969;64(328):1183-1210.
- 10. Paik S, Shak S, Tang G, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004;351(27):2817-2826.
- 11. Paik S, Tang G, Shak S, et al. Gene expression and benefit of chemotherapy in women with node-negative, estrogen receptor-positive breast cancer. J Clin Oncol. 2006;24(23):3726-3734.
- 12. Solin LJ, Gray R, Baehner FL, et al. A multigene expression assay to predict local recurrence risk for ductal carcinoma in situ of the breast. J Natl Cancer Inst. 2013;105(10):701-710.
- 13. Rakovitch E, Nofech-Mozes S, Hanna W, et al. A population-based validation study of the DCIS Score predicting recurrence risk in individuals treated by breast-conserving surgery alone. Breast Cancer Res Treat. 2015;152(2):389-398.
- 14. Klein EA, Cooperberg MR, Magi-Galluzzi C, et al. A 17-gene assay to predict prostate cancer aggressiveness in the context of Gleason grade heterogeneity, tumor multifocality, and biopsy undersampling. Eur Urol. 2014;66(3):550-560.
- 15. Cullen J, Rosner IL, Brand TC, et al. A biopsy-based 17-gene genomic prostate score predicts recurrence after radical prostatectomy and adverse surgical pathology in a racially diverse population of men with clinically low- and intermediate-risk prostate cancer. Eur Urol. 2015;68(1):123-131.
- 16. Kim HL, Ping L, Huang HC, et al. Validation of the Decipher Test for predicting adverse pathology in candidates for prostate cancer active surveillance. Prostate Cancer Prostatic Dis. 2019;22(3):399-405. doi: 10.1038/s41391-018-0101-6
- 17. Klein EA, Haddad Z, Yousefi K, et al. Decipher Genomic classifier measured on prostate biopsy predicts metastasis risk. Urology. 2016;90:148-152. doi: 10.1016/j.urology.2016.01.012
- 18. Jairath NK, Dal Pra A, Vince R Jr, et al. A systematic review of the evidence for the decipher genomic classifier in prostate cancer. Eur Urol. 2021;79(3):374-383.
- 19. Onken MD, Worley LA, Char DH, et al. Collaborative Ocular Oncology Group Report Number 1: Prospective validation of a multi-gene prognostic assay in uveal melanoma. Ophthalmology. 2012;119(8):1596-1603.
- 20. Harbour JW, Chen R. The DecisionDX-UM gene expression profile test provides risk stratification and individualized patient care in uveal melanoma. PLOS Curr. 2013;5. doi: 10.1371/currents.eogt.af8ba80fc776c8f1ce8f5dc485d4a618
- 21. Grossman D, Kim CC, Hartman RI, et al. Prognostic gene expression profiling in melanoma: Necessary steps to incorporate into clinical practice. Melanoma Manag. 2019;6(4):MMT32.
- 22. Field MG, Decatur CL, Kurtenbach S, et al. PRAME as an independent biomarker for metastasis in uveal melanoma. Clin Cancer Res. 2016;22(5):1234-1242.
- 23. Gerami P, Cook RW, Wilkinson J, et al. Development of a prognostic genetic signature to predict the metastatic risk associated with cutaneous melanoma. Clin Cancer Res. 2015;21(1):175-183.
- 24. Gastman BR, Gerami P, Kurley SJ, Cook RW, Leachman S, Vetto JT. Identification of patients at risk for metastasis using a prognostic 31-gene expression profile in subpopulations of melanoma patients with favorable outcomes by standard criteria. J Am Acad Dermatol. 2019;80(1):149-157.e4. doi: 10.1016/j.jaad.2018.07.028
- 25. Petkov VI, Miller DP, Howlader N, et al. Breast cancer-specific mortality in patients treated based on the 21-gene assay: a SEER population-based study. NPJ Breast Cancer. 2016;2:16017. doi: 10.1038/npjbcancer.2016.17
- 26. Roberts MC, Miller DP, Shak S, Petkov VI. Breast cancer-specific survival in patients with lymph node-positive hormone receptor-positive invasive breast cancer and Oncotype DX Recurrence Score results in the SEER database. Breast Cancer Res Treat. 2017;163(2):303-310. doi: 10.1007/s10549-017-4162-3
- 27. Roberts MC, Kurian AW, Petkov VI. Uptake of the 21-gene assay among women with node-positive, hormone receptor-positive breast cancer. J Natl Compr Canc Netw. 2019;17(6):662-668. doi: 10.6004/jnccn.2018.7266
- 28. Zaorsky NG, Proudfoot JA, Zuhour R, et al. Use of Decipher genomic classifier among men with prostate cancer in the United States. JNCI Cancer Spectr. 2023;7(5):pkad052. doi: 10.1093/jncics/pkad052
- 29. Bailey CN, Martin BJ, Petkov VI, et al. 31-gene Expression Profile testing in cutaneous melanoma and survival outcomes in a population-based analysis: a SEER collaboration. JCO Precis Oncol. 2023;7:e2300044. doi: 10.1200/PO.23.00044
- 30. Kurian AW, Ward KC, Hamilton AS, et al. Uptake, results and outcomes of germline multigene sequencing after diagnosis of breast cancer. JAMA Oncol. 2018;4(8):1066-1072. doi: 10.1001/jamaoncol.2018.0644
- 31. Kurian AW, Ward KC, Howlader N, et al. Genetic testing and results in a population-based cohort of breast cancer patients and ovarian cancer patients. J Clin Oncol. 2019;37(15):1305-1315. doi: 10.1200/JCO.18.01854
- 32. Kurian AW, Ward KC, Abrahamse P, et al. Association of germline genetic testing results with locoregional and systemic therapy in patients with breast cancer. JAMA Oncol. 2020;6(4):e196400. doi: 10.1001/jamaoncol.2019.6400
- 33. Kurian AW, Abrahamse P, Furgal A, et al. Germline testing after cancer diagnosis. JAMA. 2023;330(1):43-51. doi: 10.1001/jama.2023.9526