Abstract
In 2020, the North American Association of Central Cancer Registries (NAACCR) was awarded a contract with the National Cancer Institute (NCI) to begin coordination of a new National Childhood Cancer Registry (NCCR), which would build on the existing infrastructure among both Surveillance, Epidemiology, and End Results (SEER) and National Program of Cancer Registries central registries. NCI and NAACCR planned to use the NCCR to securely match children across registries and with external data sources such as genomic data, medical and pharmacy claims, and other novel sources for residential history, financial toxicity and social determinants of health to build a robust database for pediatric cancer reporting and research. These linkages will enable researchers to address issues surrounding late effects of cancer treatment, recurrence, subsequent malignant neoplasms, and other critical outcomes.
Keywords: central cancer registry, linkages, National Childhood Cancer Registry (NCCR), North American Association of Central Cancer Registries (NAACCR), pediatric cancer
Introduction
For more than 40 years, one of the greatest strengths of cancer registries has been their ability to standardize and aggregate cancer data at the population level. This aggregation is particularly important for the study of rare tumors, where there is an insufficient number of cases at the facility or state level to conduct meaningful investigation and analysis.
Childhood cancers make up less than 1% of the total number of cancers diagnosed in the United States each year, making all childhood cancers, by definition, rare.1 In addition, information that is critical to understanding the causes, burden, and late effects of childhood cancer is siloed in separate data repositories, each with its own structure and governance and with little to no standardization across datasets. This poses additional challenges for investigators who wish to make use of this information in a meaningful way to gain a more complete understanding of childhood cancer.
In 2020, the Board of Scientific Advisors of the National Cancer Institute (NCI) cited “a critical need to collect, analyze, and share data” to address the burden of cancer in children, adolescents, and young adults.2 Based on the Board's recommendations, the NCI initiated the National Childhood Cancer Registry (NCCR), which aims to build a connected data infrastructure that includes longitudinal data from multiple sources and enables secure sharing and indexing of childhood cancer data with vetted research investigators.
The National Childhood Cancer Registry
At its core, the NCCR bears a striking similarity to existing population-based national datasets available from the NCI's Surveillance, Epidemiology, and End Results (SEER) program, the Centers for Disease Control and Prevention's National Program of Cancer Registries, and the North American Association of Central Cancer Registries (NAACCR):
- Data are collected, consolidated, and edited by highly trained data specialists in facility-based and central cancer registries.
- Data files are stored and transmitted in the standard NAACCR XML format.
- De-identified data are aggregated at the national level.
- Limited data and statistics are made available to the public.
- More detailed data are available with authorization or institutional review board approval.
However, the NCCR is unique in its ability to use the data for enhanced surveillance activities that are not feasible under the traditional national cancer registry model, including interstate linkages to identify subsequent tumors, interstate deduplication, and large-scale linkages with external datasets.
Interstate Linkages to Identify Subsequent Tumors
Evidence has shown that patients treated for cancer in the early years of their life may be at greater risk for developing secondary malignancies.3 However, such studies may have underestimated the risk, because young patients may move from one state to another between diagnoses, so the subsequent cancer is reported to a different central registry than the first and therefore may not be discoverable. In addition, national population-based datasets do not include patient identifiers, so there is no way to link 2 or more cancers diagnosed for the same patient in different states and at different times over the patient's lifespan.
By making use of NAACCR's Virtual Pooled Registry Cancer Linkage System (VPR-CLS), the NCCR can be effectively linked with population-based cancer registry databases throughout the United States, allowing population-based characterization of the risk of subsequent cancers among childhood cancer survivors. The initial linkage between the NCCR and a subset of registries in the VPR-CLS identified more than 6,000 out-of-state subsequent cancers among patients in the NCCR cohort. To learn more about the VPR-CLS, visit https://www.naaccr.org/about-vpr-cls/.
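The idea behind this kind of linkage can be illustrated with a minimal Python sketch. It is not the VPR-CLS method, whose matching logic is not described here. The sketch assumes that a secure matching step has already assigned each record a shared `patient_key` (a hypothetical field), and it treats any later diagnosis reported by a different registry as a candidate subsequent cancer. Real multiple-primary rules are considerably more detailed.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TumorRecord:
    patient_key: str   # hypothetical key assigned by a prior secure matching step
    registry: str      # reporting central registry (e.g., a state code)
    primary_site: str  # simplified site label; real records use coded fields
    dx_date: date      # date of diagnosis


def find_out_of_state_subsequent(records):
    """Return (first, later) pairs in which a later diagnosis for the same
    patient was reported by a different registry than the first diagnosis.

    Simplifying assumption: "subsequent" means only a strictly later
    diagnosis date.
    """
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec.patient_key].append(rec)

    pairs = []
    for recs in by_patient.values():
        ordered = sorted(recs, key=lambda r: r.dx_date)
        first = ordered[0]
        for later in ordered[1:]:
            if later.dx_date > first.dx_date and later.registry != first.registry:
                pairs.append((first, later))
    return pairs


# Illustrative, made-up records.
example_records = [
    TumorRecord("A", "OH", "leukemia", date(2001, 5, 1)),
    TumorRecord("A", "CA", "thyroid", date(2015, 3, 9)),
    TumorRecord("B", "TX", "brain", date(2010, 7, 4)),
]
subsequent_pairs = find_out_of_state_subsequent(example_records)
```

In this made-up example, patient A's second diagnosis would be invisible to the registry that recorded the first, because the two records sit in different state databases. Only a linkage across registries brings them together.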
Interstate Deduplication
The same linkage used to identify multiple primary tumors across states was also used to identify duplicate tumors reported by more than one central cancer registry. A duplicate tumor is defined as the same diagnosis, for the same patient, reported more than once. Central registries take great pains to identify and eliminate duplicates within their own databases, but once de-identified data are submitted to national repositories, the ability to match duplicate tumors across states is lost. While the proportion of interstate duplicates is thought to be very small, the impact on pediatric cancer rates is much greater due to the relatively small number of total cases. In addition, there may be a higher rate of duplicate reporting for pediatric cancers since children are often treated at out-of-state specialty hospitals. The linkage of the NCCR with a subset of the VPR-CLS was the first such attempt at large-scale deduplication between states and found a duplicate rate of 1.1%. NAACCR plans to address duplicate reporting between states.
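The definition of a duplicate given above (same diagnosis, same patient, reported more than once) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used for the NCCR: the field names (`patient_key`, `site`, `histology`, `dx_year`) are hypothetical, "same diagnosis" is reduced to an exact match on those fields, and the rate is computed as surplus reports divided by all reports, which may differ from how the 1.1% figure was derived.

```python
from collections import defaultdict


def find_interstate_duplicates(records):
    """Group reports by (patient, diagnosis) and keep the groups that were
    reported by more than one registry.

    Each record is a dict with the hypothetical keys
    patient_key, registry, site, histology, and dx_year.
    """
    registries_by_tumor = defaultdict(set)
    for rec in records:
        tumor_key = (rec["patient_key"], rec["site"], rec["histology"], rec["dx_year"])
        registries_by_tumor[tumor_key].add(rec["registry"])
    return {
        tumor: sorted(regs)
        for tumor, regs in registries_by_tumor.items()
        if len(regs) > 1
    }


def duplicate_rate(records):
    """Surplus interstate reports as a fraction of all reports (an assumed definition)."""
    records = list(records)
    if not records:
        return 0.0
    duplicates = find_interstate_duplicates(records)
    surplus = sum(len(regs) - 1 for regs in duplicates.values())
    return surplus / len(records)


# Illustrative, made-up reports: patient A's tumor is reported by two states.
example_reports = [
    {"patient_key": "A", "registry": "OH", "site": "C71", "histology": "9470", "dx_year": 2012},
    {"patient_key": "A", "registry": "PA", "site": "C71", "histology": "9470", "dx_year": 2012},
    {"patient_key": "B", "registry": "TX", "site": "C42", "histology": "9811", "dx_year": 2015},
    {"patient_key": "C", "registry": "CA", "site": "C64", "histology": "8960", "dx_year": 2018},
]
example_duplicates = find_interstate_duplicates(example_reports)
example_rate = duplicate_rate(example_reports)
```

In practice, an exact match on coded fields would miss duplicates whose site, histology, or diagnosis date was recorded slightly differently by the two registries, so a production process would need tolerant comparison and manual review.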
Large-Scale Linkages with External Datasets
Just as the NCCR can be linked with the VPR-CLS, it can also be linked with other large-scale datasets, including those that contain longitudinal detailed treatment information such as medical and pharmacy claims, clinical trials, and social determinants of health such as financial toxicity and residential history. These additional datasets enrich the data for researchers addressing specific scientific questions.
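As a rough illustration of what such enrichment looks like at the record level, the following Python sketch attaches rows from an external dataset to registry records that share a linkage key. The `patient_key` field, the `external` field, and the sample claim rows are all hypothetical; the actual linkages rely on secure matching procedures that are not described here.

```python
from collections import defaultdict


def attach_external(registry_rows, external_rows, key="patient_key"):
    """Return copies of the registry rows, each with an added "external" list
    holding every external row that shares the same linkage key."""
    index = defaultdict(list)
    for row in external_rows:
        index[row[key]].append(row)
    # Registry rows without a match keep an empty list rather than being dropped.
    return [{**row, "external": index.get(row[key], [])} for row in registry_rows]


# Illustrative, made-up rows.
registry_rows = [
    {"patient_key": "A", "site": "C71"},
    {"patient_key": "B", "site": "C42"},
]
claim_rows = [
    {"patient_key": "A", "claim_year": 2013, "service": "chemotherapy"},
    {"patient_key": "A", "claim_year": 2014, "service": "radiation"},
]
enriched_rows = attach_external(registry_rows, claim_rows)
```

Keeping unmatched registry records in the output preserves the population-based denominator, which matters when the enriched data are later used to compute rates.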
The NCCR is also part of the NCI's larger Childhood Cancer Data Initiative (CCDI) Data Ecosystem, which allows authorized researchers, clinicians, patients, and families a single point of access to a wide array of platforms, tools, and resources, including molecular characterization of childhood cancer.
Summary
To date, the NCCR includes data on nearly 1.5 million cancers diagnosed in patients under the age of 40 years from 1995–2020 in 25 state cancer registries, with plans to bring on several more registries in the coming year. Data from the NCCR are currently available in NCCR*Explorer (https://nccrexplorer.ccdi.cancer.gov/), an interactive, Web-based application for childhood cancer statistics. Future tools include an NCCR version of SEER*Stat and an NCI cloud-based data platform.
For more information on the CCDI, visit https://www.cancer.gov/research/areas/childhood/childhood-cancer-data-initiative. To subscribe to NCCR updates from NAACCR, please contact Fernanda Silva Michels, MSc, PhD, CTR (fmichels@naaccr.org).
Footnotes
Funding for this project was made possible in part by a contract with federal funds from the National Cancer Institute, National Institutes of Health, and the United States Department of Health and Human Services under contract number 75N91021D00018 / 75N91022F00001.
References
- 1. SEER*Stat Database: NAACCR Incidence Data - CiNA Research Data, 2019, Public Use (20 Age Groups) (which includes data from CDC's National Program of Cancer Registries (NPCR), CCR's Provincial and Territorial Registries, and the NCI's Surveillance, Epidemiology and End Results (SEER) Registries), certified by the North American Association of Central Cancer Registries (NAACCR) as meeting high-quality incidence data standards for the specified time periods, submitted December 2022.
- 2. Ad Hoc Working Group in Support of the Childhood Cancer Data Initiative of the National Cancer Institute Board of Scientific Advisors. Data Sharing Opportunities in Childhood, Adolescent and Young Adult (AYA) Cancer Research for the National Cancer Institute. National Institutes of Health; 2020. https://deainfo.nci.nih.gov/advisory/bsa/sub-cmte/CCDI/CCDI%20BSA%20WG%20Report_Final%20061620.pdf
- 3. Turcotte LM, Whitton JA, Friedman DL, et al. Risk of subsequent neoplasms during the fifth and sixth decades of life in the childhood cancer survivor study cohort. J Clin Oncol. 2015;33(31):3568-3575.
