JCO Clinical Cancer Informatics. 2022 Jan 5;6:e2100152. doi: 10.1200/CCI.21.00152

Precision Medicine Landscape of Genomic Testing for Patients With Cancer in the National Institutes of Health All of Us Database Using Informatics Approaches

Jay G Ronquillo, William T Lester
PMCID: PMC9848602  PMID: 34985965

PURPOSE

The rapid growth of biomedical data ecosystems has catalyzed research for oncology and precision medicine. We leverage federal cloud-based precision medicine databases and tools to better understand the current landscape of precision medicine and genomic testing for patients with cancer.

METHODS

Retrospective observational study of genomic testing for patients with cancer in the National Institutes of Health All of Us Research Program, with the cancer cohort defined as having at least two documented or reported cancer diagnoses.

RESULTS

There were 5,678 (1.8%) All of Us participants in the cancer cohort, with a significant difference between cancer status by age category, sex, race, and ethnicity (P < .001 for all). There were 295 (5.2%) patients with cancer who received genomic testing compared with 6,734 (2.2%) noncancer patients; the 752 genomic tests performed in patients with cancer commonly focused on gene mutations (primarily pharmacogenomics), molecular pathology, or clinical cytogenetic reports.

CONCLUSION

Although not yet ubiquitous, diverse clinical genomic analyses in oncology can set the stage to grow the practice of precision medicine by integrating research patient data repositories, cancer data ecosystems, and biomedical informatics.

INTRODUCTION

The National Institutes of Health All of Us (AoU) Research Program is a nationwide initiative collecting vast biomedical and real-world data from a diverse population of patients to accelerate the research and practice of precision medicine.1,2 Similarly, the National Cancer Institute (NCI) Cancer Research Data Commons (CRDC) is a cancer data ecosystem focused on catalyzing scalable cancer research and precision oncology.3 Although these initiatives represent a growing focus on big data technologies for innovative, data-driven biomedical research, their practical value is tied to helping stakeholders better understand current precision oncology practices nationwide.4 However, most current precision medicine studies only assess genomic testing at the institutional level or focus on narrow clinical indications (eg, pharmacogenomics), a handful of genes (eg, BRCA and EGFR), or limited cancer types.5-8 Through NCI cloud computing resources and data from the AoU Research Program, we unify these goals at the national level to study the current landscape of precision medicine through all forms of genomic testing across all reported cancers.

CONTEXT

Key Objective

  • What is the current practical landscape of precision medicine and genomic testing for patients with cancer?

Knowledge Generated

  • The National Institutes of Health All of Us Research Program, combined with the National Cancer Institute Cancer Research Data Commons, can serve as an important resource to better understand the diversity of patients with cancer and genomic testing practices across the United States.

Relevance

  • Robust clinical informatics tools are needed to integrate and harmonize diverse genomic and health care data sets, translate findings into responsible recommendations, and communicate insights effectively to both patients and their providers.

METHODS

Research Patient Data Repository

The AoU Curated Data Repository (CDR) is a cloud-based research patient data repository composed of diverse data types collected from participants nationwide.9 As described in prior studies, participants complete surveys and allow coded clinical data from their electronic health record (EHR) to be aggregated into the AoU CDR.9 The CDR is standardized and harmonized to the Observational Health Data Sciences and Informatics (OHDSI) Observational Medical Outcomes Partnership (OMOP) Common Data Model and is available to approved researchers as Registered Tier data through the cloud-based AoU Researcher Workbench.9 This study focuses primarily on patient EHR and survey data integrated into the December 2020 (R2020Q4R3) release of the AoU CDR.

Cancer Cohort Identification

The NCI SEER Site Modules provide a practical categorization of common cancer sites, along with mappings to International Classification of Diseases for Oncology, 3rd Edition codes.10 Using the SEER mappings (with skin cancer limited to melanoma), we identified the corresponding International Classification of Diseases, 10th Revision (ICD10), codes, along with equivalent International Classification of Diseases, 9th Revision (ICD9), codes based on the 2018 General Equivalence Mappings from the Centers for Medicare and Medicaid Services.11 AoU participants can also complete a Personal Medical History survey indicating whether they have been diagnosed with cancer in any of 21 categories.12 All participants in the AoU database with at least two separate instances of a documented or reported cancer diagnosis (eg, two cancer ICD codes, or one ICD code and one selection in the Personal Medical History survey) were included in the cancer cohort, with diagnoses mapped to the appropriate SEER categories for analysis. Patients with cancer whose diagnoses could not be mapped to a SEER category were placed in an other cancer category, whereas participants not meeting these criteria were classified as noncancer patients.
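The two-instance inclusion rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record layout (`DiagnosisInstance`), the tiny ICD-10 prefix map, and the assumption that survey answers already carry a SEER-style category label are all hypothetical, and ICD9 handling via the General Equivalence Mappings is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical, heavily abbreviated ICD-10 prefix -> SEER-style category map.
# The study's real mapping (SEER Site Modules plus CMS GEMs for ICD9) is far larger.
ICD10_PREFIX_TO_SEER = {
    "C50": "Breast",
    "C61": "Prostate",
    "C43": "Melanoma",
    "C18": "Colorectal",
}


@dataclass(frozen=True)
class DiagnosisInstance:
    """One documented (EHR) or reported (survey) cancer diagnosis (hypothetical layout)."""
    person_id: int
    source: str  # hypothetical labels: "ehr" or "survey"
    code: str    # ICD-10 code for EHR rows; assumed SEER-style label for survey rows
    date: str    # ISO date of the record or survey response


def seer_category(code: str) -> str:
    """Map an ICD-10 code to a SEER-style category; unmapped codes become 'Other'."""
    return ICD10_PREFIX_TO_SEER.get(code[:3], "Other")


def build_cancer_cohort(instances, min_instances=2):
    """Return {person_id: categories} for people with >= min_instances distinct
    diagnosis instances, counting EHR codes and survey selections together."""
    by_person = defaultdict(set)
    for inst in instances:
        by_person[inst.person_id].add(inst)  # identical records collapse to one
    cohort = {}
    for person_id, insts in by_person.items():
        if len(insts) >= min_instances:
            cohort[person_id] = {
                seer_category(i.code) if i.source == "ehr" else i.code
                for i in insts
            }
    return cohort


events = [
    DiagnosisInstance(1, "ehr", "C50.911", "2019-03-01"),
    DiagnosisInstance(1, "survey", "Breast", "2019-06-10"),  # ICD code + survey answer
    DiagnosisInstance(2, "ehr", "C61", "2020-01-15"),
    DiagnosisInstance(2, "ehr", "C61", "2020-01-15"),        # exact duplicate record
    DiagnosisInstance(3, "ehr", "C91.10", "2018-05-02"),
    DiagnosisInstance(3, "ehr", "C91.10", "2018-09-20"),     # not in the toy map
]
cohort = build_cancer_cohort(events)
```

In this toy example, person 2 has only one distinct instance and would be classified as noncancer, while person 3's leukemia code lands in "Other" only because the abbreviated map omits it. Treating each distinct record as one "separate instance" is one reasonable reading; the article does not say how repeated codes were deduplicated.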

Defining and Characterizing Genomic Testing

Using published genetic interoperability guidance, we identified relevant laboratory tests in the Logical Observation Identifiers Names and Codes (LOINC) vocabulary (version 2.69) to assess genetic and genomic information in the AoU database.13 This included identifying and reviewing LOINC Laboratory Term Classes and removing classes (and tests) not related to genetic information or involving nonhuman genetic mutations (eg, tests for genetic mutations of drug-resistant bacteria).13,14 The final list of genomic tests included laboratory tests and codes from the following LOINC classes: clinical cytogenetic report (HL7.CYTOGEN), clinical genetic report (HL7.GENETICS), molecular pathology (MOLPATH), gene deletions or duplications (MOLPATH.DELDUP and MOLPATH.DEL), gene inversion (MOLPATH.INV), gene miscellaneous (MOLPATH.MISC), gene mutation (MOLPATH.MUT), nucleotide repeats (MOLPATH.NUCREPEAT), pharmacogenomics (MOLPATH.PHARMG), gene rearrangement (MOLPATH.REARRANGE), gene chromosome trisomy (MOLPATH.TRISOMY), gene translocation (MOLPATH.TRNLOC), HL7 cytogenetics panel (PANEL.HL7.CYTOGEN), HL7 genetics panel (PANEL.HL7.GENETICS), pharmacogenomics panel (PANEL.MOLPATH.PHARMG), molecular pathology panel (PANEL.MOLPATH), and trinucleotide repeats (MOLPATH.TRINUC).

In addition to class categories, we investigated the characteristics of coded results returned for common genomic tests (via the value_source_value field in the AoU MEASUREMENT table). The findings were harmonized by looking up the corresponding answer concepts in the LOINC database or the Athena-OHDSI Vocabularies Repository.15 Answer terms suggesting a likely abnormal genetic result (eg, positive, mutation detected, abnormal) were classified as positive findings, whereas terms suggesting a likely normal genetic result (eg, negative, mutation absent, normal) were classified as negative findings.15 All answers without a clear positive or clear negative finding were classified as other or indeterminate in the analyses.
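A simplified version of the positive/negative/other classification is sketched below. Note the substitution: the study harmonized answers by looking up answer concepts in LOINC or Athena, whereas this sketch uses a plain string lookup over the example terms quoted in this article. The term lists are therefore incomplete by design, and the function name is hypothetical.

```python
# Answer terms quoted in the Methods and Results, normalized to lowercase.
POSITIVE_TERMS = frozenset({
    "positive", "mutation detected", "abnormal", "carrier",
    "detected", "homozygous", "heterozygous",
})
NEGATIVE_TERMS = frozenset({
    "negative", "mutation absent", "normal", "normal result", "not detected",
    "low risk", "no aneuploidy", "low probability", "wild-type",
})


def classify_result(value_source_value):
    """Classify a coded result string as 'positive', 'negative', or 'other'.

    Exact matching on a normalized string ensures 'not detected' is never
    mistaken for 'detected', a risk with naive substring matching."""
    if value_source_value is None:
        return "other"
    term = " ".join(value_source_value.lower().split())  # trim, collapse spaces
    if term in POSITIVE_TERMS:
        return "positive"
    if term in NEGATIVE_TERMS:
        return "negative"
    return "other"
```

Anything not on either list, such as a pointer to a scanned report, falls through to "other", mirroring the conservative handling described above.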

Informatics Pipeline and Statistical Analysis

As shown in Figure 1, all definitions and mappings for cancer and genomic test concepts were created in the NCI Cancer Research Data Commons Cloud Resources using a cloud-based Jupyter Notebook running R (version 4.0.3), whereas all data extraction, integration, and analysis were performed in the AoU Researcher Workbench using a cloud-based Jupyter Notebook running Python (version 3.7).3 The AoU Institutional Review Board determined that the study data met the criteria for nonhuman subjects research, which does not require institutional review board review. Results are reported in compliance with the AoU Data and Statistics Dissemination Policy, which prohibits disclosure of group counts under 20.
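The small-count rule can be expressed as a helper that formats "n (%)" cells the way the Results report masked counts (eg, "< 20 (< 4.3%)" against the 466 gene-mutation tests). The function name and the choice to derive the percentage bound as 20 divided by the denominator are assumptions that happen to reproduce the figures in this article; a full disclosure check may also need to prevent masked cells from being back-calculated from totals, which this sketch does not attempt.

```python
SUPPRESSION_THRESHOLD = 20  # group counts under 20 may not be disclosed


def format_count(n, denominator, threshold=SUPPRESSION_THRESHOLD):
    """Format 'n (pct%)' for a results table, masking counts under the threshold.

    A masked cell is reported as an upper bound, eg '< 20 (< 4.3%)', where the
    percentage bound is threshold / denominator, rounded to one decimal place."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    if n < threshold:
        return f"< {threshold} (< {100 * threshold / denominator:.1f}%)"
    return f"{n:,} ({100 * n / denominator:.1f}%)"
```

For example, `format_count(274, 466)` gives the unmasked hemoglobin subunit beta cell, while any count below 20 out of 466 becomes the masked upper bound.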

FIG 1.


Cloud-based biomedical informatics pipeline to extract and analyze data for AoU patients with cancer. AoU, All of Us; CRDC, Cancer Research Data Commons; ICD, International Classification of Diseases; LOINC, Logical Observation Identifiers Names and Codes; NCI, National Cancer Institute; NIH, National Institutes of Health.

Summary statistics were calculated for participant demographics, cancer types, and genomic testing. Differences in categorical variables (cancer status v age group, sex, race, or ethnicity) were evaluated using the chi-square test. A P value < .05 was considered significant.
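For readers unfamiliar with the test, the sketch below computes a Pearson chi-square test of independence on a contingency table (eg, cancer status by age group). In practice one would likely call `scipy.stats.chi2_contingency`; to stay dependency-free, this version computes the statistic directly and uses the closed-form chi-square tail probabilities for integer degrees of freedom. It applies no continuity correction, since the article does not say whether one was used, and the table in the test is hypothetical, not AoU data.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_r x^((2r-1)/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal density at sqrt(x)
    term, acc = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        acc += term
    return math.erfc(root / math.sqrt(2)) + 2 * phi * acc


def chi_square_independence(table):
    """Pearson chi-square test of independence without continuity correction.

    `table` is a list of rows of observed counts with no empty rows or columns.
    Returns (statistic, degrees_of_freedom, p_value)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)
```

With the threshold stated above, a test result would be called significant when the returned p-value is below .05.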

RESULTS

General Patient Characteristics

Out of 315,297 AoU participants (mean age of 53.0 ± 16.8 years, median 54.7 years, interquartile range 38.8-66.4 years), there were 5,678 (1.8%) with at least two documented or reported cancer diagnoses, with demographics summarized in Table 1. There was a significant difference between patients with and without a cancer diagnosis by age category, sex, race, and ethnicity (P < .001 for all).

TABLE 1.

Characteristics of All of Us Participants by Cancer Status


The most commonly reported cancer categories included breast with 2089 (36.8%) patients, prostate with 998 (17.6%), leukemia or lymphoma with 749 (13.2%), melanoma with 746 (13.1%), head and neck with 669 (11.8%), colorectal with 405 (7.1%), and cervical or uterine with 397 (7.0%) patients (Fig 2).

FIG 2.


Distribution of cancer sites for participants in the All of Us Research Program (not mutually exclusive).

Current State of Genomic Testing for Patients With Cancer

Overall, 7,029 (2.2%) AoU participants received some type of genomic testing. By cancer status, 295 (5.2%) of patients with cancer received genomic testing compared with 6,734 (2.2%) of noncancer patients. The most common genomic testing categories by number of tested patients are summarized in Table 2.
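The testing rates above rest on different denominators. Written out, the noncancer denominator is obtained by subtracting the cancer cohort from all participants (it is not stated directly in the text):

```latex
\begin{aligned}
\text{cancer cohort:}\quad & \frac{295}{5{,}678} \approx 0.052 = 5.2\% \\
\text{noncancer:}\quad & \frac{6{,}734}{315{,}297 - 5{,}678} = \frac{6{,}734}{309{,}619} \approx 0.022 = 2.2\% \\
\text{all participants:}\quad & \frac{295 + 6{,}734}{315{,}297} = \frac{7{,}029}{315{,}297} \approx 0.022 = 2.2\%
\end{aligned}
```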

TABLE 2.

Genomic Testing Breakdown of All of Us Participants by Cancer Status and Testing Category (not mutually exclusive)


A total of 34,973 genomic tests were performed in the AoU population, with 752 (2.2%) performed in patients with cancer and 34,221 (97.8%) in noncancer patients. For patients with cancer, the most common categories by test volume included gene mutation with 466 (62.0%) tests, gene translocation with 122 (16.2%), molecular pathology with 101 (13.4%), and clinical cytogenetic report with 37 (4.9%) tests.

For patients with cancer, specific genetic tests (ie, genomic tests in the gene mutation category) evaluated approximately 27 different genes. The most common tests were hemoglobin subunit beta (HBB) with 274 (58.8%) tests, coagulation factor V (F5) with 24 (5.2%), coagulation factor II (F2; c.20210G>A) with < 20 (< 4.3%), Janus kinase 2 (JAK2; p.Val617Phe) with < 20 (< 4.3%), cancer-focused next-generation sequencing with < 20 (< 4.3%), and methylenetetrahydrofolate reductase (MTHFR; c.677C>T) with < 20 (< 4.3%) tests.

Characteristics of Genomic Test Results for Patients With Cancer

There were 19,043 tests (54.5%) with structured and clinically usable (ie, normal or abnormal) genomic findings, with patients with cancer making up 22 (0.1%) of these tests and noncancer patients making up 19,021 (99.9%). Positive findings commonly contained the following values: positive, carrier, detected, abnormal, homozygous, or heterozygous. Negative findings commonly had the following values: negative, not detected, normal result, low risk, no aneuploidy, low probability, normal, or wild-type.

DISCUSSION

Just over a decade after the Health Information Technology for Economic and Clinical Health Act accelerated nationwide adoption of EHRs, research patient data repositories like AoU have begun integrating these data for scalable precision medicine research.1,16 This study leverages diverse clinical, laboratory, and patient-reported data to better understand the current precision medicine landscape of genomic testing for a large cohort of patients with cancer.

AoU participants were affected by numerous cancer types. The most common sites (breast and prostate) are consistent with 2018 prevalence estimates from the US Centers for Disease Control and Prevention, whereas other substantial subpopulations (melanoma, leukemia or lymphoma, etc) could provide the foundation for future data-driven studies.17 The AoU program has also made intentional efforts to recruit populations under-represented in biomedical research. Although these efforts may be reflected in the larger number of female participants, the current cancer cohort still mirrors historically homogeneous cancer research populations, with very little racial and ethnic diversity.18,19 This remains a challenge because racial and ethnic minority communities are both disproportionately affected by cancer and strongly influenced by cancer-related stigma, leaving them caught in a cycle of reduced engagement in activities (eg, screening, seeking care, and enrolling in clinical research) that could improve outcomes at the individual and population levels.20-22 Dedicated efforts to increase participant diversity across multiple dimensions will be critical to making research findings applicable to precision medicine and generalizable to the broader population.18

Clinical genomic analyses were more commonly used in oncology, with a larger fraction of patients with cancer receiving genomic testing (5.2% v 2.2% of noncancer patients).4 In addition, 6 of the genes tested in patients with cancer are among the 59 medically actionable genes identified by the American College of Medical Genetics and Genomics (10.2% of that list).23 Oncologists also seem to be using genetic testing to better understand relevant diseases or treatments (eg, JAK2 for myeloproliferative neoplasms and MTHFR for drugs such as methotrexate), as well as for broader assessments through cancer-focused next-generation sequencing.24-26 However, most single-gene tests currently appear to target pharmacogenomic biomarkers identified in US Food and Drug Administration (FDA) drug labeling.27 For example, hemoglobin subunit beta is an FDA-listed biomarker for hematology drugs that treat sickle cell disease (voxelotor and crizanlizumab-tmca), beta thalassemia (luspatercept-aamt), or myelodysplastic syndrome (luspatercept-aamt), whereas F2 and F5 are biomarkers for drugs in hematology (avatrombopag and lusutrombopag) and oncology (tamoxifen).27,28 Because this study reflects the clinical data broadly available in EHRs, it provides practical insights into where precision medicine is being leveraged for oncology patients across the continuum of both cancer and noncancer care.

Research patient data repositories (like AoU) require a strong understanding of the underlying data sources to build effective pipelines that extract cancer and genomic information. For example, medical vocabulary–based phenotyping can provide robust, reusable, and automated identification of cancer types and related clinical outcomes.29-32 However, missing data can be a challenge, especially for care received outside participant health care provider organizations or for external laboratory results not documented (or sufficiently structured) in the EHR.9,33 By contrast, patient-reported data from surveys provide up-to-date targeted cancer and genetic information, but are vulnerable to low response rates, misclassification, recall bias, or lack of clinical granularity.9,34 Hybrid approaches, like the one used in this study for cancer cohort building, can leverage the strengths of multiple data sources while minimizing their individual limitations. Going forward, practical informatics innovations will be critical to the growing number of precision oncology studies using large real-world databases of unprecedented clinical, scientific, and technical diversity.32,35-37

Our study also provides a snapshot of the informatics challenges preventing broader adoption of precision medicine: just over half (54.5%) of the available genomic test results were in a structured, practical format that clinicians can use at the point of care. Many genomic tests are performed offsite by external laboratories and companies, with results returned in highly variable ways: lengthy scanned documents, raw findings pasted into different sections of the EHR, and even reports accessible only through external portals (and not part of the EHR at all).8,38 Robust clinical informatics solutions are needed to integrate and harmonize diverse genomic and health care data sets, translate findings into responsible recommendations, and communicate insights effectively to both patients and their providers. The AoU database can serve as an important resource to understand, build, and evaluate precision medicine informatics tools for the cancer community.

Our study had several limitations. First, the analysis may not be generalizable until the AoU program reaches its goal of recruiting one million or more diverse participants across the United States over the next several years. Second, selection bias is possible, because patients with metastatic cancer or multiple comorbidities may not be willing, able, or available to enroll in the study. Third, missing data (as discussed above) are a known issue, and cancers not documented in the EHR or itemized in the medical history survey may be under-represented. Fourth, there is no pediatric population because AoU enrollment is currently limited to adults age 18 years and older, although pediatric recruitment is planned for the future.1 Finally, the current genomic analyses focus on available structured findings in the EHR, but more comprehensive insights could come from the planned release of AoU whole-genome sequencing data and potential integration with NCI CRDC bioinformatics pipelines.1 Despite these limitations, our study provides practical insights into the current landscape of precision medicine and precision oncology using real-world data from the AoU Research Program.

In conclusion, although not yet ubiquitous, diverse clinical genomic analyses in oncology can set the stage to grow the practice of precision medicine by integrating research patient data repositories, cancer data ecosystems, and biomedical informatics.


DISCLAIMER

The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication. Dr Ronquillo reported being the developer and owner (via company Grinformatics) of several precision medicine mobile health iPhone apps on the App Store that are unrelated to the current work. No other disclosures were reported.

PRIOR PRESENTATION

Presented at the virtual meeting of the 2021 Global Observational Health Data Sciences and Informatics (OHDSI) Symposium, September 12-15, 2021 (poster).

SUPPORT

Supported by general cloud credits via the National Institutes of Health (NIH) Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative; and the National Cancer Institute (NCI) Cancer Research Data Commons Cloud Resources.

The All of Us Research Program is supported by the National Institutes of Health, Office of the Director: Regional Medical Centers: 1 OT2 OD026549; 1 OT2 OD026554; 1 OT2 OD026557; 1 OT2 OD026556; 1 OT2 OD026550; 1 OT2 OD026552; 1 OT2 OD026553; 1 OT2 OD026548; 1 OT2 OD026551; 1 OT2 OD026555; IAA #: AOD 16037; Federally Qualified Health Centers: HHSN 263201600085U; Data and Research Center: 5 U2C OD023196; Biobank: 1 U24 OD023121; The Participant Center: U24 OD023176; Participant Technology Systems Center: 1 U24 OD023163; Communications and Engagement: 3 OT2 OD023205; 3 OT2 OD023206; and Community Partners: 1 OT2 OD025277; 3 OT2 OD025315; 1 OT2 OD025337; 1 OT2 OD025276. In addition, the All of Us Research Program would not be possible without the partnership of its participants.

AUTHOR CONTRIBUTIONS

Conception and design: Jay G. Ronquillo

Collection and assembly of data: Jay G. Ronquillo

Data analysis and interpretation: All authors

Manuscript writing: All authors

Final approval of manuscript: All authors

Accountable for all aspects of the work: All authors

AUTHORS' DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST

The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated unless otherwise noted. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/cci/author-center.

Open Payments is a public database containing information reported by companies about payments made to US-licensed physicians.

Jay G. Ronquillo

Employment: Syapse

Uncompensated Relationships: grinformatics

William T. Lester

Stock and Other Ownership Interests: Prompt Health

No other potential conflicts of interest were reported.

REFERENCES

