Author manuscript; available in PMC: 2024 Oct 3.
Published in final edited form as: Cancer Epidemiol Biomarkers Prev. 2024 Apr 3;33(4):547–556. doi: 10.1158/1055-9965.EPI-23-1200

Risk of Gastric Adenocarcinoma in a Multiethnic Population undergoing Routine Care: an Electronic Health Records Cohort Study

Robert J Huang 1, Edward S Huang 2, Satish Mudiganti 3, Tony Chen 3, Meghan C Martinez 3, Sanjay Ramrakhiani 2, Summer S Han 4, Joo Ha Hwang 1, Latha P Palaniappan 5, Su-Ying Liang 3
PMCID: PMC10990787  NIHMSID: NIHMS1961542  PMID: 38231023

Abstract

Background:

Gastric adenocarcinoma (GAC) is often diagnosed at advanced stages and portends a poor prognosis. We hypothesized that electronic health records (EHR) could be leveraged to identify individuals at highest risk for GAC from the population seeking routine care.

Methods:

This was a retrospective cohort study, with endpoint of GAC incidence as ascertained through linkage to an institutional tumor registry. We utilized 2010–2020 data from the Palo Alto Medical Foundation, a large multispecialty practice serving Northern California. The analytic cohort comprised individuals aged 40–75 receiving regular ambulatory care. Variables collected included demographic, medical, pharmaceutical, social, and familial data. Electronic phenotyping was based on rules-based methods.

Results:

The cohort comprised 316,044 individuals and ~2 million person-years (p-y) of observation. 157 incident GACs occurred (incidence 7.9 per 100,000 p-y), of which 102 were non-cardia GACs (incidence 5.1 per 100,000 p-y). In multivariable analysis, male sex (HR 2.2, 95% CI 1.6–3.1), older age, Asian race (HR 2.5, 95% CI 1.7–3.7), Hispanic ethnicity (HR 1.9, 95% CI 1.1–3.3), atrophic gastritis (HR 4.6, 95% CI 2.2–9.3), and anemia (HR 1.9, 95% CI 1.3–2.6) were associated with GAC risk; use of non-steroidal anti-inflammatory drugs was inversely associated (HR 0.3, 95% CI 0.2–0.5). Older age, Asian race, Hispanic ethnicity, atrophic gastritis, and anemia were associated with non-cardia GAC.

Conclusions:

Routine EHR data can stratify the general population for GAC risk.

Impact:

Such methods may help triage populations for targeted screening efforts, such as upper endoscopy.

INTRODUCTION

Improving early cancer diagnosis is key to improving outcomes and reducing mortality. Gastric adenocarcinoma (GAC) is one of the most common cancers worldwide, and a leading cause of global cancer mortality.(1) Early detection of GAC is particularly appealing, as survival is closely associated with stage at diagnosis. Based on data from the United States (US) Surveillance, Epidemiology, and End Results program, five-year survival rates for cancers diagnosed at localized, regional, and distant stages were ~50%, ~25%, and <5%, respectively. Even more strikingly, when GAC can be discovered at an early stage (defined as tumors with invasion no deeper than the submucosa), survival can exceed 95% following resection (either endoscopic or surgical).(2–4) Early-stage GAC diagnosis may seem improbable at first glance, given the absence of symptoms. Yet in regions of the world with structured screening (such as South Korea and Japan), early-stage diagnosis is now common within screening-aged populations.(5,6)

Electronic health records (EHRs) represent a real-world, continuous stream of data comprising both ‘structured’ data elements consisting of standardized data within pre-defined categories, as well as free text. There has been much recent interest in the use of EHR data to construct epidemiologic cohorts. Traditional survey- or examination-based cohorts suffer from limitations such as suboptimal follow-up rate and time, labor-intensive data collection, and a priori specification of focus.(7) EHR-based cohorts offer larger sample sizes, inclusion of diverse populations, reduced opportunity for recall bias or selection bias, increased cost efficiency, and the possibility of rapid translation into real-world clinical practice (such as through clinical decision support tools). At the same time, EHR-based data suffer from unique limitations, including incompleteness of data, low data quality (e.g. variability in data entry practices between providers and facilities), selection bias (EHRs capture data of those seeking health care), limited data elements (e.g. dietary history is not routinely asked during clinical care), and privacy concerns.

Longitudinal EHR-based risk data for GAC are limited, as most prior EHR-based studies have relied on a case-control design.(8–10) There is a need for absolute risk models to estimate cancer incidence within a given time interval based on known risk factors in a general population. These estimates can be used to identify individuals at heightened risk within the healthcare system who can be targeted for screening, or guide risk mitigation efforts (such as smoking cessation counseling). EHR-based risk stratification is particularly appealing in the US, given the relative rarity of GAC (US incidence is ~5–6 per 100,000).(11) EHR-based risk stratification could potentially serve as a first-line precision health tool to triage patients toward more comprehensive prevention efforts. Certain racial (e.g. Asians) and ethnic (e.g. Hispanics) groups are at increased risk for GAC within the US;(12) as such, building EHR models within multiracial/multiethnic EHR cohorts is essential.

In this manuscript, we use tumor registry data linked to longitudinal EHR data from a large, multi-specialty practice in Northern California to provide risk estimates for GAC within a multiethnic general population seeking routine care. We further demonstrate the ability of commonly-captured EHR elements to stratify the screening-age population (defined as 40–75)(6) for incident GAC risk. GACs can be classified as cardia or non-cardia based on tumor location within the stomach. As cardia and non-cardia GACs have different risk factors and disease patterns (with cardia GACs sharing commonalities with esophageal adenocarcinoma),(13–15) we performed a sensitivity analysis excluding cardia GACs from the outcome measure.

MATERIALS AND METHODS

Data Sources

The Palo Alto Medical Foundation (PAMF) is part of Sutter Health (Sacramento, CA), and is a multi-specialty practice of over 1,200 physicians serving more than 1 million patients annually in Northern California. The PAMF service region is centered on the San Francisco Bay Area, an ethnically-diverse metropolitan region (35% non-Hispanic White, 5% Black, 24% Asian, 29% Hispanic) with a high proportion of foreign-born (30%) individuals.(16) PAMF was one of the earliest medical groups in the US to adopt EHR-based records, with some form of EHR operating since 2002.(17) PAMF uses an instance of Epic Systems software (Epic Systems Corporation) as its EHR. This study was approved by the Sutter Health Institutional Review Board and was conducted according to Health Insurance Portability and Accountability Act standards. This study was conducted in accordance with the Declaration of Helsinki. As this research was a retrospective study involving hundreds of thousands of individuals, written consent from individuals was not obtained.

Cohort Creation and Outcome Measurement

A flow diagram overviewing the study design is depicted in Supplemental Figure S1. All EHR-documented encounters within pre-specified encounter categories (Supplemental Table S1) between January 1, 2010 and December 31, 2020 were eligible for information capture (N=38,277,045). These categories incorporated a broad array of patient contact venues with the healthcare system, including office visits, urgent care and emergency department visits, inpatient encounters, video visits, skilled nursing visits, and ambulatory surgery visits. From these eligible encounters, all unique individuals who had at least two encounters separated by at least 12 months were included for analysis (N=1,205,145 individuals). Individuals younger than 40 or older than 75 at time of first encounter date were subsequently excluded. We excluded those who had a gap in encounter dates of greater than 24 months within the study period, as well as individuals who had a recorded GAC diagnosis code but who did not appear in the PAMF tumor registry. Our final analytic cohort comprised N=316,044 individuals.
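The encounter-spacing and age rules described above can be sketched as a per-person filter. This is a minimal illustration and not the authors' SQL: the function name `is_eligible`, the input layout, and the day-count approximations (365 days for 12 months, 730 days for 24 months) are assumptions, and the registry-based exclusion is not modeled.

```python
from datetime import date

# Assumed day-count approximations; the paper states these rules in months.
MIN_SPAN_DAYS = 365  # at least two encounters separated by >= 12 months
MAX_GAP_DAYS = 730   # no gap between consecutive encounters > 24 months


def is_eligible(birth_date: date, encounter_dates: list[date]) -> bool:
    """Hypothetical helper: apply the spacing and age rules to one individual."""
    dates = sorted(set(encounter_dates))
    if len(dates) < 2:
        return False
    # Some pair of encounters is >= 12 months apart iff first and last are.
    if (dates[-1] - dates[0]).days < MIN_SPAN_DAYS:
        return False
    # Reject any gap longer than 24 months between consecutive encounters.
    if any((b - a).days > MAX_GAP_DAYS for a, b in zip(dates, dates[1:])):
        return False
    # Age in completed years at the first encounter must be 40-75.
    first = dates[0]
    had_birthday = (first.month, first.day) >= (birth_date.month, birth_date.day)
    age = first.year - birth_date.year - (0 if had_birthday else 1)
    return 40 <= age <= 75
```

In practice these rules would more likely run as set-based SQL over the encounter table; the per-person function is only meant to make the logic explicit.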

To ascertain the outcome (GAC), we performed linkage of the EHR cohort with the PAMF tumor registry. The PAMF tumor registry maintains data on all the patients diagnosed and treated for cancer within the PAMF system. Tumor registrars are highly trained experts who collect, validate, and process cancer data and store it in the tumor registry. The information collected includes tumor characteristics, stages of disease, treatment, and outcomes. To identify incident GACs, we queried the tumor registry using International Classification of Diseases, Ninth Revision (ICD-9) codes (151.0, 151.1, 151.2, 151.3, 151.4, 151.5, 151.6, 151.8, 151.9) and International Classification of Diseases, Tenth Revision (ICD-10) codes (C160, C161, C162, C163, C164, C165, C166, C168, C169) for GAC diagnosed between January 1, 2010 and December 31, 2020. This resulted in 239 GAC cases. The case identifiers were then linked to the cohort, resulting in 157 cases occurring within the cohort. The time to event was defined as the number of days between entry into the cohort and either 1) date of GAC diagnosis or 2) last recorded encounter date within the study period (censor time).
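The time-to-event definition above maps onto a (duration, event indicator) pair per person, the usual input for survival analysis. The following is a minimal sketch; the function name and argument layout are hypothetical and not taken from the authors' code.

```python
from datetime import date
from typing import Optional


def time_to_event(entry: date, last_encounter: date,
                  gac_diagnosis: Optional[date]) -> tuple[int, bool]:
    """Return (days of follow-up, event indicator) for one cohort member.

    Follow-up ends at the GAC diagnosis date if one is recorded in the
    registry, otherwise at the last encounter in the study period (censored).
    """
    if gac_diagnosis is not None:
        return (gac_diagnosis - entry).days, True
    return (last_encounter - entry).days, False
```

For example, a person entering on 2010-01-01 with no registry diagnosis and a last encounter on 2015-01-01 contributes 1,826 days of censored follow-up.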

Baseline Characteristics and Phenotyping

Baseline demographic characteristics captured from the EHR included age at entry (grouped into categories of 40–49, 50–64, and 65–75), sex, race (Asian, Black, White, Other, No answer, or unknown), and ethnicity (Hispanic, Non-Hispanic, or unknown). For the purposes of risk factor assessment, we selected six established risk factors for GAC for comprehensive phenotyping:(6,9,15,18,19) history of infection by Helicobacter pylori (Hp), presence of atrophic gastritis (an inflammatory condition of the gastric mucosa often caused by Hp infection), anemia, a family history of GAC, former or current smoking, and body mass index. For each of these conditions, phenotyping information was captured from multiple data sources, including encounters, medical history, and billing information. Hp status was determined through one of three mechanisms: 1) the presence of an ICD-9 code (041.86) or ICD-10 code (B96.81) for infection, 2) the dispensation (filled prescription) of any combination of recognized Hp therapies through rules-based consensus (Supplemental Table S2),(20) or 3) a Logical Observation Identifiers Names and Codes (LOINC) code for a serologic test (5176-3, 7903-8, 7901-2, 6420-4), breath test (29891-9), or stool antigen test (17780-8) with a positive value. The presence of atrophic gastritis was determined through ICD-9 (535.10, 535.11) or ICD-10 (K29.40, K29.41) codes. The presence of anemia was determined through ICD-9 (280–285) or ICD-10 (D50-D64) codes.
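The three-mechanism rule for Hp status can be expressed as a simple "any rule fires" check. This sketch is illustrative only: the function name and input shapes are assumptions, the LOINC codes are written with ASCII hyphens, and the regimen-matching logic of Supplemental Table S2 is not reproduced (it is reduced here to a precomputed boolean).

```python
# Codes as listed in the Methods.
HP_ICD_CODES = {"041.86", "B96.81"}
HP_LOINC_CODES = {"5176-3", "7903-8", "7901-2", "6420-4", "29891-9", "17780-8"}


def hp_positive(diagnosis_codes, dispensed_hp_regimen, lab_results):
    """Hypothetical rules-based phenotype for prior Hp infection.

    diagnosis_codes: iterable of ICD-9/ICD-10 code strings
    dispensed_hp_regimen: bool, True if a recognized Hp regimen was dispensed
        (the Supplemental Table S2 matching is assumed to happen upstream)
    lab_results: iterable of (loinc_code, is_positive) pairs
    """
    by_icd = any(code in HP_ICD_CODES for code in diagnosis_codes)
    by_lab = any(code in HP_LOINC_CODES and positive
                 for code, positive in lab_results)
    return by_icd or bool(dispensed_hp_regimen) or by_lab
```

Because the three rules are combined with a logical OR, one person can satisfy more than one of them, so the per-mechanism counts need not sum to the "any method" count.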

Smoking history was captured through a structured data entry element in the EHR, with 11 allowable elements (Supplemental Table S3); these elements were then recoded into one of four categories: current smoker, former smoker, never smoker, and unknown smoking status. Family history of GAC was captured through a structured EHR data element comprising 56 allowable elements (Supplemental Table S4) which were recoded into either first- or second-degree family history of GAC. Body mass index was available in ~95% of individuals, and was categorized as underweight (<18), healthy weight (18–25), overweight (25–30), and obese (>30).

For medications, we analyzed dispensation history of four commonly prescribed medicines which have demonstrated association (either positive or negative) with GAC risk:(6,15) aspirin, non-steroidal anti-inflammatory drugs (NSAIDs), proton-pump inhibitors (PPIs), and selective histamine type 2 receptor antagonists/blockers (H2 blockers). Medication combinations (e.g. ibuprofen-acetaminophen) were included for analysis.

Statistical Analysis

Data extraction and statistical analyses were performed in SQL Server Management Studio 18.7.1 and SAS Enterprise Guide 7.1 (Cary, N.C.). All variables were treated as binary or categorical for analysis. The person-years at risk, incident GAC cases, and incidence rate per 100,000 person-years (p-y) were calculated within each category. Both cumulative incidence and cumulative hazard were calculated. Cumulative hazards plots for each risk factor were generated and compared between categories by the log-rank test. The PHREG procedure in SAS was used to calculate univariable and multivariable Cox proportional hazard ratios.
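The incidence calculation can be illustrated with a short sketch. The paper does not state how its confidence intervals were derived, so the log-scale normal approximation below is an assumption, as is the function name; an exact Poisson interval would be an equally plausible choice.

```python
import math


def incidence_per_100k(cases: int, person_years: float, z: float = 1.96):
    """Crude incidence rate per 100,000 person-years with an approximate CI.

    Assumes a Poisson case count and a normal approximation on the log scale,
    where SE(log rate) = 1 / sqrt(cases). Illustrative only.
    """
    rate = cases / person_years * 100_000
    factor = math.exp(z / math.sqrt(cases))
    return rate, rate / factor, rate * factor


# Whole-cohort totals as reported in the Results: 157 cases, 1,999,806 p-y.
rate, lower, upper = incidence_per_100k(157, 1_999_806)
print(f"{rate:.1f} ({lower:.1f}-{upper:.1f}) per 100,000 p-y")
```

With the whole-cohort totals this gives 7.9 (6.7–9.2) per 100,000 p-y, which agrees with the reported estimate to one decimal place; that agreement does not establish which interval method the authors used.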

Sensitivity analysis: non-cardia GAC

Cardia GACs have unique risk factors compared to non-cardia GACs, and may be more similar in epidemiology to esophageal adenocarcinoma; conversely, non-cardia GACs are more closely associated with Hp infection and immigrant status.(13–15) We considered the sensitivity of our analyses to the inclusion or exclusion of cardia GACs (ICD-9 code 151.0, ICD-10 code C16.0) for both incidence calculation and Cox regression. From the PAMF tumor registry, we identified 55 cardia GACs, leaving 102 non-cardia GACs in the sensitivity analyses. As a second-level sensitivity analysis, we also excluded GACs with code C16.8 (overlapping sites) and C16.9 (unspecified), as these two codes could potentially include cardia GACs. Applying these filters resulted in 69 GACs available for analysis.
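The two levels of sensitivity analysis amount to grouping registry site codes into three bins. A minimal sketch follows, assuming codes may arrive with or without the decimal point; the helper names are hypothetical, and only the codes the text explicitly names (151.0, C16.0, C16.8, C16.9) are treated specially.

```python
CARDIA_CODES = {"151.0", "C16.0"}
OVERLAPPING_OR_UNSPECIFIED = {"C16.8", "C16.9"}


def normalize(code: str) -> str:
    """Insert the decimal point into dotless ICD-10 codes such as 'C160'."""
    code = code.strip().upper()
    if code.startswith("C") and "." not in code and len(code) == 4:
        return code[:3] + "." + code[3]
    return code


def site_group(code: str) -> str:
    """Classify a GAC site code for the sensitivity analyses (illustrative)."""
    code = normalize(code)
    if code in CARDIA_CODES:
        return "cardia"
    if code in OVERLAPPING_OR_UNSPECIFIED:
        return "overlapping_or_unspecified"
    return "non_cardia"
```

The first sensitivity analysis drops the `cardia` group; the second additionally drops `overlapping_or_unspecified`.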

Data Availability Statement

This cohort was created using real-world clinical data from living individuals. Data cannot be shared publicly due to patient confidentiality. The de-identified data underlying the results presented in the study can be requested by contacting the corresponding author for researchers who meet the criteria for access to confidential data.

RESULTS

Cohort Characteristics

The cohort comprised 316,044 individuals and 1,999,806 p-y of observational data. The median follow-up time for the entire cohort was 76 months. Cohort characteristics, p-y of observation, and case count by demographic and clinical characteristics are depicted in Table 1 (with case counts of <10 suppressed). Our screening-aged (40–75) cohort was broadly representative of the PAMF catchment area, and contained a high proportion of Asians (20.4%) and Hispanics (7.9%). Hp was previously diagnosed in 3.5% of the cohort: 1.3% through dispensation of Hp antibiotic therapy, 2.7% through ICD coding, and 1.1% through positive laboratory results (these categories are not mutually exclusive). A small fraction of the cohort carried a known diagnosis of atrophic gastritis (0.4%). Anemia was present in 17.6% of the cohort. With regard to family history of GAC (EHR structured data element), 0.4% had a first-degree relative and 0.2% a second-degree relative with GAC. Current smokers constituted 7.3%, former smokers 22.7%, and never smokers 67.8% of the cohort (with 2.2% missing smoking information). Overweight (35.3%) and obesity (34.7%) were quite prevalent in the cohort. Prescription aspirin was used by 2.9% (notably, non-prescription aspirin was not captured), prescription NSAIDs by 27.1%, prescription PPI by 17.6%, and prescription H2 blockers by 5.3% of the cohort.

Table 1:

GAC Incidence (including cardia GAC)

Characteristic (%) No. (%) p-y at risk GAC Cases (%) Incidence per 100,000 p-y (95% CI)

Entire Cohort 316,044 1,999,806 157 7.9 (6.7–9.2)

Demographic data

Age
 40–49 115,515 (36.6%) 720,917 29 (18.5%) 4.0 (2.7–5.8)
 50–64 137,482 (43.5%) 880,485 76 (48.4%) 8.6 (6.8–10.8)
 65–75 63,047 (19.9%) 398,404 52 (33.1%) 13.1 (9.7–17.1)
Sex
 Male 139,402 (44.1%) 861,491 98 (62.4%) 11.4 (9.2–13.9)
 Female 176,642 (55.9%) 1,138,307 59 (37.6%) 5.2 (3.9–6.7)
Race
 White 180,066 (57.0%) 1,219,311 66 (42.0%) 5.4 (4.2–6.9)
 Asian 64,336 (20.4%) 417,520 52 (33.1%) 12.5 (9.3–16.3)
 Black 5,688 (1.8%) 34,639 <10 -
 Other race 21,217 (6.7%) 117,668 10 (6.4%) 8.5 (4.1–15.6)
 Prefer not to Answer 27,310 (8.6%) 152,263 25 (15.9%) 16.4 (10.6–24.2)
 Unknown 17,427 (5.5%) 58,405 <10 -
Hispanic Ethnicity
 Non-Hispanic 239,439 (75.8%) 1,592,808 119 (75.8%) 7.5 (6.2–8.9)
 Hispanic 24,911 (7.9%) 151,719 22 (14.0%) 14.5 (9.1–22.0)
 Unknown 51,694 (16.4%) 255,279 16 (10.2%) 6.3 (3.6–10.2)

Medical history

Prior Hp (any method) 11,179 (3.5%) 86,557 33 (21.0%) 38.1 (26.2–53.5)
 Prior Hp therapy 4,125 (1.3%) 30,236 <10 -
 ICD code 8,632 (2.7%) 68,525 31 (19.7%) 45.2 (30.7–64.2)
 LOINC code 3,333 (1.1%) 25,356 13 (8.3%) 51.3 (27.3–87.7)
Atrophic gastritis 1,336 (0.4%) 10,660 <10 -
Anemia 55,509 (17.6%) 424,212 60 (38.2%) 14.1 (10.8–18.2)
Family history of GAC (EHR element)
 First-degree 1,329 (0.4%) 10,053 <10 -
 Second-degree 571 (0.2%) 4,326 <10 -
Smoking status (EHR element)
 Never smoker 214,235 (67.8%) 1,370,508 95 (60.5%) 6.9 (5.6–8.5)
 Current smoker 23,225 (7.3%) 138,894 14 (8.9%) 10.1 (5.5–16.9)
 Former smoker 71,695 (22.7%) 474,672 42 (26.8%) 8.8 (6.4–12)
 Unknown 6,889 (2.2%) 15,732 <10 -
Body mass index (EHR element)
 Underweight 1,206 (0.4%) 6,582 <10 -
 Healthy weight 75,699 (24.0%) 482,265 40 (25.5%) 8.3 (5.9–11.3)
 Overweight 111,664 (35.3%) 734,576 53 (33.8%) 7.2 (5.4–9.4)
 Obese 109,589 (34.7%) 727,473 60 (38.2%) 8.2 (6.3–10.6)
 Missing 17,886 (5.7%) 48,909 <10 -

Medication history

Aspirin use 9,165 (2.9%) 64,201 <10 -
NSAID use 85,672 (27.1%) 638,065 20 (12.7%) 3.1 (1.9–4.8)
PPI use 55,743 (17.6%) 417,585 41 (26.1%) 9.8 (7–13.3)
H2 blocker use 16,700 (5.3%) 125,410 10 (6.4%) 8.0 (3.8–14.7)

Table 1: Incidence of gastric adenocarcinoma (GAC), including cardia GACs. Person-years (p-y) at risk, cancer count, and incidence per 100,000 p-y reported by characteristic. GAC counts <10 were suppressed and incidence not reported. EHR, electronic health record; H2 blocker, histamine type 2 receptor blocker; Hp, Helicobacter pylori; ICD, international classification of disease; LOINC, Logical Observation Identifiers, Names and Codes; NSAID, non-steroidal anti-inflammatory drug; PPI, proton pump inhibitor.

GAC Incidence

In the observation period, 157 GAC cases occurred. The overall incidence of GAC was 7.9 (95% CI 6.7–9.2) per 100,000 p-y. Incidence rates stratified by demographic and clinical characteristics are presented in Table 1. GAC incidence was higher in older age groups. Incidence was higher among Asians (12.5, 95% CI 9.3–16.3 per 100,000 p-y) and Hispanics (14.5, 95% CI 9.1–22.0 per 100,000 p-y) compared to White (5.4, 95% CI 4.2–6.9 per 100,000 p-y) and non-Hispanic (7.5, 95% CI 6.2–8.9 per 100,000 p-y) individuals. Individuals with a prior history of Hp infection (38.1, 95% CI 26.2–53.5 per 100,000 p-y) and anemia (14.1, 95% CI 10.8–18.2 per 100,000 p-y) demonstrated higher incidence. Conversely, individuals with a dispensed prescription for NSAIDs demonstrated lower incidence (3.1, 95% CI 1.9–4.8 per 100,000 p-y).
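As a rough cross-check on these contrasts, a crude incidence rate ratio can be computed directly from the Table 1 counts. The sketch below is illustrative: the function name and the log-scale Wald interval are assumptions, and a crude rate ratio is not the Cox hazard ratio the authors report, although the two tend to be close when events are rare.

```python
import math


def rate_ratio(cases_a, py_a, cases_b, py_b, z=1.96):
    """Crude incidence rate ratio (group A vs group B) with an approximate CI.

    Uses SE(log RR) = sqrt(1/cases_a + 1/cases_b), a standard large-sample
    approximation for Poisson counts. Illustrative only.
    """
    rr = (cases_a / py_a) / (cases_b / py_b)
    factor = math.exp(z * math.sqrt(1 / cases_a + 1 / cases_b))
    return rr, rr / factor, rr * factor


# Table 1: Asian 52 cases / 417,520 p-y; White 66 cases / 1,219,311 p-y.
rr, lower, upper = rate_ratio(52, 417_520, 66, 1_219_311)
print(f"Asian vs White: {rr:.1f} ({lower:.1f}-{upper:.1f})")
```

For Asian versus White individuals this gives a crude ratio of about 2.3 (1.6–3.3), in line with the univariable hazard ratio in Table 3.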

A total of 102 non-cardia GACs occurred, and the overall incidence of non-cardia GAC was 5.1 (95% CI 4.2–6.2) per 100,000 p-y (Table 2). Incidence patterns for non-cardia GACs were broadly similar to those for all GACs. Older individuals demonstrated higher incidence. Asians (10.8, 95% CI 7.9–14.4 per 100,000 p-y) and Hispanics (13.2, 95% CI 8.1–20.4 per 100,000 p-y) demonstrated higher incidence compared to White (2.4, 95% CI 1.6–3.4 per 100,000 p-y) and non-Hispanic (4.5, 95% CI 3.5–5.6 per 100,000 p-y) individuals. Individuals with a history of Hp infection (34.7, 95% CI 23.4–49.5 per 100,000 p-y) and anemia (10.4, 95% CI 7.5–13.9 per 100,000 p-y) demonstrated higher incidence. Incidence among NSAID users was lower (1.6, 95% CI 0.8–2.9 per 100,000 p-y). When codes C16.8 and C16.9 were additionally excluded, 69 GACs were identified. While the overall cancer incidence was lower (as fewer cancers remained after exclusion), the relative patterns of incidence remained similar to the full and non-cardia analyses (Supplemental Table S5).

Table 2:

GAC Incidence (excluding cardia GACs)

Characteristic (%) No. (%) p-y at risk GAC Cases (%) Incidence per 100,000 p-y (95% CI)

Entire Cohort 315,989 1,999,518 102 5.1 (4.2–6.2)

Demographic data

Age
 40–49 115,512 (36.6%) 720,902 26 (25.5%) 3.6 (2.4–5.3)
 50–64 137,451 (43.5%) 880,314 45 (44.1%) 5.1 (3.7–6.8)
 65–75 63,026 (19.9%) 398,303 31 (30.4%) 7.8 (5.3–11.1)
Sex
 Male 139,353 (44.1%) 861,247 51 (50.0%) 5.9 (4.4–7.8)
 Female 176,636 (55.9%) 1,138,264 51 (50.0%) 4.5 (3.3–5.9)
Race
 White 180,029 (57.0%) 1,219,101 29 (28.4%) 2.4 (1.6–3.4)
 Asian 64,329 (20.4%) 417,487 45 (44.1%) 10.8 (7.9–14.4)
 Black 5,688 (1.8%) 34,639 <10 -
 Other race 21,215 (6.7%) 117,663 <10 -
 Prefer not to Answer 27,301 (8.6%) 152,223 16 (15.7%) 10.5 (6–17.1)
 Unknown 17,427 (5.5%) 58,405 <10 -
Hispanic Ethnicity
 Non-Hispanic 239,391 (75.8%) 1,592,561 71 (69.6%) 4.5 (3.5–5.6)
 Hispanic 24,909 (7.9%) 151,704 20 (19.6%) 13.2 (8.1–20.4)
 Unknown 51,689 (16.4%) 255,253 11 (10.8%) 4.3 (2.2–7.7)

Medical history

Prior Hp (any method) 11,176 (3.5%) 86,543 30 (29.4%) 34.7 (23.4–49.5)
 Prior Hp therapy 4,124 (1.3%) 30,230 <10 -
 ICD code 8,629 (2.7%) 68,511 28 (27.5%) 40.9 (27.2–59.1)
 LOINC code 3,332 (1.1%) 25,351 12 (11.8%) 47.3 (24.5–82.7)
Atrophic gastritis 1,336 (0.4%) 10,660 <10 -
Anemia 55,493 (17.6%) 424,113 44 (43.1%) 10.4 (7.5–13.9)
Family history of GAC (EHR element)
 First-degree 1,328 (0.4%) 10,052 <10 -
 Second-degree 571 (0.2%) 4,326 <10 -
Smoking status (EHR element)
 Never smoker 214,209 (67.8%) 1,370,364 69 (67.6%) 5.0 (3.9–6.4)
 Current smoker 23,216 (7.3%) 138,862 <10 -
 Former smoker 71,678 (22.7%) 474,562 25 (24.5%) 5.3 (3.4–7.8)
 Unknown 6,886 (2.2%) 15,731 <10 -
Body mass index (EHR element)
 Underweight 1,206 (0.4%) 6,582 <10 -
 Healthy weight 75,693 (24.0%) 482,255 34 (33.3%) 7.1 (4.9–9.9)
 Overweight 111,644 (35.3%) 734,455 33 (32.4%) 4.5 (3.1–6.3)
 Obese 109,563 (34.7%) 727,326 34 (33.3%) 4.7 (3.2–6.5)
 Missing 17,883 (5.7%) 48,900 <10 -

Medication history

Aspirin use 9,163 (2.9%) 64,188 <10 -
NSAID use 85,662 (27.1%) 637,983 10 (9.8%) 1.6 (0.8–2.9)
PPI use 55,728 (17.6%) 417,476 26 (25.5%) 6.2 (4.1–9.1)
H2 blocker use 16,696 (5.3%) 125,373 <10 -

Table 2: Incidence of gastric adenocarcinomas (GAC), excluding cardia GACs. Person-years (p-y) at risk, cancer count, and incidence per 100,000 p-y reported by characteristic. GAC counts <10 were suppressed and incidence not reported. EHR, electronic health record; H2 blocker, histamine type 2 receptor blocker; Hp, Helicobacter pylori; ICD, international classification of disease; LOINC, Logical Observation Identifiers, Names and Codes; NSAID, non-steroidal anti-inflammatory drug; PPI, proton pump inhibitor.

Cumulative Hazard

The cumulative hazard functions for GAC in the entire cohort, and stratified by select characteristics, are depicted in Figure 1 (with the corresponding numbers at risk in Supplemental Table S6). GAC risk was greatest in the oldest age group (65–75), followed by the middle age group (50–64), and lowest in the youngest group (40–49) (log-rank p<0.001). GAC risk was higher among males (p<0.001). GAC risk was also higher among Hispanics and non-Hispanic Asians compared to non-Hispanic Whites (p<0.001). Individuals with a personal history of Hp infection (p<0.001) or atrophic gastritis (p<0.001) also had higher risk for incident GAC.
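For readers unfamiliar with cumulative hazard curves, the following sketch shows how one can be estimated from (duration, event) pairs. The paper does not say which estimator was used; the Nelson-Aalen estimator is assumed here as the conventional choice, and the function name and toy data are hypothetical.

```python
from collections import Counter


def nelson_aalen(durations, events):
    """Nelson-Aalen cumulative hazard at each distinct event time.

    H(t) is the running sum of d_i / n_i, where d_i is the number of events
    at time t_i and n_i is the number still at risk just before t_i.
    Returns a list of (time, cumulative_hazard) pairs.
    """
    n_at_risk = len(durations)
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    exit_counts = Counter(durations)  # events and censorings both leave the risk set
    curve, cumulative = [], 0.0
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            cumulative += d / n_at_risk
            curve.append((t, cumulative))
        n_at_risk -= exit_counts[t]
    return curve


# Toy data: five people, three events (1) and two censored observations (0).
print(nelson_aalen([2, 3, 3, 5, 8], [1, 1, 0, 0, 1]))
```

Stratified curves such as those in Figure 1 would come from applying the same estimator separately within each category.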

Figure 1:

Cumulative hazards plots of gastric adenocarcinoma (including cardia) in entire cohort (panel A), and stratified by age (panel B), sex (panel C), race/ethnicity (panel D), prior Helicobacter pylori diagnosis (panel E), and prior diagnosis of atrophic gastritis (panel F). Log-rank p-values presented in each panel. Corresponding persons at risk for each panel can be found in Supplemental Table S6.

Cumulative hazard plots for non-cardia GAC, overall and stratified by select characteristics, are depicted in Figure 2 (with corresponding numbers at risk in Supplemental Table S7). Similar trends emerged for age (older individuals at increased risk), race/ethnicity (Asians and Hispanics at increased risk), prior Hp infection, and atrophic gastritis. However, males did not significantly differ from females in cumulative risk (p=0.2).

Figure 2:

Cumulative hazards plots of gastric adenocarcinoma (excluding cardia) in entire cohort (panel A), and stratified by age (panel B), sex (panel C), race/ethnicity (panel D), prior Helicobacter pylori diagnosis (panel E), and prior diagnosis of atrophic gastritis (panel F). Log-rank p-values presented in each panel. Corresponding persons at risk for each panel can be found in Supplemental Table S7.

Cox Regression

GAC risk factors were analyzed in univariable and multivariable proportional hazards regression (Table 3). Older age (both the 50–64 and 65–75 age classes) and male sex were associated with increased GAC risk in both the univariable and multivariable models. Asian race (univariable HR 2.3, 95% CI 1.6–3.3; multivariable HR 2.5, 95% CI 1.7–3.7) and Hispanic ethnicity (univariable HR 2.0, 95% CI 1.2–3.1; multivariable HR 1.9, 95% CI 1.1–3.3) were associated with increased GAC risk. Prior Hp diagnosis was robustly associated with GAC risk in the univariable model (HR 5.8, 95% CI 4.0–8.5); however, this association attenuated to non-significance in the multivariable model. Atrophic gastritis was a predictor in the univariable model (HR 11.1, 95% CI 5.7–21.8); the association attenuated but remained significant in the multivariable model (HR 4.6, 95% CI 2.2–9.3). Anemia was a predictor in both univariable and multivariable models. Family history, smoking status (with the exception of unknown status), and body mass index were not associated with GAC. Prior NSAID use was inversely associated with GAC risk in both univariable and multivariable analyses. Aspirin use, PPI use, and H2 blocker use demonstrated no clear association. For all GACs, the c-statistic for the multivariable Cox regression model was 0.8. However, it is important to note that this model performance may be overestimated because GAC cases were rare in our data.
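The c-statistic summarizes how often the model ranks the person who developed GAC earlier as higher risk. The sketch below implements Harrell's concordance index in plain Python to make that definition concrete. It is an assumption that this is the variant reported (the paper does not specify), the function name is hypothetical, and the quadratic loop is suitable only for small illustrative inputs, not a cohort of this size.

```python
def harrell_c(durations, events, risk_scores):
    """Harrell's concordance index for right-censored data (illustrative).

    A pair (i, j) is comparable when i had an event and j was still under
    observation after i's event time. The pair is concordant when i has the
    higher risk score; tied scores count as half.
    """
    concordant = tied = comparable = 0
    n = len(durations)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if durations[j] > durations[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    tied += 1
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return (concordant + 0.5 * tied) / comparable
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, so the reported 0.8 indicates that the model orders most comparable pairs correctly.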

Table 3:

Cox Regression (including cardia GACs)

Variable Univariable Multivariable
HR (95% CI) p-value HR (95% CI) p-value

Demographic data

Age
 40–49 Ref - Ref. -
 50–64 2.1 (1.4 – 3.3) 0.0005 2.3 (1.5 – 3.6) 0.0001
 65–75 3.2 (2.0 – 5.1) <.0001 3.2 (2 – 5.1) <.0001
Male sex (vs female) 2.2 (1.6 – 3.1) <.0001 2.2 (1.6 – 3.1) <.0001
Race
 White Ref - Ref. -
 Asian 2.3 (1.6 – 3.3) <.0001 2.5 (1.7 – 3.7) <.0001
 Black 1.6 (0.5 – 5.1) 0.42 1.4 (0.5 – 4.6) 0.54
 Other race 1.6 (0.8 – 3.1) 0.16 1.2 (0.6 – 2.5) 0.64
 Prefer not to Answer 3.1 (2 – 4.9) <.0001 3.7 (2.1 – 6.5) <.0001
 Unknown 0.3 (0 – 2.4) 0.27 0.4 (0.1 – 3.4) 0.43
Hispanic Ethnicity
 Non-Hispanic Ref. - Ref. -
 Hispanic 2.0 (1.2 – 3.1) 0.0038 1.9 (1.1 – 3.3) 0.023
 Unknown 0.9 (0.5 – 1.4) 0.55 0.5 (0.3 – 1) 0.053

Medical history

Prior Hp diagnosis 5.8 (4.0 – 8.5) <.0001 1.5 (0.3 – 7.1) 0.58
 Prior Hp therapy 2.6 (1.1 – 5.8) 0.023 0.5 (0.2 – 1.3) 0.17
 ICD code 6.8 (4.6 – 10.1) <.0001 2.6 (0.6 – 11.3) 0.21
 LOINC code 6.9 (3.9 – 12.3) <.0001 2.4 (1.2 – 5.1) 0.017
Atrophic gastritis 11.1 (5.7 – 21.8) <.0001 4.6 (2.2 – 9.3) <.0001
Anemia 2.3 (1.6 – 3.1) <.0001 1.9 (1.3 – 2.6) 0.0002
Family history of GAC (EHR element)
 First-degree 3.8 (1.2 – 11.9) 0.022 3.0 (0.9 – 9.3) 0.063
 Second-degree - - - -
Smoking status (EHR element)
 Never smoker Ref. - Ref. -
 Current smoker 1.5 (0.8 – 2.6) 0.18 1.4 (0.8 – 2.4) 0.28
 Former smoker 1.3 (0.9 – 1.8) 0.20 1.1 (0.7 – 1.6) 0.67
 Unknown 5.7 (2.5 – 13.3) <.0001 10.0 (3.7 – 27.3) <.0001
Body mass index (EHR element)
 Underweight - - - -
 Healthy weight Ref. - Ref. -
 Overweight 0.9 (0.6 – 1.3) 0.50 0.8 (0.5 – 1.2) 0.22
 Obese 1.0 (0.7 – 1.5) 0.97 1.0 (0.7 – 1.6) 0.91
 Missing 1.0 (0.4 – 2.8) 0.99 0.5 (0.1 – 1.7) 0.25

Medication history

Aspirin use 0.4 (0.1 – 1.6) 0.18 0.3 (0.1 – 1.1) 0.064
NSAID use 0.3 (0.2 – 0.5) <.0001 0.3 (0.2 – 0.5) <.0001
PPI use 1.3 (0.9 – 1.9) 0.13 1.0 (0.7 – 1.5) 0.94
H2 blocker use 1.0 (0.5 – 1.9) 0.99 0.8 (0.4 – 1.6) 0.52

Table 3: Univariable and multivariable predictors of gastric adenocarcinoma (GAC), including cardia GACs. Hazard ratios (HRs) with 95% confidence intervals (CIs) are presented in both univariable and multivariable analysis. EHR, electronic health record; H2 blocker, histamine type 2 receptor blocker; Hp, Helicobacter pylori; ICD, international classification of disease; LOINC, Logical Observation Identifiers, Names and Codes; NSAID, non-steroidal anti-inflammatory drug; PPI, proton pump inhibitor.

Many of these predictors of GAC were also associated with non-cardia GAC (Table 4). Of particular interest, Asian race demonstrated a larger effect estimate (univariable HR 4.6, 95% CI 2.9–7.3; multivariable HR 4.3, 95% CI 2.6–7.2) as a predictor for non-cardia GAC (compared to all GACs). Also notably, Black race was significantly associated with non-cardia GAC in the univariable model (HR 3.6, 95% CI 1.1–11.9), and nearly significant in the multivariable model (HR 3.3, 95% CI 1.0–11.0, p=0.052). Hispanic ethnicity also demonstrated a large effect estimate for non-cardia GAC (univariable HR 3.0, 95% CI 1.8–4.9; multivariable HR 3.6, 95% CI 1.9–6.8). Hp diagnosis was strongly associated with non-cardia GAC in univariable analysis, but this mostly attenuated in the multivariable model. Atrophic gastritis and anemia were robustly associated with non-cardia GAC in both univariable and multivariable analyses. Family history, smoking status, and body mass index were not associated with non-cardia GAC risk. NSAID use was inversely associated with non-cardia GAC risk (univariable HR 0.2, 95% CI 0.1–0.4; multivariable HR 0.2, 95% CI 0.1–0.4). The c-statistic for the multivariable model for non-cardia GACs was 0.9.

Table 4:

Cox Regression (excluding cardia GACs)

Variable Univariable Multivariable
HR (95% CI) p-value HR (95% CI) p-value

Demographic data

Age
 40–49 Ref - Ref. -
 50–64 1.4 (0.9 – 2.3) 0.16 1.8 (1.1 – 2.9) 0.024
 65–75 2.1 (1.3 – 3.6) 0.0041 2.3 (1.3 – 4) 0.0027
Male sex (vs female) 1.3 (0.9 – 2) 0.16 1.5 (1.0 – 2.2) 0.064
Race
 White Ref - Ref. -
 Asian 4.6 (2.9 – 7.3) <.0001 4.3 (2.6 – 7.2) <.0001
 Black 3.6 (1.1 – 11.9) 0.033 3.3 (1.0 – 11.0) 0.052
 Other race 2.9 (1.3 – 6.3) 0.0084 1.4 (0.6 – 3.3) 0.45
 Prefer not to Answer 4.4 (2.4 – 8.2) <.0001 3.8 (1.9 – 7.8) 0.0002
 Unknown 0.7 (0.1 – 5.2) 0.74 0.8 (0.1 – 6.4) 0.84
Hispanic Ethnicity
 Non-Hispanic Ref. - Ref. -
 Hispanic 3.0 (1.8 – 4.9) <.0001 3.6 (1.9 – 6.8) <.0001
 Unknown 1.0 (0.5 – 1.8) 0.90 0.7 (0.4 – 1.6) 0.45

Medical history

Prior Hp diagnosis 9.3 (6.0 – 14.2) <.0001 2.4 (0.5 – 11.2) 0.28
 Prior Hp therapy 3.4 (1.4 – 8.3) 0.0081 0.5 (0.2 – 1.4) 0.17
 ICD code 10.7 (6.9 – 16.6) <.0001 2.2 (0.5 – 9.6) 0.31
 LOINC code 10.4 (5.7 – 19) <.0001 2.5 (1.2 – 5.4) 0.021
Atrophic gastritis 18.1 (9.1 – 35.8) <.0001 6.1 (3.0 – 12.7) <.0001
Anemia 2.8 (1.9 – 4.2) <.0001 2.3 (1.5 – 3.5) <.0001
Family history of GAC (EHR element)
 First-degree 4.0 (1.0 – 16.0) 0.054 2.6 (0.6 – 10.7) 0.18
 Second-degree - - - -
Smoking status (EHR element)
 Never smoker Ref. - Ref. -
 Current smoker 0.7 (0.3 – 1.8) 0.47 0.8 (0.3 – 1.9) 0.58
 Former smoker 1.0 (0.7 – 1.7) 0.85 1.1 (0.7 – 1.8) 0.67
 Unknown 3.6 (1.1 – 11.5) 0.034 10.5 (2.9 – 38.4) 0.0004
Body mass index (EHR element)
 Underweight - - - -
 Healthy weight Ref. - Ref. -
 Overweight 0.6 (0.4 – 1) 0.065 0.7 (0.4 – 1.1) 0.11
 Obese 0.7 (0.4 – 1.1) 0.090 0.8 (0.5 – 1.4) 0.48
 Missing 0.3 (0 – 2) 0.20 0.2 (0 – 1.4) 0.095

Medication history

Aspirin use - - - -
NSAID use 0.2 (0.1 – 0.4) <.0001 0.2 (0.1 – 0.4) <.0001
PPI use 1.3 (0.8 – 2) 0.26 0.9 (0.6 – 1.6) 0.81
H2 blocker use 0.9 (0.4 – 2.1) 0.86 0.7 (0.3 – 1.7) 0.48

Table 4: Univariable and multivariable predictors of gastric adenocarcinoma (GAC), excluding cardia GACs. Hazard ratios (HRs) with 95% confidence intervals (CIs) are presented in both univariable and multivariable analysis. EHR, electronic health record; H2 blocker, histamine type 2 receptor blocker; Hp, Helicobacter pylori; ICD, international classification of disease; LOINC, Logical Observation Identifiers, Names and Codes; NSAID, non-steroidal anti-inflammatory drug; PPI, proton pump inhibitor.

In our additional sensitivity analysis excluding C16.8 and C16.9, we observed similar patterns (Supplemental Table S8). Namely, older age, Asian race, and Hispanic ethnicity continued to be predictors of GAC risk. In multivariable analysis, atrophic gastritis and anemia remained robust predictors, while NSAID use remained inversely associated with GAC risk.

Discussion

In this observational study, we created an EHR-based general population cohort from individuals receiving longitudinal care at a large, multispecialty practice serving a multiethnic North American region. Through linkage to an institutional tumor registry, we demonstrated that in a screening-aged population, commonly-captured structured data elements within the EHR (including diagnosis codes, family history, social history, and medication dispensation records) can risk-stratify the general population for subsequent GAC risk. Importantly, this risk stratification could be performed without advanced computational techniques, such as natural language processing (NLP), enhancing the accessibility of these models to more healthcare organizations and practices.

Our study offers several important innovations. Few prior EHR cohort studies focused on GAC have been performed. Given the relatively low incidence of this cancer, most prior studies have relied on a case-control design.(8–10,21) While valuable for risk-factor discovery, case-control studies cannot provide measures of incidence, nor do they allow calculation of absolute risk, a key criterion when implementing effective screening programs. Within the Kaiser Permanente system of Northern California, one study followed a cohort of ~4,300 individuals diagnosed with gastric intestinal metaplasia (a precancerous lesion) and identified 17 cases of GAC over an average of 6 years of follow-up.(22) These individuals all underwent endoscopy with biopsy. Our data fill an important gap in that they present risk estimates applicable to a general population rather than to a specific high-risk group. Our general population incidence estimate of 7.9 per 100,000 p-y is slightly higher than the reported national average for the US (estimated at ~5–6 per 100,000).(11) This may be due to the multiethnic nature of the catchment area, with higher proportions of Asians and Hispanics. One limitation is the lack of disaggregated Asian and Hispanic data. PAMF does collect disaggregated racial data; however, given the relative rarity of the outcome, the disaggregated estimates were unstable and also unreportable for privacy reasons.
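The crude incidence figures quoted above follow directly from the event counts and the person-time reported in the Results. A minimal Python sketch, assuming the rounded denominator of ~2 million p-y (the exact person-time is not stated here, so the output is approximate) and a hypothetical helper name `incidence_per_100k`:

```python
def incidence_per_100k(events: int, person_years: float) -> float:
    """Crude incidence rate per 100,000 person-years."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return events / person_years * 100_000


# Assumption: ~2 million person-years, the rounded figure from the Results.
APPROX_PERSON_YEARS = 2_000_000

all_gac = incidence_per_100k(157, APPROX_PERSON_YEARS)     # all incident GACs
non_cardia = incidence_per_100k(102, APPROX_PERSON_YEARS)  # non-cardia GACs only

print(f"All GAC: {all_gac:.2f} per 100,000 p-y")
print(f"Non-cardia GAC: {non_cardia:.2f} per 100,000 p-y")
```

With the rounded denominator this gives roughly 7.85 and 5.10 per 100,000 p-y, in line with the reported 7.9 and 5.1.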

While we found Asian race and Hispanic ethnicity to be associated with all GACs, the HRs were larger for non-cardia GACs. This finding is consistent with our knowledge of non-cardia GAC, which disproportionately impacts racial/ethnic minority and immigrant populations.(12) Moreover, cardia GACs are associated with gastroesophageal reflux and obesity, while 90% of non-cardia GACs are attributable to Hp infection.(23,24) In our study, both for all GACs and for non-cardia GACs, a history of Hp was not associated with cancer risk in multivariable analysis (though prior Hp infection was associated in univariable analysis). We hypothesize this is due to both incomplete phenotyping (at the EHR level) and clinical underdiagnosis (at the clinician level), which is discussed further below. Notably, in our sensitivity analyses (excluding C16.0, C16.8, and C16.9) the number of captured cancers was modest. While the magnitude of the HRs did not seem overly sensitive to these changes in outcome definition, the estimates carried a higher degree of uncertainty. The predictive ability of our model was higher for non-cardia GACs (c-statistic 0.9) than for all GACs (c-statistic 0.8). This study was not designed a priori for formal model building; future work with imputation, cross-validation, and optimism correction would be needed to statistically assess the robustness of these models. Another notable limitation was that this study was designed with time-independent covariates; future studies built with time-varying covariates may have enhanced predictive power.

Our study captured data from a very large number of encounters (>30 million), inclusive of both outpatient and inpatient encounters, in-person and electronic visits, and physician and non-physician encounters. Moreover, we integrated multiple common EHR data types to construct and comprehensively phenotype our cohort, including demographics, diagnoses, problem lists, family history, social history, and medications. We did not, however, include free-text data fields, nor did we employ NLP for phenotyping. Addition of NLP to our study may have been helpful in enhancing the characterization of certain phenotypic elements. Yet at the same time, use of NLP can introduce unexpected barriers to dissemination due to site-specific diversity in language, heterogeneous reporting structures, and the need for local validation.(25) As a related limitation, we were not able to capture socioeconomic data (e.g. income, education, occupation). While some of these data may be available in free-text format, their documentation is heterogeneous, sparse, and unvalidated.

Structured data elements have important limitations, as our study also demonstrates. In many cases, incomplete and missing data become a significant concern. As an example, in our cohort ‘missingness’ of smoking data was a strong predictor of GAC risk. Similarly, when analyzing race, preference to not answer the racial identity question was significantly associated with GAC risk. These data suggest some association between the missingness pattern and the outcome (i.e. data are not missing at random). In this study, we did not perform imputation as the purpose of the study was to present and characterize a cohort, including its missingness; future risk prediction modeling would require steps to address missing data. There is the added complexity of discerning the absence of a condition from missing data. To address this problem, we attempted contextual analysis. For instance, in the case of prior Hp infection, the documented rate with ICD coding was quite low (~3%). We supplemented this with pharmaceutical claims information and laboratory test data. Yet even with these additional phenotyping methods, the prevalence of prior Hp diagnosis in our cohort remained <4%. It is likely that a much larger share of the cohort was infected with Hp but was never tested clinically.
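The contextual phenotyping described above can be read as a rule that flags prior Hp when any one of several structured sources is positive, while missing values are kept as their own category rather than dropped. A minimal Python sketch under those assumptions follows. The class, field, and function names are hypothetical and do not come from the study's data model, the treatment of a LOINC-coded result as "positive" is an assumption, and the toy records are synthetic:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PatientRecord:
    # All field names are hypothetical, chosen for illustration only.
    has_hp_icd_code: bool          # ICD diagnosis code for Hp on record
    has_hp_therapy: bool           # dispensed Hp eradication regimen
    has_positive_hp_loinc: bool    # positive LOINC-coded Hp lab result (assumed)
    smoking_status: Optional[str]  # None when the EHR element is empty


def prior_hp_diagnosis(rec: PatientRecord) -> bool:
    """Flag prior Hp if any one structured source is positive."""
    return rec.has_hp_icd_code or rec.has_hp_therapy or rec.has_positive_hp_loinc


def smoking_category(rec: PatientRecord) -> str:
    """Keep missing values as an explicit 'Unknown' level instead of dropping them."""
    return rec.smoking_status if rec.smoking_status is not None else "Unknown"


def hp_prevalence(records: Iterable[PatientRecord]) -> float:
    """Share of records flagged by the composite rule."""
    records = list(records)
    if not records:
        raise ValueError("no records supplied")
    return sum(prior_hp_diagnosis(r) for r in records) / len(records)


# Synthetic toy records, not study data.
toy_cohort = [
    PatientRecord(True, False, False, "Never smoker"),
    PatientRecord(False, True, False, None),
    PatientRecord(False, False, False, "Former smoker"),
    PatientRecord(False, False, False, None),
]

print(hp_prevalence(toy_cohort))
print([smoking_category(r) for r in toy_cohort])
```

A rule of this kind can only raise the captured prevalence toward what was clinically documented; it cannot recover infections that were never tested for.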

Importantly, even full capture of disease conditions through EHR data, be it through structured or unstructured methods, cannot address issues of underdiagnosis at the clinical level. For example, while the estimated prevalence of Hp in the US is ~35%,(26) only a fraction of individuals ever come to clinical attention as there are no guidelines for testing of asymptomatic individuals. Currently only symptomatic individuals or those with pre-existing clinical conditions (such as peptic ulcer disease) are recommended to be tested.(20) A similar problem exists for precursor lesions (atrophic gastritis), which in our cohort were diagnosed in <1% of the general population. Cross-sectional studies of individuals undergoing endoscopy with gastric biopsies suggest that the prevalence of precursor lesions may be as high as 5%.(27,28) It is therefore likely that more individuals in our cohort have gastric precursor conditions but are undiagnosed. While use of NLP would not be able to solve the clinical underdiagnosis problem, NLP may add granularity to disease conditions. Using the example of gastric precursors, ICD coding appears to be a relatively blunt tool which cannot discriminate extent or severity of disease. Additional information found in pathology reports describing the extent, severity, and histologic attributes may help to stratify individuals for GAC risk.(29)

Based on the feasibility demonstrated in this study, we envision an EHR-based cancer control strategy. Among the general care-seeking population, we have demonstrated that common and reliably-captured EHR elements can stratify the population for incident risk. In certain developed nations of the Asian-Pacific region (South Korea and Japan), endoscopic screening of the asymptomatic, 40–75 year-old population has been shown to reduce cancer-specific mortality by ~40%.(30) While the overall incidence of GAC in the US is significantly lower, an EHR-based risk stratification system could identify high-incidence groups whose absolute risk matches or exceeds the Asian-Pacific region. Individuals above certain risk levels may be appropriate candidates to invite for endoscopic screening. This integrated approach may be one avenue to begin to reduce the high and currently unaddressed burden of GAC in the US.
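To illustrate how hazard ratios from a proportional hazards model could feed such a stratification, the sketch below multiplies the HRs of the risk factors present in an individual (equivalently, it sums their log-HRs). It is an illustration only, under several assumptions: the values are the second HR column of Table 4, taken here to be the multivariable estimates for non-cardia GAC; only a subset of covariates is included; the names `MULTIVARIABLE_HR` and `relative_hazard` are hypothetical; and the study was not designed for formal model building. Converting a relative hazard into absolute risk would additionally need a baseline hazard, which is not reported here.

```python
import math
from typing import Iterable

# Assumed multivariable HRs for non-cardia GAC (subset of Table 4 covariates).
MULTIVARIABLE_HR = {
    "hispanic_ethnicity": 3.6,
    "atrophic_gastritis": 6.1,
    "anemia": 2.3,
    "nsaid_use": 0.2,
}


def relative_hazard(present_factors: Iterable[str]) -> float:
    """Hazard relative to the reference profile, i.e. exp(sum of log-HRs)."""
    log_hazard = sum(math.log(MULTIVARIABLE_HR[f]) for f in present_factors)
    return math.exp(log_hazard)


# Reference profile: no listed factor present, so the relative hazard is 1.
print(relative_hazard([]))

# Hypothetical individual with atrophic gastritis and anemia.
print(round(relative_hazard(["atrophic_gastritis", "anemia"]), 2))
```

Working on the log scale mirrors the linear predictor of a Cox model; an unrecognized factor name raises a `KeyError` rather than being silently ignored.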

Supplementary Material

Supplementary files 1–9.

Acknowledgements:

R.J. Huang is supported by the National Cancer Institute under Award Number K08CA252635. L.P. Palaniappan was supported through the National Heart, Lung, and Blood Institute under Award Number K24 HL150476. This work was supported by a grant from the Stanford Center for Asian Health Research and Education (CARE).

Abbreviations:

EHRs: Electronic health records
GAC: Gastric adenocarcinoma
Hp: Helicobacter pylori
H2 blocker: Histamine type 2 receptor antagonist/blocker
HR: Hazard ratio
ICD-9: International Classification of Diseases, Ninth Revision
ICD-10: International Classification of Diseases, Tenth Revision
LOINC: Logical Observation Identifiers Names and Codes
NLP: Natural language processing
NSAID: Non-steroidal anti-inflammatory drug
PAMF: Palo Alto Medical Foundation
p-y: Person-years
PPI: Proton-pump inhibitor
US: United States

Footnotes

Conflicts of Interest: The authors report no conflicts of interest, financial or otherwise, with regard to the submitted work.

References

  1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 2021;71(3):209–49 doi 10.3322/caac.21660.
  2. Choi IJ, Lee JH, Kim YI, Kim CG, Cho SJ, Lee JY, et al. Long-term outcome comparison of endoscopic resection and surgery in early gastric cancer meeting the absolute indication for endoscopic resection. Gastrointest Endosc 2015;81(2):333–41 e1 doi 10.1016/j.gie.2014.07.047.
  3. Wang S, Zhang Z, Liu M, Li S, Jiang C. Endoscopic Resection Compared with Gastrectomy to Treat Early Gastric Cancer: A Systematic Review and Meta-Analysis. PLoS One 2015;10(12):e0144774 doi 10.1371/journal.pone.0144774.
  4. Pyo JH, Lee H, Min BH, Lee JH, Choi MG, Lee JH, et al. Long-Term Outcome of Endoscopic Resection vs. Surgery for Early Gastric Cancer: A Non-inferiority-Matched Cohort Study. Am J Gastroenterol 2016;111(2):240–9 doi 10.1038/ajg.2015.427.
  5. Huang RJ, Epplein M, Hamashima C, Choi IJ, Lee E, Deapen D, et al. An Approach to the Primary and Secondary Prevention of Gastric Cancer in the United States. Clin Gastroenterol Hepatol 2022;20(10):2218–28 doi 10.1016/j.cgh.2021.09.039.
  6. Huang RJ, Laszkowska M, In H, Hwang JH, Epplein M. Controlling Gastric Cancer in a World of Heterogeneous Risk. Gastroenterology 2023;164(5):736–51 doi 10.1053/j.gastro.2023.01.018.
  7. Williams BA. Constructing Epidemiologic Cohorts from Electronic Health Record Data. Int J Environ Res Public Health 2021;18(24) doi 10.3390/ijerph182413193.
  8. Taninaga J, Nishiyama Y, Fujibayashi K, Gunji T, Sasabe N, Iijima K, et al. Prediction of future gastric cancer risk using a machine learning algorithm and comprehensive medical check-up data: A case-control study. Sci Rep 2019;9(1):12384 doi 10.1038/s41598-019-48769-y.
  9. Huang RJ, Kwon NS, Tomizawa Y, Choi AY, Hernandez-Boussard T, Hwang JH. A Comparison of Logistic Regression Against Machine Learning Algorithms for Gastric Cancer Risk Prediction Within Real-World Clinical Data Streams. JCO Clin Cancer Inform 2022;6:e2200039 doi 10.1200/CCI.22.00039.
  10. Briggs E, de Kamps M, Hamilton W, Johnson O, McInerney CD, Neal RD. Machine Learning for Risk Prediction of Oesophago-Gastric Cancer in Primary Care: Comparison with Existing Risk-Assessment Tools. Cancers (Basel) 2022;14(20) doi 10.3390/cancers14205023.
  11. Ferlay J, Ervik M, Lam F, Colombet M, Mery L, Pineros M, et al. Global Cancer Observatory: Cancer Today. International Agency for Research on Cancer. Available from: https://gco.iarc.fr/today, accessed [3 August 2022]. Lyon, France: 2020.
  12. Shah SC, McKinley M, Gupta S, Peek RM Jr., Martinez ME, Gomez SL. Population-Based Analysis of Differences in Gastric Cancer Incidence Among Races and Ethnicities in Individuals Age 50 Years and Older. Gastroenterology 2020;159(5):1705–14 e2 doi 10.1053/j.gastro.2020.07.049.
  13. Anderson WF, Camargo MC, Fraumeni JF Jr., Correa P, Rosenberg PS, Rabkin CS. Age-specific trends in incidence of noncardia gastric cancer in US adults. JAMA 2010;303(17):1723–8 doi 10.1001/jama.2010.496.
  14. Camargo MC, Anderson WF, King JB, Correa P, Thomas CC, Rosenberg PS, et al. Divergent trends for gastric cancer incidence by anatomical subsite in US adults. Gut 2011;60(12):1644–9 doi 10.1136/gut.2010.236737.
  15. Thrift AP, Wenker TN, El-Serag HB. Global burden of gastric cancer: epidemiological trends, risk factors, screening and prevention. Nat Rev Clin Oncol 2023;20(5):338–49 doi 10.1038/s41571-023-00747-0.
  16. U.S. Census Bureau; American Community Survey 1-year estimates. [cited 2023 June 1]. Available from: http://censusreporter.org/profiles/33000US488-san-jose-san-francisco-oakland-ca-csa/
  17. Tang Z, Kadiyska Y, Li H, Suciu D, Brinkley JF. Dynamic XML-based exchange of relational data: application to the Human Brain Project. AMIA Annu Symp Proc 2003;2003:649–53.
  18. Correa P. Human gastric carcinogenesis: a multistep and multifactorial process--First American Cancer Society Award Lecture on Cancer Epidemiology and Prevention. Cancer Res 1992;52(24):6735–40.
  19. Olefson S, Moss SF. Obesity and related risk factors in gastric cardia adenocarcinoma. Gastric Cancer 2015;18(1):23–32 doi 10.1007/s10120-014-0425-4.
  20. Chey WD, Leontiadis GI, Howden CW, Moss SF. ACG Clinical Guideline: Treatment of Helicobacter pylori Infection. Am J Gastroenterol 2017;112(2):212–39 doi 10.1038/ajg.2016.563.
  21. Choi AY, Strate LL, Fix MC, Schmidt RA, Ende AR, Yeh MM, et al. Association of gastric intestinal metaplasia and East Asian ethnicity with the risk of gastric adenocarcinoma in a U.S. population. Gastrointest Endosc 2018;87(4):1023–8 doi 10.1016/j.gie.2017.11.010.
  22. Li D, Bautista MC, Jiang SF, Daryani P, Brackett M, Armstrong MA, et al. Risks and Predictors of Gastric Adenocarcinoma in Patients with Gastric Intestinal Metaplasia and Dysplasia: A Population-Based Study. Am J Gastroenterol 2016;111(8):1104–13 doi 10.1038/ajg.2016.188.
  23. Mukaisho K, Nakayama T, Hagiwara T, Hattori T, Sugihara H. Two distinct etiologies of gastric cardia adenocarcinoma: interactions among pH, Helicobacter pylori, and bile acids. Front Microbiol 2015;6:412 doi 10.3389/fmicb.2015.00412.
  24. Plummer M, de Martel C, Vignat J, Ferlay J, Bray F, Franceschi S. Global burden of cancers attributable to infections in 2012: a synthetic analysis. Lancet Glob Health 2016;4(9):e609–16 doi 10.1016/S2214-109X(16)30143-7.
  25. Carrell DS, Schoen RE, Leffler DA, Morris M, Rose S, Baer A, et al. Challenges in adapting existing clinical natural language processing systems to multiple, diverse health care settings. J Am Med Inform Assoc 2017;24(5):986–91 doi 10.1093/jamia/ocx039.
  26. Hooi JKY, Lai WY, Ng WK, Suen MMY, Underwood FE, Tanyingoh D, et al. Global Prevalence of Helicobacter pylori Infection: Systematic Review and Meta-Analysis. Gastroenterology 2017;153(2):420–9 doi 10.1053/j.gastro.2017.04.022.
  27. Choi CE, Sonnenberg A, Turner K, Genta RM. High Prevalence of Gastric Preneoplastic Lesions in East Asians and Hispanics in the USA. Dig Dis Sci 2015;60(7):2070–6 doi 10.1007/s10620-015-3591-2.
  28. Sonnenberg A, Lash RH, Genta RM. A national study of Helicobactor pylori infection in gastric biopsy specimens. Gastroenterology 2010;139(6):1894–901 e2; quiz e12 doi 10.1053/j.gastro.2010.08.018.
  29. Gupta S, Li D, El Serag HB, Davitkov P, Altayar O, Sultan S, et al. AGA Clinical Practice Guidelines on Management of Gastric Intestinal Metaplasia. Gastroenterology 2020;158(3):693–702 doi 10.1053/j.gastro.2019.12.003.
  30. Zhang X, Li M, Chen S, Hu J, Guo Q, Liu R, et al. Endoscopic Screening in Asian Countries Is Associated With Reduced Gastric Cancer Mortality: A Meta-analysis and Systematic Review. Gastroenterology 2018;155(2):347–54 e9 doi 10.1053/j.gastro.2018.04.026.

Data Availability Statement

This cohort was created using real-world clinical data from living individuals. Data cannot be shared publicly due to patient confidentiality. The de-identified data underlying the results presented in the study can be requested by contacting the corresponding author for researchers who meet the criteria for access to confidential data.
