Author manuscript; available in PMC: 2026 Apr 8.
Published in final edited form as: Int J Cancer. 2026 Jan 16;158(11):2890–2899. doi: 10.1002/ijc.70332

Etiology of gene expression-based subtypes of breast cancer in the Ghana Breast Health Study

Amber N Hurson 1, Ebonee N Butler 2, Alina M Hamilton 2,3, Khushali K Shah 1, Bryan Aapentuo Sienso 4, Grace Adjoa Ocansey 5, Sheba Mary Pognaa Kunfah 6, Bernard Petershie 7, Kaitlin E White 1, Lawrence Edusei 8, Ernest Adjei 9, Florence Dedey 10,11, Verna Vanderpuye 12, Joe-Nat Clegg-Lamptey 10,11, Joel Yarney 12, Richard Biritwum 13, Kofi M Nyarko 14, Francis Aitpillah 7,9, Joseph K Oppong 9, Ernest Osei-Bonsu 15, Daniel Ansong 9, Baffour Awuah 9, Beatrice Addai Wiafe 16, Seth Wiafe 17, Louise Brinton 1, Thomas U Ahearn 1, Melissa A Troester 2,3, Jonine D Figueroa 1, Nicolas Titiloye 7, Montserrat Garcia-Closas 18,19, Mustapha Abubakar 1
PMCID: PMC13054941  NIHMSID: NIHMS2160870  PMID: 41544200

Abstract

Breast cancers are heterogeneous and largely classified using immunohistochemistry of estrogen receptor expression. However, research suggests RNA-based subtyping, including intrinsic (luminal vs. non-luminal) and TP53-based subtypes, may offer additional etiologic insight. TP53 mutant tumors, often more aggressive and non-luminal, are common among women of African descent. We examined possible heterogeneity for RNA-based luminal/non-luminal and TP53 subtypes among women of West African ancestry. We analyzed 595 invasive breast cancer cases and 2,096 controls in the Ghana Breast Health Study. RNA was extracted from formalin-fixed paraffin-embedded tumor samples and profiled via nCounter® Breast Cancer 360. Tumors were classified as luminal (N=278) vs. non-luminal (N=282) and TP53 wildtype-like (N=324) vs. mutant-like (N=271) using the PAM50 assay and a validated RNA signature, respectively. Case-control odds ratios and 95% confidence intervals were estimated using polytomous logistic regression. Etiologic heterogeneity was assessed in case-only analyses. Higher parity was more protective for luminal than non-luminal tumors (p-heterogeneity=0.05). Older age at menarche and alcohol use ≥6 months were associated with elevated risk of luminal, but not non-luminal tumors (p-heterogeneity=0.01). Similar trends were observed for TP53 wildtype-like tumors, though not statistically significant. Cross-classification of PAM50/TP53 showed that higher parity, older age at menarche, and alcohol use ≥6 months were more strongly associated with luminal/TP53 wildtype-like than other subtypes. RNA-based breast cancer subtyping suggests TP53 refines breast cancer etiologic heterogeneity in a sub-Saharan African population. The high prevalence of aggressive, mostly TP53-mutant tumors in this population underscores the need for further studies to clarify etiologic heterogeneity.

Keywords: breast cancer, etiologic heterogeneity, Nanostring, gene expression, risk factors

Novelty and Impact:

Black women in sub-Saharan Africa have a high prevalence of aggressive tumor phenotypes, the etiologies of which cannot be explained by estrogen receptor status alone. In this population-based case-control study involving Ghanaian women, we found an RNA-based (PAM50/TP53) subtyping schema to refine etiologic relationships for parity, menarcheal age, alcohol, and oral contraceptive use. These findings underscore the value of RNA-based subtyping in understanding breast cancer etiology in West African populations with high prevalences of aggressive tumors.

INTRODUCTION

It is well established that breast cancer is a heterogeneous disease, with clinically, molecularly and pathologically defined subtypes that have very different etiologies and outcomes.1-3 A recent systematic review found suggestive to convincing evidence of breast cancer risk factor heterogeneity across estrogen receptor (ER) subtypes, with consistent patterns across multiple racial and ethnic populations.4

Breast cancer incidence and mortality rates (including its molecular subtypes) vary considerably by race and ethnicity, despite consistent subtype-specific risk factor effects across populations.4,5 Data on cancer incidence in sub-Saharan Africa are extremely limited, but in the US, Black women have higher incidence of aggressive breast cancer and higher mortality rates of all subtypes compared to White women.6,7 Nonbiological factors (e.g., socioeconomic status, access to health care) are clear contributors to racial/ethnic/geographic heterogeneity in breast cancer incidence and mortality rates; however, differences persist after controlling for these factors, suggesting tumor biology and genetics might also play a role. The hypothesis that West African ancestry influences breast cancer biology is supported by research showing higher rates of an aggressive breast cancer phenotype in West Africa compared to other African regions,8,9 as well as the observation of an elevated tumor mutational burden and distinct immunologic profiles among breast cancer patients of West African descent.10,11

The majority of breast cancer subtyping for clinical and etiologic research purposes has been based on immunohistochemical markers of ER status. However, emerging data indicate that RNA-based subtyping, encompassing intrinsic (luminal vs. non-luminal) and TP53 subtypes, may provide additional etiologic resolution beyond ER status. For example, within a racially diverse population-based study in the US, we previously assessed the relative contribution of different tumor markers to the heterogeneity effects of established breast cancer risk factors.12 RNA-based TP53 and immunohistochemistry (IHC)-based ER accounted for more heterogeneity than other markers, had specific risk factor profiles, and were found to have independent and combined effects. However, these markers have never been applied in etiologic studies of African populations.

Breast cancer incidence rates are increasing in sub-Saharan Africa,13 with high incidence of tumors with aggressive characteristics,9 mirroring patterns among African American women in the US. The Ghana Breast Health Study (GBHS) is a population-based case-control study that was designed to investigate breast cancer etiology among women with West African ancestry to further our understanding of aggressive breast tumor subtypes. A prior analysis in this population evaluated associations between reproductive risk factors and ER-based breast cancer subtypes14 and found no statistically significant differences by ER status. However, after stratifying by ≥50 and <50 years of age, associations between some reproductive factors (parity and breastfeeding) differed by ER subtypes with patterns expected based on previous literature. Herein we extend that work by employing gene expression-based classifiers to categorize the breast tumors into intrinsic (luminal vs. non-luminal) and TP53 (mutant-like vs. wildtype-like) subtypes to further resolve etiologic factors not fully captured by IHC or in ER-based classification schema.

MATERIALS AND METHODS

Study Population

The Ghana Breast Health Study (GBHS) is a population-based case-control study run from 2013 to 2015. The study was conducted in collaboration with the three Ghanaian hospitals responsible for treating most of the country’s breast cancer cases: Korle Bu Teaching Hospital (KBTH) in Accra, as well as Komfo Anokye Teaching Hospital (KATH) and Peace and Love Hospital (PLH) in Kumasi.15 Cases were defined as women who, within the preceding year, presented with a breast lump suspected to be malignant and were subsequently referred either for biopsy at one of the study hospitals or for clinical management. Controls were women aged 18 to 74 years and never diagnosed with breast cancer who were identified through household enumeration of census-defined geographic areas within Ghana’s Ashanti, Central, Eastern, and Greater Accra regions. All participants, both cases and controls, were required to meet the following criteria: (1) female sex, (2) age 18–74 years, (3) residence for at least one year in the defined catchment areas surrounding Kumasi and Accra, and (4) completion of an in-person interview.

Recruitment started in 2013 and ended in October 2015, with enrollment (i.e., interviewing) of 2,202 cases and 2,106 controls, who were matched on age and region of residence as previously described.15 Participation rates were over 90% for cases and controls.15 Suspected breast cancer cases were enrolled at the time of biopsy, with ultimately N=1,071 of the 2,202 cases receiving pathologically confirmed breast cancer diagnoses.

Risk Factor Information

Data on breast cancer risk factors from the GBHS were collected through a standardized interview-based questionnaire. The questionnaire included information on an expansive range of patient characteristics and risk factors.15 In the present analysis we evaluated well-established breast cancer risk factors, which have been shown to be related to risk in studies of other (predominantly European ancestry) populations: age at menarche, parity, age at first birth, breastfeeding duration, oral contraceptive (OC) use, body size, family history of breast cancer, and alcohol use.4 Breastfeeding duration was defined as the median months of breastfeeding per pregnancy. Body size was classified according to a previously published 9-scale pictogram.16 Alcohol use was assessed by asking: “Has there ever been a time in your life when you had at least one drink a month?” and “For how long did you have at least one drink a month: Would you say less than six months, or six months or longer?”
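The two alcohol questions map naturally onto the three exposure levels reported in the tables (never, <6 months, ≥6 months). The following is a minimal sketch of that coding, not the study's actual data-processing code; the function name, argument names, and the use of `None` for missing answers are assumptions made for illustration.

```python
def alcohol_category(ever_monthly, six_months_or_longer=None):
    """Code alcohol exposure from the two questionnaire items.

    ever_monthly: True/False answer to "at least one drink a month" (None if missing).
    six_months_or_longer: True/False answer to the duration item (None if missing).
    Returns "never", "<6 months", ">=6 months", or None when it cannot be coded.
    """
    if ever_monthly is None:
        return None  # first item missing: exposure unknown
    if not ever_monthly:
        return "never"
    if six_months_or_longer is None:
        return None  # ever user with unknown duration
    return ">=6 months" if six_months_or_longer else "<6 months"
```

Under this coding, "ever use" corresponds to either of the two duration levels.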

Tumor biopsy tissue collection

A total of 4-8 needle core biopsies (14-gauge) were taken from cases prior to any treatment and processed into FFPE blocks for diagnostic purposes using standardized protocols.15 Blocks that were not needed for diagnosis were sent to the NCI for research (75% of cases had one block and 25% had 2-5 blocks). Centralized histopathology review of H&E sections was conducted for tumor blocks sent to the U.S., where 1,071 biopsy samples were confirmed to be malignant tumor.

For this study, inclusion criteria were (1) a pathologically confirmed invasive case and (2) availability of biopsy tissue blocks. To ensure high-quality mRNA expression data, biopsy tissue blocks containing less than 10% tumor tissue (as determined by pathology review of H&E sections) were excluded (N=327). This left 745 tissue blocks eligible for analysis (Supplementary Figure 1).

Clinicopathological data

Methods for obtaining information on IHC markers have been described previously.14 Briefly, IHC marker status was obtained from pathology departments in Ghana for 69% of cases. Tumors were classified as positive for estrogen receptor (ER) and progesterone receptor (PR) if ≥10% of cells showed positive staining. HER2 status was considered positive if staining was 3+. Borderline and negative cases were considered HER2 negative. Agreement of IHC assays performed in pathology departments in Ghana with those performed at the NCI laboratory was high (79% for ER, 65% for PR, and 78% for HER2; p<0.01).14 Tumor size was determined by clinical palpation at diagnosis, and histologic grade was assessed through centralized pathology review.
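The positivity rules above can be written out as a small function. This is an illustrative sketch only: the function name and the input encoding (percentages for ER/PR, a score string for HER2) are assumptions, not part of the study protocol.

```python
def ihc_status(er_pct, pr_pct, her2_score):
    """Classify ER, PR and HER2 status from IHC readings.

    er_pct, pr_pct: percentage of cells with positive staining (0-100).
    her2_score: HER2 staining score as a string, e.g. "0", "1+", "2+", "3+"
                (this encoding is assumed for illustration).
    """
    return {
        # ER and PR: positive when at least 10% of cells stain positive.
        "ER": "positive" if er_pct >= 10 else "negative",
        "PR": "positive" if pr_pct >= 10 else "negative",
        # HER2: only 3+ is positive; borderline and negative are both negative.
        "HER2": "positive" if her2_score == "3+" else "negative",
    }
```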

RNA-based gene expression

Gene expression was analyzed using the Breast Cancer 360 panel on the Nanostring nCounter® platform, which measures expression of 776 genes that are involved in various key breast cancer pathways and processes.17 Gene expression data were run in two batches and then cleaned and normalized using Remove Unwanted Variation (RUVg)18 as previously described.19,20 For normalization, we used the 11 of 17 available housekeeping genes with Pearson correlation coefficients (r) ≥ 0.85 (i.e., SF3A1, MRPL19, POLR2A, ABCF1, PUM1, TBC1D10B, SDHA, OAZ1, UBB, PSMC4 and TBP), with RUV k=1. Of 745 tissue blocks eligible for RNA extraction, 595 passed normalization (80% passing rate, Supplementary Figure 1).

A research version of the PAM50 molecular subtype predictor21 was used to classify tumors as luminal A, luminal B, HER2-enriched, basal-like or normal-like. Tumor subtype was then dichotomized as luminal (luminal A or luminal B) or non-luminal (HER2-enriched or basal-like). Few tumors were classified as normal-like (n=35 [6%]), typically reflecting low tumor content; therefore, only estimates for luminal and non-luminal subtypes are shown. We also applied a previously validated RNA signature that aggregates information on 48 TP53-dependent genes to classify TP53 status (mutant-like or wildtype-like) based on a similarity-to-centroid approach, as previously described.22
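To make the two classification steps concrete, the sketch below shows (a) the luminal/non-luminal dichotomization described above and (b) a generic nearest-centroid assignment. It is a simplified illustration, not the published PAM50 or TP53 classifier: the use of Pearson correlation as the similarity measure, the function names, and the toy four-gene centroids in the usage note are all assumptions.

```python
from math import sqrt

LUMINAL = {"luminal A", "luminal B"}
NON_LUMINAL = {"HER2-enriched", "basal-like"}


def dichotomize_pam50(subtype):
    """Collapse a PAM50 call to luminal / non-luminal; normal-like returns None."""
    if subtype in LUMINAL:
        return "luminal"
    if subtype in NON_LUMINAL:
        return "non-luminal"
    return None  # normal-like tumors are not reported in either group


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def nearest_centroid(profile, centroids):
    """Return the label whose centroid is most similar to the tumor profile.

    profile: expression values for the signature genes, in a fixed gene order.
    centroids: dict mapping label -> centroid vector in the same gene order.
    Similarity is Pearson correlation (an assumption made for this sketch).
    """
    return max(centroids, key=lambda label: pearson(profile, centroids[label]))
```

With hypothetical centroids `{"mutant-like": [1.0, 0.5, -0.5, -1.0], "wildtype-like": [-1.0, -0.5, 0.5, 1.0]}`, a profile such as `[0.8, 0.1, -0.2, -0.9]` correlates positively with the first centroid and is therefore labeled mutant-like.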

Statistical Methods

To determine whether the characteristics of individuals included in the analysis differed from those that were excluded (based on percentage of tumor tissue for RNA analysis), we compared frequencies of patient/clinical characteristics and risk factors between cases with 0, >0 to <10, and ≥10% tumor tissue using a chi-square test. We also assessed the degree of agreement between RNA- and IHC-based subtyping schema by calculating the percent agreement between PAM50 intrinsic subtype and ER status, as well as between PAM50 intrinsic subtype and hormone receptor status.
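Percent agreement between the two subtyping schema can be computed as the share of tumors whose calls are concordant. The sketch below assumes that luminal corresponds to ER-positive and non-luminal to ER-negative, and that tumors missing either call are dropped; both choices, and all names, are illustrative assumptions rather than the study's documented procedure.

```python
def pam50_er_agreement(pam50_calls, er_calls):
    """Percent agreement between luminal/non-luminal calls and ER status.

    pam50_calls: sequence of "luminal" / "non-luminal" (None if unavailable).
    er_calls: sequence of "positive" / "negative" (None if unavailable).
    """
    # Concordance mapping assumed for this sketch.
    expected_er = {"luminal": "positive", "non-luminal": "negative"}
    pairs = [
        (p, e)
        for p, e in zip(pam50_calls, er_calls)
        if p is not None and e is not None
    ]
    if not pairs:
        raise ValueError("no tumors with both calls available")
    concordant = sum(expected_er[p] == e for p, e in pairs)
    return 100.0 * concordant / len(pairs)
```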

In case-control analyses, polytomous unconditional logistic regression models were used to estimate odds ratios (OR) and 95% confidence intervals (CI) for associations between breast cancer risk factors and tumor subtypes defined by RNA-based TP53 status (wildtype-like/mutant-like) and by PAM50 intrinsic subtype (luminal/non-luminal). To evaluate the combined effects of these two markers, we estimated associations between risk factors and the four possible breast cancer subtypes when cross-classifying RNA-based TP53 status and PAM50 intrinsic subtype (TP53 wildtype-like/luminal, TP53 wildtype-like/non-luminal, TP53 mutant-like/luminal, TP53 mutant-like/non-luminal). In additional case-control analyses, we estimated associations with ER subtypes for risk factors not previously reported in the GBHS, i.e., OC and alcohol use.14
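As a simplified illustration of what an odds ratio and its 95% CI represent, the sketch below computes a crude (unadjusted) OR with a Wald interval from a 2×2 table. This is not the adjusted polytomous model used in the analysis; it ignores subtype and all covariates. The example counts are the family history frequencies for all cases and controls in Table 1, and the function name is hypothetical.

```python
from math import exp, log, sqrt


def crude_odds_ratio(exposed_cases, unexposed_cases,
                     exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio with a Wald confidence interval from a 2x2 table."""
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls
    )
    # Standard error of log(OR): square root of the summed reciprocal counts.
    se_log_or = sqrt(
        1 / exposed_cases + 1 / unexposed_cases
        + 1 / exposed_controls + 1 / unexposed_controls
    )
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Family history of breast cancer (Table 1): 40 of 585 cases and
# 46 of 2,072 controls with non-missing data reported a family history.
or_fh, lower_fh, upper_fh = crude_odds_ratio(40, 545, 46, 2026)
print(f"Crude OR = {or_fh:.2f} (95% CI {lower_fh:.2f}, {upper_fh:.2f})")
```

The crude estimate (about 3.2) is somewhat higher than the adjusted subtype-specific estimates in Tables 2–4, which is expected because the adjusted models account for age, education, study site, and the other risk factors.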

Case-case p-values were estimated to test for etiologic heterogeneity of risk factor associations for TP53 mutant-like compared to wildtype-like cases, as well as for non-luminal compared to luminal cases. Given the high correlation between several breast tumor markers, we conducted a sensitivity analysis to estimate the independent associations of RNA-based TP53 status, PAM50 intrinsic subtype, and several additional tumor characteristics (tumor grade, ER status, PR status, HER2 status, and tumor size) with the risk factors of interest.
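The idea behind a case-case heterogeneity test can be sketched with a single binary risk factor and no covariates: among cases only, a model in which the probability of being mutant-like differs by exposure is compared with a model in which it does not, using a likelihood ratio test with one degree of freedom. The code below is an unadjusted illustration with hypothetical counts; the study's models additionally adjusted for covariates, and the function names are assumptions.

```python
from math import erfc, log, sqrt


def bernoulli_loglik(k, n):
    """Maximized Bernoulli log-likelihood for k events among n subjects."""
    loglik = 0.0
    if k > 0:
        loglik += k * log(k / n)
    if n - k > 0:
        loglik += (n - k) * log((n - k) / n)
    return loglik


def case_case_lrt(mut_exposed, wt_exposed, mut_unexposed, wt_unexposed):
    """Likelihood ratio test of subtype (mutant vs. wildtype) by exposure.

    Returns the test statistic and its p-value (chi-square, 1 df).
    """
    n_exposed = mut_exposed + wt_exposed
    n_unexposed = mut_unexposed + wt_unexposed
    # Full model: separate subtype probability in exposed and unexposed cases.
    ll_full = (bernoulli_loglik(mut_exposed, n_exposed)
               + bernoulli_loglik(mut_unexposed, n_unexposed))
    # Null model: one common subtype probability for all cases.
    ll_null = bernoulli_loglik(mut_exposed + mut_unexposed,
                               n_exposed + n_unexposed)
    statistic = max(0.0, 2 * (ll_full - ll_null))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value
```

A small p-value indicates that the exposure distribution differs between the two case groups, i.e., that the risk factor association is heterogeneous across subtypes.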

For all analyses, risk factors were modeled as dichotomous variables, and models were adjusted for study site, age (as a continuous variable), and education, and were mutually adjusted for all risk factors. All statistical tests were two-sided. Analyses were conducted in R software version 4.2.0 (R Foundation for Statistical Computing).
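The dichotomization of continuous risk factors can be illustrated as follows. The cut points are those shown in the row labels of Tables 2 and 3 (≥16 vs. <16 years at menarche, ≥3 vs. <3 births, ≥22 vs. <22 years at first birth, ≥19 vs. <19 months of breastfeeding); the variable names and the handling of missing values are assumptions for this sketch, and the analysis itself was run in R.

```python
# Cut points taken from the contrasts reported in Tables 2 and 3;
# the dictionary keys are hypothetical variable names.
CUT_POINTS = {
    "age_at_menarche_years": 16,
    "parity_births": 3,
    "age_at_first_birth_years": 22,
    "breastfeeding_months": 19,
}


def dichotomize(variable, value):
    """Return 1 if value is at or above the cut point, 0 if below, None if missing."""
    if value is None:
        return None
    return int(value >= CUT_POINTS[variable])
```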

RESULTS

Table 1 describes the frequencies of characteristics for controls and breast cancer cases. Compared to controls, cases had later ages at first birth, fewer months breastfeeding, more OC use, and a higher frequency of breast cancer family history. Among cases, aggressive tumor characteristics (e.g., hormone receptor negative status, grade 3), as well as younger ages at first birth were more frequent among TP53 mutant-like compared to wildtype-like cases, as well as among non-luminal compared to luminal cases (Supplementary Table 1). Characteristics of cases with tumor samples eligible for RNA extraction (i.e., ≥10% tumor tissue) were similar to those with ineligible samples (<10% tumor tissue), except with regard to tumor size and ever use of alcohol (Supplementary Table 2). Consistent with previous studies,23-25 high agreement was observed between RNA- and IHC-based subtyping schema (79% agreement, Supplementary Table 3).

Table 1.

Risk factor frequencies (%) among controls and breast cancer cases from the Ghana Breast Health Study.

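| Characteristic | Controls (N=2,096), n | Controls, % | Cases (N=595), n | Cases, % |
|---|---|---|---|---|
| Mean age, years (SD) | 45.6 (12.9) | | 50.0 (12.0) | |
| Education | | | | |
| No formal education | 489 | 24.2 | 142 | 25.3 |
| Primary school | 369 | 18.2 | 83 | 14.8 |
| Junior secondary school | 654 | 32.3 | 140 | 25.0 |
| Senior secondary school and above | 512 | 25.3 | 196 | 34.9 |
| Missing/Other | 72 | | 34 | |
| Site | | | | |
| KATH | 774 | 36.9 | 177 | 29.7 |
| KBTH | 728 | 34.7 | 154 | 25.9 |
| PLH | 594 | 28.3 | 264 | 44.4 |
| Age at menarche, years | | | | |
| <15 | 565 | 29.9 | 131 | 25.6 |
| 15 | 548 | 29.0 | 144 | 28.2 |
| 16 | 381 | 20.2 | 111 | 21.7 |
| ≥17 | 395 | 20.9 | 125 | 24.5 |
| Missing | 207 | | 84 | |
| Parity | | | | |
| Nulliparous | 228 | 10.9 | 51 | 8.6 |
| 1-2 births | 528 | 25.3 | 160 | 27.0 |
| 3-4 births | 683 | 32.7 | 198 | 33.4 |
| ≥5 births | 649 | 31.1 | 184 | 31.0 |
| Missing | 8 | | 2 | |
| Age at first birth, years | | | | |
| <19 | 552 | 31.0 | 118 | 23.0 |
| 19-21 | 509 | 28.6 | 150 | 29.2 |
| 22-25 | 411 | 23.1 | 136 | 26.5 |
| ≥26 | 306 | 17.2 | 109 | 21.2 |
| Missing | 318 | | 82 | |
| Median months breastfeeding per pregnancy, among parous women | | | | |
| <13 | 352 | 19.8 | 131 | 25.9 |
| 13-18 | 688 | 38.6 | 180 | 35.6 |
| ≥19 | 742 | 41.6 | 195 | 38.5 |
| Missing | 314 | | 89 | |
| Oral contraceptive use | | | | |
| Never | 1,463 | 87.0 | 408 | 83.4 |
| Ever | 218 | 13.0 | 81 | 16.6 |
| Missing | 415 | | 106 | |
| Body size | | | | |
| Slight | 583 | 28.6 | 145 | 25.8 |
| Average | 824 | 40.5 | 215 | 38.2 |
| Slightly heavy/heavy | 629 | 30.9 | 203 | 36.1 |
| Missing | 60 | | 32 | |
| Family history of breast cancer | | | | |
| No | 2,026 | 97.8 | 545 | 93.2 |
| Yes | 46 | 2.2 | 40 | 6.8 |
| Missing | 24 | | 10 | |
| Alcohol use | | | | |
| Never | 1,427 | 68.1 | 386 | 65.0 |
| Ever | 668 | 31.9 | 208 | 35.0 |
| Missing | 1 | | 1 | |
| Total duration of alcohol use, among ever users | | | | |
| <6 months | 486 | 72.9 | 137 | 65.9 |
| ≥6 months | 181 | 27.1 | 71 | 34.1 |
| Missing | 2 | | 1 | |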

Ever use of alcohol is defined as consuming at least one drink per month.

Differences in risk factor patterns were observed across breast cancer subtypes defined by RNA-based TP53 status (Table 2). Alcohol use duration ≥6 months was associated with TP53 wildtype-like (1.51 [1.03, 2.20]), but not with mutant-like (p-heterogeneity=0.06). Although not statistically different across subtypes in case-case analyses, we observed an association between later age at menarche (OR [95% CI]=1.37 [1.06, 1.78]) and higher parity (0.67 [0.50, 0.90]) with TP53 wildtype-like, but not with mutant-like breast cancer. Additionally, there was a suggestive association between ever use of alcohol and TP53 wildtype-like status (1.28 [0.99, 1.66]), and no association with mutant-like.

Table 2.

Case-control odds ratios and 95% confidence intervals for risk factors and breast cancer subtypes defined by RNA-based TP53 functional status, with case-case p-values evaluating heterogeneity across TP53 subtypes.

| Risk factor | TP53 wildtype-like (N=324) vs. controls, OR (95% CI) | TP53 mutant-like (N=271) vs. controls, OR (95% CI) | TP53 mutant-like vs. wildtype-like (p-het) |
|---|---|---|---|
| Age at menarche | | | |
| ≥16 vs. <16 years | 1.37 (1.06, 1.78)* | 1.00 (0.75, 1.33) | 0.11 |
| Parity | | | |
| ≥3 vs. <3 births | 0.67 (0.50, 0.90)** | 0.82 (0.57, 1.13) | 0.38 |
| Age at first birth | | | |
| ≥22 vs. <22 years | 1.31 (0.99, 1.72) | 1.41 (1.05, 1.89)* | 0.63 |
| Median months breastfeeding per pregnancy, among parous women | | | |
| ≥19 vs. <19 months | 0.89 (0.68, 1.17) | 1.07 (0.80, 1.43) | 0.27 |
| Oral contraceptive use | | | |
| Ever vs. Never | 1.41 (0.97, 2.06) | 1.50 (1.01, 2.23)* | 0.99 |
| Body size | | | |
| Slightly heavy/heavy vs. Slight/average | 1.29 (0.99, 1.68) | 1.37 (1.04, 1.82)* | 0.74 |
| Family history of breast cancer | | | |
| Yes vs. No | 2.67 (1.54, 4.62)*** | 2.74 (1.55, 4.84)*** | 0.80 |
| Alcohol use | | | |
| Ever vs. Never | 1.28 (0.99, 1.66) | 0.94 (0.71, 1.25) | 0.13 |
| Total duration of alcohol use | | | |
| <6 months vs. Never | 1.18 (0.88, 1.59) | 0.97 (0.70, 1.34) | 0.46 |
| ≥6 months vs. Never | 1.51 (1.03, 2.20)* | 0.89 (0.56, 1.41) | 0.06 |

Ever use of alcohol is defined as consuming at least one drink per month.

Model is adjusted for age, education, study site, and all risk factors listed above.

Case-case p-values are estimated using a likelihood ratio test.

p<0.1; * p<0.05; ** p<0.01; *** p<0.001

Risk factor patterns also differed across luminal/non-luminal subtypes, as shown in Table 3. Age at menarche ≥16 years (1.27 [0.96, 1.67], p-het=0.01), ≥3 births (0.63 [0.46, 0.86], p-het=0.05), and alcohol use duration ≥6 months (1.75 [1.19, 2.58], p-het=0.01) were associated with luminal, but not non-luminal breast cancer. In further analysis of associations with all PAM50 subtypes (Supplementary Table 4), higher parity and alcohol use duration ≥6 months were more strongly associated with luminal B than with luminal A subtype. Although case-case comparisons did not reach statistical significance, ever use of OCs was associated with non-luminal (1.68 [1.15, 2.46]), not luminal subtype; and ever use of alcohol was associated with luminal (1.30 [0.99, 1.71]), not non-luminal subtype. Further analysis revealed that the association between OC use and non-luminal subtypes was limited to basal-like cases, while ever use of alcohol was primarily associated with luminal B tumors, as shown in Supplementary Table 4.

Table 3.

Case-control odds ratios and 95% confidence intervals for risk factors and breast cancer subtypes defined by PAM50 intrinsic subtype (luminal/non-luminal), with case-case p-values evaluating heterogeneity across intrinsic subtypes.

| Risk factor | Luminal (N=278) vs. controls, OR (95% CI) | Non-luminal (N=282) vs. controls, OR (95% CI) | Non-luminal vs. luminal (p-het) |
|---|---|---|---|
| Age at menarche | | | |
| ≥16 vs. <16 years | 1.27 (0.96, 1.67) | 0.98 (0.74, 1.30) | 0.01 |
| Parity | | | |
| ≥3 vs. <3 births | 0.63 (0.46, 0.86)** | 0.78 (0.56, 1.08) | 0.05 |
| Age at first birth | | | |
| ≥22 vs. <22 years | 1.36 (1.02, 1.83)* | 1.36 (1.02, 1.82)* | 0.75 |
| Median months breastfeeding per pregnancy, among parous women | | | |
| ≥19 vs. <19 months | 0.93 (0.70, 1.25) | 0.96 (0.72, 1.27) | 0.18 |
| Oral contraceptive use | | | |
| Ever vs. Never | 1.17 (0.77, 1.78) | 1.68 (1.15, 2.46)** | 0.30 |
| Body size | | | |
| Slightly heavy/heavy vs. Slight/average | 1.30 (0.98, 1.73) | 1.40 (1.06, 1.85)* | 0.51 |
| Family history of breast cancer | | | |
| Yes vs. No | 2.82 (1.60, 4.98)*** | 2.45 (1.38, 4.37)** | 0.71 |
| Alcohol use | | | |
| Ever vs. Never | 1.30 (0.99, 1.71) | 0.99 (0.74, 1.30) | 0.41 |
| Total duration of alcohol use | | | |
| <6 months vs. Never | 1.10 (0.79, 1.52) | 1.06 (0.77, 1.45) | 0.93 |
| ≥6 months vs. Never | 1.75 (1.19, 2.58)** | 0.84 (0.53, 1.34) | 0.01 |

Ever use of alcohol is defined as consuming at least one drink per month.

Luminal subtype includes Luminal A and Luminal B. Non-Luminal subtype includes HER2-enriched and Basal-like.

Model is adjusted for age, education, study site, and all risk factors listed above.

Case-case p-values are estimated using a likelihood ratio test.

p<0.1; * p<0.05; ** p<0.01; *** p<0.001

Table 4 reports the associations of risk factors with breast cancer subtypes defined by the joint classification of RNA-based TP53 status and luminal/non-luminal subtypes. Inclusion of both markers helped to clarify the association with age at menarche and parity, such that later age at menarche and higher parity were associated with the luminal/TP53 wildtype-like subtype (1.31 [0.98, 1.75] and 0.61 [0.44, 0.85], respectively) but not the other subtypes. For certain risk factors, the subtype heterogeneity was adequately captured by only one of the two markers. For instance, ever use of OCs was associated with an increased risk of non-luminal subtype, regardless of TP53 status (although the association with TP53 wildtype-like did not reach statistical significance). Similarly, ≥6 months duration of alcohol use was associated with an increased risk of luminal subtype, regardless of TP53 status (although the association with TP53 mutant-like did not reach statistical significance).

Table 4.

Case-control odds ratios and 95% confidence intervals for risk factors and luminal/non-luminal breast cancer subtypes, stratified by RNA-based TP53 functional status.

| Risk factor | Luminal, TP53 wildtype-like (N=254) | Luminal, TP53 mutant-like (N=24) | Non-luminal, TP53 wildtype-like (N=39) | Non-luminal, TP53 mutant-like (N=243) |
|---|---|---|---|---|
| Age at menarche | | | | |
| ≥16 vs. <16 years | 1.31 (0.98, 1.75) | 0.89 (0.36, 2.21) | 0.84 (0.43, 1.64) | 1.01 (0.74, 1.36) |
| Parity | | | | |
| ≥3 vs. <3 births | 0.61 (0.44, 0.85)** | 0.88 (0.33, 2.38) | 0.66 (0.30, 1.47) | 0.80 (0.57, 1.13) |
| Age at first birth | | | | |
| ≥22 vs. <22 years | 1.34 (0.99, 1.82) | 1.58 (0.64, 3.90) | 1.34 (0.66, 2.71) | 1.36 (1.00, 1.85) |
| Median months breastfeeding per pregnancy, among parous women | | | | |
| ≥19 vs. <19 months | 0.88 (0.65, 1.20) | 1.50 (0.64, 3.54) | 0.68 (0.33, 1.41) | 1.01 (0.75, 1.37) |
| Oral contraceptive use | | | | |
| Ever vs. Never | 1.23 (0.80, 1.90) | 0.68 (0.16, 3.00) | 1.85 (0.77, 4.47) | 1.65 (1.10, 2.48)** |
| Body size | | | | |
| Slightly heavy/heavy vs. Slight/average | 1.28 (0.95, 1.72) | 1.56 (0.65, 3.72) | 1.67 (0.86, 3.27) | 1.36 (1.01, 1.83) |
| Family history of breast cancer | | | | |
| Yes vs. No | 2.95 (1.65, 5.28)*** | 1.57 (0.20, 12.15) | ---- | 2.90 (1.62, 5.19)*** |
| Alcohol use | | | | |
| Ever vs. Never | 1.34 (1.01, 1.79)* | 0.91 (0.37, 2.27) | 1.33 (0.68, 2.61) | 0.93 (0.69, 1.26) |
| Total duration of alcohol use | | | | |
| <6 months vs. Never | 1.16 (0.83, 1.62) | 0.57 (0.16, 1.99) | 1.41 (0.67, 2.97) | 1.00 (0.72, 1.40) |
| ≥6 months vs. Never | 1.76 (1.17, 2.63)** | 1.70 (0.54, 5.28) | 1.17 (0.42, 3.27) | 0.78 (0.47, 1.30) |

Ever use of alcohol is defined as consuming at least one drink per month.

Luminal subtype includes Luminal A and Luminal B. Non-Luminal subtype includes HER2-enriched and Basal-like.

Model is adjusted for age, education, study site, and all risk factors listed above.

p<0.1; * p<0.05; ** p<0.01; *** p<0.001

To understand the contributions of multiple correlated tumor characteristics, including TP53 status and PAM50 subtype, in determining risk factor associations, we modeled these markers simultaneously with other highly correlated breast tumor characteristics in case-only logistic regression models with risk factors as outcomes (Supplementary Table 5). The markers with the strongest independent contribution to the subtype heterogeneity for age at menarche were PR (p=0.05), HER2 (p=0.03), and to a lesser extent ER (p=0.08). For parity, it was HER2 (p=0.05). The markers contributing to heterogeneity for OC use were PAM50 (p=0.02) and to a lesser extent TP53 (p=0.10). For ever use of alcohol it was grade (p=0.01) and for <6 months duration of alcohol use it was grade (p=0.01) and PR (p=0.03).

Analyzing all pathologically confirmed invasive cases by ER status (N=926, Supplementary Table 6) revealed similar subtype-specific patterns, with a stronger association between ever OC use and risk of ER-negative than with ER-positive subtype (1.56 [1.18, 2.05] and 1.24 [0.92, 1.68], respectively), and an association between alcohol use duration ≥6 months and ER-positive (1.52 [1.08, 2.12]), but not ER-negative breast cancer.

DISCUSSION

We investigated the contribution of two distinct biological processes to etiologic heterogeneity of breast cancer in a population-based study of Ghanaian women. The estrogen-dependent pathway, represented in this study by PAM50 subtype (luminal/non-luminal), is a commonly used and useful mechanism for characterizing breast cancer subtype heterogeneity across populations, particularly for reproductive, anthropometric, and medical history factors.4 There is also growing evidence for the importance of the DNA repair pathway, represented in this study by TP53 status, in breast cancer etiology.12,26-30 The present analysis demonstrates that in the GBHS, these tumor markers independently and jointly define breast cancer subtypes with unique risk factor associations.

TP53 and luminal subtypes were found to be jointly driving the subtype heterogeneity for parity and age at menarche in our study. In line with our findings, the dual action of these pathways has been observed for the association with parity12,27,28,30 and age at menarche27,28 within several racially and ethnically diverse population-based studies in the US. Among Chinese women, the combination of TP53 and luminal/non-luminal subtypes refined parity-related breast cancer etiologic heterogeneity beyond any of the markers individually.26 Additionally, prior studies have found markers of TP53 and ER status to be jointly informative when characterizing subtype heterogeneity for pre- and post-menopausal BMI,27,28 breastfeeding,27,30 and menopausal status.27

In the current study, OC and alcohol use demonstrated suggestive evidence for etiologic heterogeneity by luminal and TP53 status. OC use appears to be primarily acting through the estrogen-dependent pathway and alcohol use through the DNA-repair pathway to impact breast cancer risk. However, the analysis of the joint classification of RNA-based TP53 and luminal/non-luminal subtypes (Table 4), along with case-case analyses across multiple correlated tumor characteristics (Supplementary Table 5) suggest that the relationships may be more nuanced, and both pathways could be involved. Prior studies of US populations have had mixed findings, with breast cancer subtype associations for OC use reportedly driven by TP53 status,12 ER status,12 or jointly by TP53 and ER.28 In the only prior study of TP53 and ER tumor markers with alcohol use (which included a US study population), this risk factor was not observed to be acting through either of the two pathways.12 When interpreting the results of this study, it is important to note that the reported prevalence of alcohol use in the GBHS was low compared to high income countries. Only 8% of controls reported drinking ≥1 drink per month for over 6 months. Further studies are needed to conclusively determine the impact of OC and alcohol use on risk and their underlying biological pathways.

The availability of biopsy samples was a major strength of the study that provided the unique opportunity to apply gene expression-based classifiers to a population for which molecular data is rarely available. Most previous studies have used IHC staining to classify TP53 status,27,28,30,31 which misses many mutations that are not associated with TP53 protein overexpression.1,22,32 IHC classification methods, therefore, can be problematic when evaluating etiology of breast tumors with an aggressive phenotype because such tumors are more likely to carry TP53 mutation types that are not associated with protein overexpression (e.g., nonsense and frameshift mutations).1 Thus, RNA classification methods of TP53 may be preferred in etiologic studies, as they capture downstream transcriptional activity and are more sensitive to pathway changes caused by these diverse mutation types. In a US population, RNA-based methods were more sensitive than DNA or IHC methods for classifying TP53 mutant-like tumors, particularly among Black and younger women, who are more likely to be diagnosed with tumors that have aggressive features.12 Another key strength was the high participation rates of cases and controls (>90%), leading to greater internal generalizability of the study findings.

This study was not without limitations. Gene expression data was not available for all breast cancer cases. While this may reduce the precision of our estimates, it is not likely to impact the validity, as the characteristics of cases eligible for RNA extraction generally do not vary substantially from those without (Supplementary Table 2). Although frequency of alcohol use varied between these groups, the observed associations were similar when stratifying by tumor size (Supplementary Tables 7a-7b). The sample size required dichotomization of risk factors. Further, there were small numbers of subjects when cross-classifying cases by both tumor markers, which reduced our power to detect associations with the less common marker combinations (i.e., TP53 wildtype-like/non-luminal and TP53 mutant-like/luminal).

Much of the current understanding of breast cancer etiology among African women derives from studies in African Americans (for which the genetic structure represents a mixture of African and non-African ancestry). To the best of our knowledge, this is the first study of risk factor associations with TP53 subtypes in an indigenous African population. Such studies are important, as Black women are known to have a higher frequency of TP53 mutations and p53 protein expression compared to White women.33-37 There is also evidence of differences in TP53 mutation type by race, with a higher proportion of nonsense and indel mutations for Black compared to non-Black cases.12 Molecular epidemiology studies in unscreened populations, such as the GBHS, are valuable because the cases likely constitute a more accurate reflection of the natural history of breast cancer in Black women. It has been suggested that screening can interrupt the study of the natural history of breast cancer by uncovering tumors that would have otherwise not come to clinical attention due to their inherent biology.38,39

In sum, using high-quality RNA expression data, we have shown that RNA-based TP53 status and PAM50 intrinsic subtype are useful breast tumor markers for describing etiologic heterogeneity. Cross-classification of these markers further refines the subtype-specific risk factor associations, which in turn could inform potential mechanisms by which targeted risk reduction could be achieved. To further characterize heterogeneity of breast cancer phenotypes and improve understanding of etiologic mechanisms in African-ancestry populations, additional studies integrating data on tumor (e.g., histologic grade, Ki67) and tumor microenvironment (e.g., immune, inflammation, wound repair) markers will be required. As we showed in the current study, aggressive tumor characteristics, e.g., grade, TP53, and PAM50, are highly correlated. Given the preponderance of high-grade tumors (which often precludes further stratification), as well as the challenges of procuring molecular assays among women in sub-Saharan Africa, further studies are warranted to uncover cost-effective markers with sufficient dynamic range to allow the identification of epidemiologically and clinically relevant breast cancer subtypes in this population. Owing to the heterogeneity of genetic structure and exposure profiles across African populations, the results of this study will need to be considered together with those from future studies in populations of other African regions.

Supplementary Material


ACKNOWLEDGEMENTS

We are grateful to all the women who agreed to participate in the study and provided information and biospecimens. Without them this work would not be possible. This work was supported by the Intramural Research Program in the Division of Cancer Epidemiology and Genetics, the US National Institutes of Health (NIH), National Cancer Institute (NCI).

Abbreviations:

CI: confidence interval
ER: estrogen receptor
GBHS: Ghana Breast Health Study
IHC: immunohistochemistry
KATH: Komfo Anokye Teaching Hospital
KBTH: Korle Bu Teaching Hospital
OC: oral contraceptive
OR: odds ratio
PLH: Peace and Love Hospital
PR: progesterone receptor
RUVg: Remove Unwanted Variation

Footnotes

CONFLICT OF INTEREST

None declared.

DISCLAIMER

Previous presentation: Part of this work was presented at the 16th AACR Conference on the Science of Cancer Health Disparities in Racial/Ethnic Minorities and the Medically Underserved 2023.

ETHICS STATEMENT

All participants provided written informed consent. Our study was approved by the Special Studies Institutional Review Board of the National Cancer Institute (NCI; Rockville, MD, USA; FWA #: 00005897 and IORG #: 00010), the Ghana Health Service Ethical Review Committee and Institutional Review Boards at the University of Ghana Noguchi Memorial Institute for Medical Research (Accra, Ghana; FWA #: 00001824 and IORG #: 0000908), the Kwame Nkrumah University of Science and Technology (Kumasi, Ghana) and the School of Medical Sciences at Komfo Anokye Teaching Hospital (Kumasi, Ghana).

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available from the corresponding author upon reasonable request.

REFERENCES

  • 1. Comprehensive molecular portraits of human breast tumours. Nature. Oct 4 2012;490(7418):61–70. doi: 10.1038/nature11412
  • 2. Nielsen TO, Parker JS, Leung S, et al. A comparison of PAM50 intrinsic subtyping with immunohistochemistry and clinical prognostic factors in tamoxifen-treated estrogen receptor-positive breast cancer. Clin Cancer Res. Nov 1 2010;16(21):5222–32. doi: 10.1158/1078-0432.Ccr-10-1282
  • 3. Wallden B, Storhoff J, Nielsen T, et al. Development and verification of the PAM50-based Prosigna breast cancer gene signature assay. BMC Med Genomics. Aug 22 2015;8:54. doi: 10.1186/s12920-015-0129-6
  • 4. Hurson AN, Ahearn TU, Koka H, et al. Risk factors for breast cancer subtypes by race and ethnicity: a scoping review. J Natl Cancer Inst. Dec 1 2024;116(12):1992–2002. doi: 10.1093/jnci/djae172
  • 5. Giaquinto AN, Sung H, Newman LA, et al. Breast cancer statistics 2024. CA Cancer J Clin. Nov-Dec 2024;74(6):477–495. doi: 10.3322/caac.21863
  • 6. Kong X, Liu Z, Cheng R, et al. Variation in Breast Cancer Subtype Incidence and Distribution by Race/Ethnicity in the United States From 2010 to 2015. JAMA Netw Open. Oct 1 2020;3(10):e2020303. doi: 10.1001/jamanetworkopen.2020.20303
  • 7. Jatoi I, Sung H, Jemal A. The Emergence of the Racial Disparity in U.S. Breast-Cancer Mortality. N Engl J Med. Jun 23 2022;386(25):2349–2352. doi: 10.1056/NEJMp2200244
  • 8. Hercules SM, Alnajar M, Chen C, et al. Triple-negative breast cancer prevalence in Africa: a systematic review and meta-analysis. BMJ Open. May 27 2022;12(5):e055735. doi: 10.1136/bmjopen-2021-055735
  • 9. Onyia AF, Nana TA, Adewale EA, et al. Breast Cancer Phenotypes in Africa: A Scoping Review and Meta-Analysis. JCO Glob Oncol. Sep 2023;9:e2300135. doi: 10.1200/go.23.00135
  • 10. Tang W, Zhang F, Byun JS, et al. Population-specific Mutation Patterns in Breast Tumors from African American, European American, and Kenyan Patients. Cancer Res Commun. Nov 7 2023;3(11):2244–2255. doi: 10.1158/2767-9764.Crc-23-0165
  • 11. Martini R, Delpe P, Chu TR, et al. African Ancestry-Associated Gene Expression Profiles in Triple-Negative Breast Cancer Underlie Altered Tumor Biology and Clinical Outcome in Women of African Descent. Cancer Discov. Nov 2 2022;12(11):2530–2551. doi: 10.1158/2159-8290.Cd-22-0138
  • 12. Hurson AN, Abubakar M, Hamilton AM, et al. TP53 Pathway Function, Estrogen Receptor Status, and Breast Cancer Risk Factors in the Carolina Breast Cancer Study. Cancer Epidemiol Biomarkers Prev. Jan 2022;31(1):124–131. doi: 10.1158/1055-9965.Epi-21-0661
  • 13. Joko-Fru WY, Jedy-Agba E, Korir A, et al. The evolving epidemic of breast cancer in sub-Saharan Africa: Results from the African Cancer Registry Network. Int J Cancer. Oct 15 2020;147(8):2131–2141. doi: 10.1002/ijc.33014
  • 14. Figueroa JD, Davis Lynn BC, Edusei L, et al. Reproductive factors and risk of breast cancer by tumor subtypes among Ghanaian women: A population-based case-control study. Int J Cancer. Sep 15 2020;147(6):1535–1547. doi: 10.1002/ijc.32929
  • 15. Brinton LA, Awuah B, Nat Clegg-Lamptey J, et al. Design considerations for identifying breast cancer risk factors in a population-based study in Africa. Int J Cancer. Jun 15 2017;140(12):2667–2677. doi: 10.1002/ijc.30688
  • 16. Geczik AM, Falk RT, Xu X, et al. Measured body size and serum estrogen metabolism in postmenopausal women: the Ghana Breast Health Study. Breast Cancer Res. Jan 26 2022;24(1):9. doi: 10.1186/s13058-022-01500-8
  • 17. Nanostring. nCounter breast cancer 360 panel. https://www.nanostring.com/products/gene-expression-panels/gene-expression-panels-overview/ncounter-breast-cancer-360-panel
  • 18. Risso D, Ngai J, Speed TP, Dudoit S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nat Biotechnol. Sep 2014;32(9):896–902.
  • 19. Hamilton AM, Hurson AN, Olsson LT, et al. The landscape of immune microenvironments in racially-diverse breast cancer patients. Cancer Epidemiol Biomarkers Prev. 2022. doi: 10.1158/1055-9965.epi-21-1312
  • 20. Bhattacharya A, Hamilton AM, Furberg H, et al. An approach for normalization and quality control for NanoString RNA expression data. Brief Bioinform. Aug 13 2021;22(3):bbaa163. doi: 10.1093/bib/bbaa163
  • 21. Parker JS, Mullins M, Cheang MC, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol. Mar 10 2009;27(8):1160–7. doi: 10.1200/jco.2008.18.1370
  • 22. Troester MA, Herschkowitz JI, Oh DS, et al. Gene expression patterns associated with p53 status in breast cancer. BMC Cancer. Dec 6 2006;6:276. doi: 10.1186/1471-2407-6-276
  • 23. Allott EH, Cohen SM, Geradts J, et al. Performance of Three-Biomarker Immunohistochemistry for Intrinsic Breast Cancer Subtyping in the AMBER Consortium. Cancer Epidemiol Biomarkers Prev. Mar 2016;25(3):470–8. doi: 10.1158/1055-9965.Epi-15-0874
  • 24. Bastien RR, Rodríguez-Lescure Á, Ebbert MT, et al. PAM50 breast cancer subtyping by RT-qPCR and concordance with standard clinical molecular markers. BMC Med Genomics. Oct 4 2012;5:44. doi: 10.1186/1755-8794-5-44
  • 25. Wang L, Li Q, Aushev VN, et al. PAM50- and immunohistochemistry-based subtypes of breast cancer and their relationship with breast cancer mortality in a population-based study. Breast Cancer. Nov 2021;28(6):1235–1242. doi: 10.1007/s12282-021-01261-w
  • 26. Abubakar M, Guo C, Koka H, et al. Clinicopathological and epidemiological significance of breast cancer subtype reclassification based on p53 immunohistochemical expression. NPJ Breast Cancer. 2019;5:20. doi: 10.1038/s41523-019-0117-7
  • 27. Begg CB, Zabor EC, Bernstein JL, Bernstein L, Press MF, Seshan VE. A conceptual and methodological framework for investigating etiologic heterogeneity. Stat Med. Dec 20 2013;32(29):5039–52. doi: 10.1002/sim.5902
  • 28. Benefield HC, Zabor EC, Shan Y, Allott EH, Begg CB, Troester MA. Evidence for Etiologic Subtypes of Breast Cancer in the Carolina Breast Cancer Study. Cancer Epidemiol Biomarkers Prev. Nov 2019;28(11):1784–1791. doi: 10.1158/1055-9965.Epi-19-0365
  • 29. Furberg H, Millikan RC, Geradts J, et al. Reproductive factors in relation to breast cancer characterized by p53 protein expression (United States). Cancer Causes Control. Sep 2003;14(7):609–18. doi: 10.1023/a:1025682410937
  • 30. Ma H, Wang Y, Sullivan-Halley J, et al. Use of four biomarkers to evaluate the risk of breast cancer subtypes in the women's contraceptive and reproductive experiences study. Cancer Res. Jan 15 2010;70(2):575–87. doi: 10.1158/0008-5472.Can-09-3460
  • 31. van der Kooy K, Rookus MA, Peterse HL, van Leeuwen FE. p53 protein overexpression in relation to risk factors for breast cancer. Am J Epidemiol. Nov 15 1996;144(10):924–33. doi: 10.1093/oxfordjournals.aje.a008862
  • 32. Williams LA, Butler EN, Sun X, et al. TP53 protein levels, RNA-based pathway assessment, and race among invasive breast cancer cases. NPJ Breast Cancer. 2018;4:13. doi: 10.1038/s41523-018-0067-5
  • 33. Huo D, Hu H, Rhie SK, et al. Comparison of Breast Cancer Molecular Features and Survival by African and European Ancestry in The Cancer Genome Atlas. JAMA Oncol. Dec 1 2017;3(12):1654–1662. doi: 10.1001/jamaoncol.2017.0595
  • 34. Jones BA, Kasl SV, Howe CL, et al. African-American/White differences in breast carcinoma: p53 alterations and other tumor characteristics. Cancer. Sep 15 2004;101(6):1293–301. doi: 10.1002/cncr.20500
  • 35. Keenan T, Moy B, Mroz EA, et al. Comparison of the Genomic Landscape Between Primary Breast Cancer in African American Versus White Women and the Association of Racial Differences With Tumor Recurrence. J Clin Oncol. Nov 1 2015;33(31):3621–7. doi: 10.1200/jco.2015.62.2126
  • 36. Martin DN, Boersma BJ, Yi M, et al. Differences in the tumor microenvironment between African-American and European-American breast cancer patients. PLoS One. 2009;4(2):e4531. doi: 10.1371/journal.pone.0004531
  • 37. Porter PL, Lund MJ, Lin MG, et al. Racial differences in the expression of cell cycle-regulatory proteins in breast carcinoma. Cancer. Jun 15 2004;100(12):2533–42. doi: 10.1002/cncr.20279
  • 38. Zahl PH, Gøtzsche PC, Mæhlen J. Natural history of breast cancers detected in the Swedish mammography screening programme: a cohort study. Lancet Oncol. Nov 2011;12(12):1118–24. doi: 10.1016/s1470-2045(11)70250-9
  • 39. Zahl PH, Maehlen J, Welch HG. The natural history of invasive breast cancers detected by screening mammography. Arch Intern Med. Nov 24 2008;168(21):2311–6. doi: 10.1001/archinte.168.21.2311
