Abstract
Single-nucleus analysis allows robust cell-type classification and helps to establish relationships between chromatin accessibility and cell-type-specific gene expression. Here, using samples from 92 women of several genetic ancestries, we developed a comprehensive chromatin accessibility and gene expression atlas of the breast tissue. Integrated analysis revealed ten distinct cell types, including three major epithelial subtypes (luminal hormone sensing, luminal adaptive secretory precursor (LASP) and basal-myoepithelial), two endothelial and adipocyte subtypes, fibroblasts, T cells, and macrophages. In addition to the known cell identity genes FOXA1 (luminal hormone sensing), EHF and ELF5 (LASP), TP63 and KRT14 (basal-myoepithelial), epithelial subtypes displayed several uncharacterized markers and inferred gene regulatory networks. By integrating breast epithelial cell gene expression signatures with spatial transcriptomics, we identified gene expression and signaling differences between lobular and ductal epithelial cells and age-associated changes in signaling networks. LASP cells and fibroblasts showed genetic ancestry-dependent variability. An estrogen receptor-positive subpopulation of LASP cells with alveolar progenitor cell state was enriched in women of Indigenous American ancestry. Fibroblasts from breast tissues of women of African and European ancestry clustered differently, with accompanying gene expression differences. Collectively, these data provide a vital resource for further exploring genetic ancestry-dependent variability in healthy breast biology.
Subject terms: Translational research, Breast cancer
Through sequencing of 88,005 nuclei from the breast tissues of clinically healthy women of diverse genetic ancestry, a global breast single-nucleus atlas was developed that identifies distinct cell types and ancestry-level differences linked to epithelial and fibroblast cell states, which in turn could influence disease incidence, molecular subtypes and progression.
Main
Breast cancer shows genetic ancestry-dependent variability in incidence, molecular subtypes and outcomes1,2. For example, breast cancer is diagnosed at an earlier age and is more likely to be triple-negative breast cancer subtype in women of African ancestry compared to women of European ancestry3,4. Although socioeconomic stress and healthcare access are probably major drivers of disparity in breast cancer outcomes, genetic ancestry-dependent differences in healthy breast and breast cancer biology are strong contributors4–7. There is evidence in the literature for genetic ancestry-specific patterns of genomic architecture and differences in mutational spectrum in cancer suggesting the influence of genetic ancestry on changes in genome organization and chromatin accessibility8–11.
To understand the complex biology of breast cancers, several groups have used single-cell technologies to develop single-cell atlases of the breast12–16. We described 23 epithelial cell states, including eight basal-myoepithelial (BM), three intermediate basal-luminal (BL) progenitor, eight luminal progenitor (recently renamed luminal adaptive secretory precursor (LASP) cells) and four mature luminal (recently renamed luminal hormone sensing (LHS) cells) cell states using breast tissue biopsies from clinically healthy donors16. However, there is a lack of information on differences in cell state based on genetic ancestry and the relationship between cell states as defined by transcriptome and chromatin accessibility status.
In this study, we performed single-nucleus multiome analyses of breast tissues from women of diverse genetic ancestry to assess gene expression and chromatin accessibility simultaneously in the same cells. Epithelial cell gene expression signatures derived from the multiome assay were integrated with spatial transcriptomics data to identify genes enriched in lobular epithelial and ductal epithelial cells. Our studies suggest an impact of genetic ancestry on epithelial and fibroblast cell states in the healthy breast, which could influence disease incidence, molecular subtypes and progression.
Results
Single-nucleus chromatin accessibility and transcriptome mapping
Extended Data Fig. 1 provides a brief overview of the experimental design and additional details. Figure 1a provides genetic ancestry estimates of the donor samples used in the study. Additional donor information is shown in Supplementary Table 1. Extended Data Table 1 provides details of the number of nuclei sequenced in each group and the average number of genes identified per nucleus per group. Integrated analysis of single-nucleus assay for transposase-accessible chromatin with sequencing (snATAC-seq) and single-nucleus RNA sequencing (snRNA-seq) data from all ancestries revealed ten distinct major clusters (Fig. 1b). Epithelial cell types were annotated using CD49f and EpCAM markers into mature luminal, luminal progenitor and basal cells16, clusters previously described with various names12,13,17. We used the markers described in that study to subcluster epithelial cells and found the presence of all six epithelial cell states (Fig. 1c). Further examination of these data revealed that LASP cells exist in four cell states, fibroblasts in two states, endothelial cells in four states and macrophages in two states (Fig. 1d). The heatmaps in Fig. 1e,f summarize the differentially expressed genes (DEGs) in all cell clusters and epithelial cell clusters, respectively. Comparison between the genes enriched in cell types based on our multiome analysis with data reported by others showed substantial overlap for the markers of individual cell types: ERBB4, ANKRD30A, AFF3, TTC6, MYBPC1 and THSD4 in LHS cells; KRT15, ELF5, CCL28 and KIT in LASP cells; KLHL29 in BM cells; LAMA2 and SLIT2 in fibroblasts; MECOM, LDB2, ST6GALNAC3, STNL9, AL357507.1, PKHD1L1 and MMRN1 in endothelial cells; PTPRC, SKAP1, ARHGAP15 and THEMIS in T cells12; KLHL13 in basal-alpha (BAα) cells; ELF5 in LASP_alveolar progenitor (AP) cells; ESR1 in LHS_hormone sensing α (HSα) cells; CD96 in T cells; RUNX1T1 in fibroblasts; ALCAM in macrophages; and MECOM in endothelial cells13.
The expression pattern of selected genes enriched in specific epithelial subtypes, a few not previously reported, is shown in Extended Data Fig. 2. These include BM cell-enriched FHOD3, LASP cell-enriched BARX2 and NCALD, and LHS cell-enriched CTNND2, DACH1, INPP4B and NEK10.
Extended Data Fig. 1. Experimental workflow of single-nucleus atlas generation.
The twelve major steps used to create the single-nucleus atlas of breast tissues are shown.
Fig. 1. Integrated snATAC-seq and snRNA-seq analyses of breast tissues of healthy women.
a, Genetic ancestry marker distribution pattern among donors of self-identified ethnicity groups. b, Integrated cell clusters generated using snATAC-seq and snRNA-seq data representing all donors except women of African ancestry. Adi, adipocytes; Endo, endothelial. c, Breast epithelial cells could be further subclassified into six different cell types: BM_BAα, BM_BAβ, LASP alveolar progenitor, LASP_BL, LHS_HSα and LHS_HSβ. d, Cell clustering analyses revealed further refinement of cell state of LASP cells, fibroblasts and endothelial cells. e, Heatmap of top DEGs in each cell type shown in c. f, Heatmap of the top DEGs in the epithelial subtypes. g, Top regulons identified by SCENIC in each epithelial subtype using integrated snATAC-seq and snRNA-seq data.
Extended Data Table 1.
Number of donor tissues, nuclei and transcripts per nucleus in each group
Extended Data Fig. 2. Expression patterns of epithelial subtype identity genes.
Expression pattern of LHS, LASP and BM cell identity genes is shown. These genes have not been previously reported to be expressed in specific subtypes of breast epithelial cells.
A complete list of genes expressed in each of the cell types is included in Supplementary Table 2. Genes differentially expressed in HSα compared to HSβ cells, AP versus BL cells, and BAα versus basal-beta (BAβ) cells are listed in Supplementary Table 3. Genes differentially expressed in the different cell states of endothelial cells, fibroblasts and adipocytes are listed in Supplementary Table 4. The marker genes of each cell type are listed in Supplementary Table 5.
We used SCENIC to infer the top regulons based on downstream gene expression in three major epithelial cell types18. SCENIC identified 71 gene regulatory networks, of which 66 significantly differed across the three epithelial subsets (one-way analysis of variance (ANOVA), P value threshold of 1 × 10⁻⁵) (Supplementary Table 6). A heatmap showing the top regulons of BM, LASP and LHS cells is shown in Fig. 1g. Using Signac, a computational tool that allows DNA motif analysis of snATAC-seq data19, we identified transcription factor binding site motifs that were differentially enriched in chromatin-accessible regions in each of the cell types (Extended Data Fig. 3a). Distinct motifs were differentially enriched in several cell types and the expression levels of a few of the transcription factors whose binding motifs were enriched in specific epithelial cell types are shown in Extended Data Fig. 3b. For example, the GHRL-1 transcription factor is expressed at a higher level in LASP cells; the binding site for this transcription factor is enriched in LASP_AP cells. By contrast, although SOX10 is expressed in both BM and LASP cells, SOX10 binding motifs are enriched only in BM_BAβ cells. Transcription factor binding site motifs uniquely enriched in epithelial cells are shown in Extended Data Fig. 3c. Footprinting analyses using Signac revealed an expected lack of Tn5 integration into regions that harbored binding motifs for ZBTB14, THRB and TCFL5 but enrichment of integration flanking motifs of ZBTB14 and TCFL5 (Extended Data Fig. 3d). These results revealed previously uncharacterized gene regulatory networks in the epithelial cells of the healthy breast.
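As an illustration of the regulon comparison described above, the following minimal sketch computes a one-way ANOVA F statistic for a single regulon across the three epithelial subsets. It is not the SCENIC workflow itself: the function name `one_way_anova_f` and the activity scores are hypothetical, and only the F statistic is computed. A P value for comparison against the 1 × 10⁻⁵ threshold would require the F distribution, for example through `scipy.stats.f_oneway`.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical activity scores of one regulon in each epithelial subset
# (illustrative values only, not data from this study).
regulon_activity = {
    "BM": [0.82, 0.79, 0.85, 0.80],
    "LASP": [0.31, 0.35, 0.29, 0.33],
    "LHS": [0.12, 0.10, 0.15, 0.11],
}

f_stat, df_b, df_w = one_way_anova_f(list(regulon_activity.values()))
print(f"F = {f_stat:.1f} with ({df_b}, {df_w}) degrees of freedom")
```

Repeating this calculation for each regulon and retaining those below the chosen P value threshold would yield a list analogous to, but not necessarily identical with, the one in Supplementary Table 6.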
Extended Data Fig. 3. DNA binding motif analyses using Signac.
a) DNA binding motifs differentially active in every cell type of the breast are shown. b) Expression patterns of select transcription factors whose DNA binding motifs are enriched in epithelial subtypes. c) DNA binding motifs differentially active in epithelial cell types. d) Footprinting analyses show lack of Tn5 integration in regions that carry epithelial cell-specific motifs. e) Representative immunohistochemistry images of breast tissues stained with antibodies against ERα (n = 17), FOXA1 (n = 18) and GATA3 (n = 20). The nuclei analyzed in ducts and lobules are marked.
Markers of LHS, LASP and BM cells of the breast
Previous studies in hematopoietic stem cells showed that chromatin accessibility correlates with gene expression, particularly for gene regulatory programs that influence cellular state and identity20. To determine whether similar regulatory mechanisms exist in the healthy breast, we used the chromatin accessibility map and gene expression data to determine which, among the previously suggested cell identity genes of LHS, LASP and BM cells, showed compatible cell-type-specific chromatin accessibility and gene expression. For LHS cells, we focused mainly on ESR1, FOXA1 and GATA3 because ERα-FOXA1-GATA3 constitute a cell-type-specific transcription factor network of hormone-responsive cells21,22. ELF5, EHF and KIT were considered as markers of LASP cells, whereas TP63 and KRT14 were considered as markers of BM cells.
ESR1 expression was generally limited to LHS cells (Fig. 2a). However, the ESR1 gene displayed similar chromatin-accessible regions in both LHS and LASP cells (Fig. 2b) except for one peak (peak 1) being more prominent in LHS cells. Estrogen receptor (ER) binding sites were present in the chromatin-accessible regions of LHS and LASP cells but at lower levels in BM cells, although differences were less dramatic (Fig. 2c).
Fig. 2. FOXA1, EHF, ELF5, TP63 and KRT14 show epithelial subtype-enriched expression and chromatin accessibility.
a, ESR1 expression pattern in several cell types of the breast. Avg. exp., average expression. b, ESR1 gene chromatin accessibility patterns in LHS, LASP and BM cells. The horizontal red arrow marks the direction of transcription from the indicated gene. The vertical arrow denotes cell-type-specific chromatin-accessible regions. The number of accessible regions is indicated (1–5). c, ESR1 binding site enrichment pattern in chromatin-accessible regions. d, FOXA1 was expressed mostly in LHS cells. e, FOXA1 gene chromatin accessibility patterns in LHS, LASP and BM cells. f, FOXA1 binding site enrichment pattern in chromatin-accessible regions of several cell types. g, GATA3 was expressed in all three major epithelial subtypes, with prominent expression in LHS cells. h, GATA3 gene chromatin accessibility patterns in LHS, LASP and BM cells. i, GATA3 binding site enrichment pattern in chromatin-accessible regions of several cell types. j, EHF was expressed mostly in LASP cells. k, EHF gene chromatin accessibility patterns in LHS, LASP and BM cells. l, EHF binding site enrichment pattern in chromatin-accessible regions of several cell types. m, ELF5 was expressed mostly in LASP cells. n, ELF5 gene chromatin accessibility patterns in LHS, LASP and BM cells. o, ELF5 binding site enrichment pattern in chromatin-accessible regions of several cell types. p, Expression pattern of KIT in several cell types. q, KIT gene chromatin accessibility in LHS, LASP and BM cells. r, TP63 was expressed mostly in BM cells. s, TP63 gene chromatin accessibility patterns in LHS, LASP and BM cells. t, TP63 binding site enrichment pattern in chromatin-accessible regions of several cell types. u, KRT14 was expressed predominantly in BM cells. v, KRT14 gene chromatin accessibility patterns in LHS, LASP and BM cells. w, Positivity patterns of nuclei in ducts and lobules in the breast tissues of healthy donors for ERα (n = 17), FOXA1 (n = 18) and GATA3 (n = 20), as determined using IHC. 
Positivity scores were compared using a one-way ANOVA, followed by Tukey’s test for multiple comparisons. Data are presented as the mean ± s.d. *P = 0.0453; ***P = 0.0002. NS, not significant (P = 0.2018). Samples are biologically independent.
FOXA1 expression was mostly restricted to LHS cells (Fig. 2d). The promoter region of FOXA1 was more accessible in LHS cells compared to other cell types (Fig. 2e). Furthermore, the chromatin-accessible regions of LHS cells were enriched for binding sites for FOXA1 compared to other cell types (Fig. 2f). GATA3 expression, chromatin accessibility and binding site enrichment did not show specificity to LHS cells, with expression extending to LASP and BM cells (Fig. 2g–i). Therefore, based on expression and chromatin accessibility, FOXA1, not ESR1 and GATA3, is the appropriate marker of LHS cells. Furthermore, the binding sites for the FOX family of transcription factors are enriched in LHS cells (Extended Data Fig. 3a).
EHF expression was enriched in the LASP cells compared to other cell types (Fig. 2j). Although distinct chromatin accessibility patterns for EHF were observed in three major epithelial cell types of the breast, we found more prominent chromatin-accessible promoter regions in LASP cells, followed by LHS cells (Fig. 2k). The binding sites for EHF were enriched in the open chromatin regions of LASP cells (Fig. 2l). ELF5 expression, chromatin accessibility and binding site enrichment mirrored that of EHF (Fig. 2m–o). Despite enrichment of KIT expression in LASP cells, its promoter regions showed similar accessibility across all three cell types (Fig. 2p,q).
The BM cell marker TP63 showed enriched expression and chromatin accessibility in BM cells (Fig. 2r,s). TP63 binding sites were also enriched in the chromatin-accessible regions of BM cells compared to other cell types (Fig. 2t). Similarly, KRT14 expression was restricted to BM cells (Fig. 2u); its promoter showed a distinct accessibility pattern in BM cells (Fig. 2v). These results suggest that TP63 and KRT14 are bona fide markers of BM cells.
To determine if some of our multiome data could be reproduced at protein levels via an independent method, we calculated the percentage of nuclei in the ducts and lobules in the breast tissues of 20 donors (ten of African ancestry and ten of European ancestry) that were positive for ERα, FOXA1 and GATA3 using immunohistochemistry (IHC)23. After outlier identification and removal (Methods), we observed higher positivity scores for GATA3, followed by ERα, with FOXA1 being the least expressed protein in breast tissue ducts and lobules (Fig. 2w and Extended Data Fig. 3e). The observed increase in the enrichment of GATA3+ cells correlated with its relatively widespread expression in multiple cell types compared to FOXA1, which was LHS-specific. The expression of ERα and FOXA1 was similar, which was consistent with LHS-specific expression of both genes in the breast tissues of women of African and European ancestry. Similar studies with markers of LASP (EHF and ELF5) and BM (TP63 and KRT14), and other newly described markers of LHS cells, are required to determine whether a simple IHC-based method can be adapted to quantitatively measure the LHS:LASP:BM cell ratio in the ducts and lobules of the breast.
Differences between epithelial cells of breast ducts and lobules
Breast carcinomas arising from the ducts and lobules show distinct tumor biology, suggesting that epithelial cells in these regions show inherent differences in gene expression24. Having defined breast epithelial cell-enriched gene expression, we next used spatial transcriptomics to identify gene expression differences between epithelial cells of the ducts and lobules at two different time points (10 years apart) from the same donors (Extended Data Fig. 4). Microdissected ducts and lobules, as well as adipocytes, are shown in Extended Data Fig. 4d. Extended Data Fig. 4e shows the distribution pattern of specific epithelial subtypes in the ducts and lobules. The volcano plot in Fig. 3a shows differences in gene expression between ductal and lobular epithelial cells across all samples. The heatmap on the right (Fig. 3a) shows genes differentially expressed in the ducts and lobules of individual samples. The volcano plots shown in Fig. 3b,c further highlight key genes differentially expressed in ductal compared to lobular epithelial cells at time points 1 and 2 across all samples. The volcano plot in Fig. 3d shows differences in gene expression in ducts between time points 1 and 2, whereas Fig. 3e shows differences in gene expression in lobules between time points 1 and 2.
Extended Data Fig. 4. Spatial transcriptomics to determine differences in gene expression between ductal and lobular breast epithelial cells.
a) UMAP showing differences in gene expression patterns between timepoint 1 and timepoint 2. b) Age and BMI of donors at the two timepoints at which tissues were collected for spatial transcriptomics. c) Staining pattern of breast tissues with antibodies against pan-keratin, FABP4 and smooth muscle actin. N = 10. d) Representative regions of interest related to ducts, lobules and adipocytes selected for RNA extraction and sequencing. N = 10. e) Deconvolution of spatial transcriptomics data shows elevated Adi-2, macrophages and Endo-2 at timepoint 2 compared to timepoint 1 in most samples.
Fig. 3. Spatial transcriptomics reveals gene expression differences between ductal and lobular epithelial cells.
a, Left: volcano plot showing gene expression differences between lobular and ductal epithelial cells of all samples combined. Right: heatmap showing genes differentially expressed in the ductal and lobular epithelial cells of individual donors. Samples are paired: tissue from each donor was collected twice, 10 years apart. b, Gene expression differences between ductal and lobular epithelial cells of all samples combined at time point 1 (n = 3). Those on the left were enriched in lobular epithelial cells, whereas those on the right were enriched in ductal epithelial cells. c, Gene expression differences in ductal and lobular epithelial cells of all samples combined at time point 2 (n = 5). d, Gene expression differences in ductal epithelial cells of sample number 3 between time points 1 and 2. Those on the left were enriched at time point 1, whereas those on the right were enriched at time point 2. e, Gene expression differences in lobular epithelial cells of sample number 3 between time points 1 and 2. The statistical significance of spatial transcriptomics data was calculated with the R package lmerTest using the least squares means method. f, DUSP1, DPM3 and RPL36 were expressed at a higher level in lobular carcinomas compared to ductal carcinomas of the breast. The TCGA dataset was used for this analysis. Statistical significance was derived using an unpaired t-test. Samples used were biologically independent.
DUSP1 (infiltrating ductal carcinoma (IDC): n = 784; low 8.803, first quartile (Q1) 49.273; median 89.037, third quartile (Q3) 149.567; high 359.698; infiltrating lobular carcinoma (ILC): n = 203; low 6.165, Q1 110.845; median 206.213, Q3 349.685; high 793.39), DPM3 (IDC: n = 784; low 6.872, Q1 85.851; median 117.469, Q3 167.43; high 321.213; ILC: n = 203; low 25.637, Q1 104.213; median 143.894, Q3 202.602; high 369.928), RPL36 (IDC: n = 784; low 390.827, Q1 1506.467; median 2001.37, Q3 2669.19; high 4675.837; ILC: n = 203; low 758.972, Q1 2072.142; median 2648.138, Q3 3373.037; high 5322.998).
Although the distribution pattern of the three major epithelial subtypes between ducts and lobules did not differ (Extended Data Fig. 4e), 426 genes were differentially expressed in ductal versus lobular epithelial cells. Ductal epithelial cells expressed higher levels of KRT14 and KRT17, which are BM cell-enriched genes, compared to lobular epithelial cells, which is consistent with a previous report12. The IGHA1 and IGKC genes were expressed at higher levels in lobular compared to ductal epithelial cells. The other lobular epithelial cell-enriched genes included DUSP1, DPM3 and RPL36 (Fig. 3a). Analysis of The Cancer Genome Atlas (TCGA) data using the UALCAN database25 revealed higher DUSP1, DPM3 and RPL36 expression in lobular carcinoma compared to ductal carcinoma of the breast (Fig. 3f). Comprehensive lists of genes differentially expressed in lobular epithelial cells compared to ductal epithelial cells across all samples, as well as differences at time point 1 and at time point 2, are provided in Supplementary Table 7.
A recent study using a different spatial transcriptomics technique listed 30 genes that are expressed differentially in ductal and lobular epithelial cells12. Ten of these 30 genes showed similar expression patterns in our study. MGP, ANXA1, TACSTD2, KRT14, KRT17, WFDC2, STAC2 and ALDH1A3 were elevated in ductal epithelial cells, whereas APOD and SNORC were elevated in lobular epithelial cells. The expression pattern of these ten genes assessed through multiome data is shown in Extended Data Fig. 5a. Pathway analysis of genes differentially expressed in ductal and lobular epithelial cells and at two different time points showed enrichment of metabolic pathways in ductal epithelial cells, and of mitogen-activated protein kinase kinase and paired amphipathic helix protein Sin3a–histone deacetylase complex signaling in lobular epithelial cells (Extended Data Fig. 5b).
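The comparison with the previously published gene list can be sketched as a direction-aware overlap between two studies. In the minimal example below, the ten concordant genes are those named above, whereas `GENE_X` and `GENE_Y`, the dictionary layout and the function name `concordant_genes` are hypothetical placeholders, because the remaining 20 genes of the published list are not enumerated here.

```python
def concordant_genes(reference, observed):
    """Return genes whose compartment of enrichment agrees in both studies."""
    return sorted(
        gene for gene, compartment in reference.items()
        if observed.get(gene) == compartment
    )


ductal = ["MGP", "ANXA1", "TACSTD2", "KRT14", "KRT17", "WFDC2", "STAC2", "ALDH1A3"]
lobular = ["APOD", "SNORC"]

# Reference study: compartment in which each gene was reported as elevated.
reference = {gene: "ductal" for gene in ductal}
reference.update({gene: "lobular" for gene in lobular})
# Hypothetical placeholders standing in for genes not named in the text.
reference.update({"GENE_X": "ductal", "GENE_Y": "lobular"})

# Present study: GENE_X is discordant and GENE_Y was not detected
# (both are illustrative assumptions).
observed = {gene: "ductal" for gene in ductal}
observed.update({gene: "lobular" for gene in lobular})
observed["GENE_X"] = "lobular"

shared = concordant_genes(reference, observed)
print(len(shared), "concordant genes:", ", ".join(shared))
```

Requiring agreement in direction, rather than mere presence in both lists, avoids counting a gene that is ductal-enriched in one dataset and lobular-enriched in the other.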
Extended Data Fig. 5. Gene expression and signaling differences between epithelial cells of ducts and lobules.
a) Expression pattern of 10 genes that showed differential expression in ductal epithelial cells compared to lobular epithelial cells assessed using multiome data. b) Differences in signaling pathways in ductal and lobular epithelial cells. Data from all samples were used to generate these networks. c) PTBP1, whose expression in normal breast epithelial cells was reduced at timepoint 2 compared to timepoint 1, is overexpressed in all breast cancer subtypes compared to normal breast. Statistical significance was derived using an unpaired t-test. Samples are biologically independent. (Normal: N = 114, low 55.146, first quartile (Q1) 87.064, median 109.154, third quartile (Q3) 123.208, high 163.066; Luminal: N = 566, low 85.382, Q1 138.168, median 159.404, Q3 180.133, high 242.444; HER2 positive: N = 37, low 105.775, Q1 122.596, median 132.549, Q3 148.043, high 188.8; TNBC basal-like 1: N = 13, low 152.31, Q1 166.83, median 182.37, Q3 206.14, high 220.45; TNBC basal-like 2: N = 11, low 119.54, Q1 161.645, median 179.97, Q3 210.075, high 217.12; TNBC immunomodulatory: N = 20, low 123.85, Q1 139.06, median 155.46, Q3 179.92, high 242.18; TNBC luminal androgen receptor: N = 8, low 123.99, Q1 129.368, median 136.68, Q3 142.92, high 153.48; TNBC mesenchymal stem-like: N = 8, low 96.39, Q1 129.857, median 154.925, Q3 177.99, high 203.06; TNBC mesenchymal: N = 29, low 75.03, Q1 140.838, median 165.18, Q3 200.85, high 260.73; TNBC unspecified: N = 27, low 100.17, Q1 144.165, median 167.22, Q3 193.375, high 264.97).
With respect to age-dependent changes, PTBP1, which is associated with mRNA processing and alternative splicing26, showed consistent decline (approximately sixfold) in expression at time point 2 compared to time point 1 in both ductal and lobular epithelial cells (Fig. 3d,e). PTBP1 was overexpressed in all subtypes of breast cancer (Extended Data Fig. 5c). Epithelial cell genes whose expression is reduced with age (168 genes, P < 0.05) were associated with the protein kinase A (PKA) signaling pathway, while those whose expression increased with age (183 genes) were involved in the eukaryotic initiation factor 2 (eIF2) and oxidative phosphorylation pathways (Extended Data Fig. 6a,b and Supplementary Table 7).
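For readers who wish to relate a fold decline such as the one reported for PTBP1 to the log2 scale used in volcano plots, a minimal sketch follows. The expression values, the pseudocount and the function names are assumptions chosen for illustration; they are not measurements from this study, and the study's own normalization may differ.

```python
import math


def fold_decline(later, earlier, pseudocount=1.0):
    """Ratio of earlier to later expression; values above 1 indicate a decline."""
    return (earlier + pseudocount) / (later + pseudocount)


def log2_fold_change(later, earlier, pseudocount=1.0):
    """log2 of later over earlier expression; negative values indicate a decline."""
    return math.log2((later + pseudocount) / (earlier + pseudocount))


# Hypothetical normalized expression of one gene at the two time points.
time_point_1 = 119.0
time_point_2 = 19.0

print(f"fold decline: {fold_decline(time_point_2, time_point_1):.1f}")
print(f"log2 fold change: {log2_fold_change(time_point_2, time_point_1):.2f}")
```

A sixfold decline corresponds to a log2 fold change of about −2.58, which is where such a gene would sit on the horizontal axis of a volcano plot that places time point 1-enriched genes on the left.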
Extended Data Fig. 6. Age-dependent signaling pathway alterations in ductal and lobular epithelial cells of the breast.
Genes differentially expressed in ductal and lobular epithelial cells at timepoint 2 compared to timepoint 1 from sample #3 were subjected to Ingenuity Pathway Analysis. a) EIF2 signaling pathway enrichment with age. b) Oxidative phosphorylation pathway enrichment with age.
Genetic ancestry and germline mutation-dependent differences
We next analyzed data based on genetic ancestry groups and BRCA1 and BRCA2 mutation status. Cell types in each group are shown in Fig. 4a and cell proportions are listed in Supplementary Table 8. The proportion of AP cells was disproportionately higher in the breast tissues of women of Indigenous American ancestry. While 19% of cells in Indigenous Americans were AP cells, the percentage of AP cells in the other groups ranged from 5% to 9%. None of these differences between the groups was due to differences in age, body mass index (BMI) or the number of childbirths (Supplementary Table 1 and Methods). Furthermore, none of these differences can be attributed to differences in the proliferation rate of cells in the breast at the time of tissue collection because MKI67, a clinically used marker of cell proliferation27, was expressed in very few cells across samples (Fig. 4b).
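Cell proportions of the kind listed in Supplementary Table 8 can be derived from per-nucleus annotations as sketched below. The group names, label lists and function name `cell_type_fractions` are hypothetical and serve only to show the calculation; the counts were chosen to mirror the percentages quoted above and are not the study's data.

```python
from collections import Counter


def cell_type_fractions(labels):
    """Return the fraction of nuclei assigned to each cell type."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


# Hypothetical per-nucleus annotations for two groups of donors.
groups = {
    "group_A": ["AP"] * 19 + ["other"] * 81,
    "group_B": ["AP"] * 7 + ["other"] * 93,
}

ap_fraction = {
    name: cell_type_fractions(labels).get("AP", 0.0)
    for name, labels in groups.items()
}
for name, fraction in ap_fraction.items():
    print(f"{name}: {fraction:.0%} AP cells")
```

In practice, proportions would be computed per donor before being summarized per group, so that a single donor with many nuclei does not dominate the group estimate.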
Fig. 4. Genetic ancestry-dependent variability in cell state.
a, Cell clustering in each group based on integrated snATAC-seq and snRNA-seq analyses. b, Expression pattern of the cell proliferation marker MKI67. c, ESR1 expression showed genetic ancestry-dependent variability with a subpopulation of LASP cells in Indigenous Americans expressing ESR1. d, ESR1 gene chromatin accessibility patterns in LHS, LASP and BM cells of various genetic ancestry groups, and BRCA1 and BRCA2 mutation carriers. e, FOXA1 expression and FOXA1 gene chromatin accessibility patterns in several genetic ancestry groups, and BRCA1 and BRCA2 mutation carriers. f, GATA3 expression and chromatin accessibility patterns in several genetic ancestry groups, and BRCA1 and BRCA2 mutation carriers. g, ELF5 expression and chromatin accessibility patterns in several genetic ancestry groups, and BRCA1 and BRCA2 mutation carriers. The red vertical box shows a chromatin-accessible peak unique to the cells of the BRCA2 mutation carrier. h, EHF expression and chromatin accessibility patterns in several genetic ancestry groups, and BRCA1 and BRCA2 mutation carriers.
Among the genes expressed at higher levels in LHS, LASP or BM cells, ESR1 showed genetic ancestry-dependent variability in expression, with a unique population of LASP cells enriched in the Indigenous American population expressing ESR1 (Fig. 4c). To examine whether these differences in ESR1 expression correlated with changes in chromatin accessibility, ESR1 gene chromatin accessibility in LHS, LASP and BM cells in individual genetic ancestry groups was examined (Fig. 4d). Expression of ESR1 in the LASP cells of Indigenous Americans did not correlate with dramatic changes in chromatin accessibility. Expression and the chromatin accessibility patterns of FOXA1, GATA3, ELF5, EHF, KRT14 and TP63 also did not show any genetic ancestry-dependent variability (Fig. 4e–h and Extended Data Fig. 7a).
Extended Data Fig. 7. Chromatin accessibility and expression patterns of BM cell-enriched markers.
a) Expression and chromatin accessibility patterns of KRT14 and TP63 in various genetic ancestry groups and BRCA1/2 mutation carriers. b) Signaling pathways uniquely active in alveolar progenitor cells enriched in Indigenous Americans. The legend within the figure provides details of the relationships between molecules of the signaling network.
To determine the characteristics of AP cells enriched in Indigenous Americans, we compared gene expression between cluster 9 (Indigenous American-enriched cluster) and clusters 1, 12 and 18 within LASP cells (Supplementary Table 3). As expected, ESR1 was among the most highly expressed genes (approximately 230-fold higher) in this cluster compared to other clusters. To determine whether genes expressed in cluster 9 were regulated by ESR1, we subjected genes differentially expressed in cluster 9 compared to clusters 1, 12 and 18 to Ingenuity Pathway Analysis. Among the several pathways enriched in this cluster, the intersection of EGF signaling with ER signaling emerged as one of the top signaling networks (Extended Data Fig. 7b). EGF was previously shown to alter the ER cistrome and induce gene expression patterns found in antiestrogen-resistant cells28. Thus, in cluster 9, cross-talk between ER and EGF signaling may be dominant.
Comparative analysis of breast tissues of women of African and European ancestry
Our multiple attempts to perform integrated snRNA-seq and snATAC-seq of breast tissues of women of African ancestry were not successful (Methods) but we were eventually successful with snRNA-seq alone. To allow appropriate comparison, the breast tissues of women of European ancestry underwent snRNA-seq alone. Cell clustering was similar to the clustering patterns obtained with integrated snATAC-seq and snRNA-seq (Fig. 5a). However, there were major differences in epithelial and fibroblast cell states between the breast tissues of women of African and European ancestry. For example, LASP cell clusters in women of African ancestry were predominantly populated by the BL cell state, whereas in women of European ancestry, there were similar numbers of BL and AP cells. As with the data presented in Fig. 4, ESR1 and FOXA1 expression was restricted to LHS cells in both groups (Fig. 5b). Unlike in the breast tissues of Indigenous Americans, AP cells did not express higher levels of ESR1 compared to other LASP cell states. The differences between cell clusters of women of African and European ancestry were not due to differences in proliferation rate because there was only a minor difference in MKI67+ cells between the two groups (Fig. 5c).
Fig. 5. Comparative analyses of the breast tissues of women of African ancestry with women of European ancestry using snRNA-seq.
a, Fibroblasts and epithelial cells of the breast tissue clustered differently in women of African ancestry compared to women of European ancestry. b, ESR1 and FOXA1 expression patterns in epithelial cell clusters of women of African and European ancestry. As with the multiome data, FOXA1 expression was restricted to LHS cells and ESR1 expression was higher in LHS cells compared to LASP cells in both groups. c, MKI67 expression patterns in the breast tissues of women of African and European ancestry. d, PROCR, ZEB1 and PDGFRα expression patterns in the breast tissues of women of African and European ancestry. e, Fibroblasts in women of African and European ancestry showed distinct cell states. f, The fibro-prematrix state was dominant in women of African ancestry, while the fibro-matrix state was more prominent in women of European ancestry. g, Genetic ancestry-dependent and germline mutation-dependent variability in the clustering of fibroblasts.
We recently reported a unique type of stromal cells with multipotent activity called PROCR+/ZEB1+/PDGFRα+ cells enriched in the breast tissues of women of African ancestry7. Because fibroblasts in the breast tissues of women of African ancestry clustered differently from those of European ancestry (Fig. 5a), we analyzed the expression patterns of PROCR, ZEB1 and PDGFRα. A subpopulation of cells in the fibroblast clusters were positive for all three markers (Fig. 5d, indicated by a circle in Fig. 5a).
Fibroblasts in the healthy breast exist in four different states: fibro-prematrix, fibro-SFRP4, fibro-major and fibro-matrix12. There were genetic ancestry-dependent variabilities in fibroblast state, with women of African ancestry showing specific enrichment of genes that specify the fibro-prematrix state at the expense of the fibro-matrix state (Fig. 5e,f). Extended Data Fig. 8 provides additional differences in fibroblasts between women of African and European ancestries. Gene expression differences between the fibroblasts of women of African ancestry and women of European ancestry are shown in Supplementary Table 9. When the fibroblast populations of other ancestry groups and BRCA1 and BRCA2 mutation carriers were compared, there was considerable variation in fibroblast cell state across genetic ancestry and BRCA1 and BRCA2 mutation status (Fig. 5g).
Extended Data Fig. 8. Genetic ancestry dependent variability in expression of fibroblast-enriched genes.
a) Differences in expression of fibroblast-enriched genes in breast tissue fibroblasts of African ancestry compared to European ancestry. Fourteen clusters (0-13) are shown in Fig. 5g of the main text. b) Expression levels of genes that classify fibroblasts into four subtypes are also shown.
Chromatin accessibility pattern of cell identity genes
Because previous studies showed compatible gene expression and the chromatin accessibility of cell identity genes in hematopoietic stem cells20, we next examined whether genes previously described to identify a cell type show cell-type-specific chromatin accessibility with a focus on endothelial and T cells29. CD4+ T cell-enriched IL7R and CD8+ T cell-enriched IFNγ showed T cell-specific expression and chromatin accessibility (Fig. 6a). Similarly, the expression and chromatin accessibility of CD8+ T cell-enriched GZMK was restricted to T cells (Fig. 6b). The chromatin accessibility of macrophage-enriched FCGR3A was restricted to macrophages (Fig. 6c).
Fig. 6. Gene expression and chromatin accessibility patterns of selected cell identity genes.
a, IL7R and IFNγ expression and chromatin accessibility were restricted to T cells. b, GZMK expression and chromatin accessibility were restricted to T cells. c, FCGR3A expression and chromatin accessibility were restricted to macrophages. d, The lymphatic endothelial marker LYVE1 was expressed in the endothelial cell 2 subcluster and a fraction of macrophages, but the chromatin accessibility patterns were not unique to these two cell types. e, Although ACKR1 expression was restricted to a subpopulation of endothelial cells, the ACKR1 gene showed limited variation in chromatin accessibility between different cell types. f, CXCL12 expression and chromatin accessibility showed limited correlation.
Among endothelial cells, the lymphatic endothelial cell marker LYVE1 was expressed in the endothelial cell 2 subcluster and in the macrophage subpopulation, but the chromatin accessibility patterns in these two cell types were not similar (Fig. 6d). Similarly, the endothelial stalk-like subtype marker ACKR1 was expressed in a subset of the endothelial cell 1 subcluster but the chromatin in the regulatory regions of this gene was accessible in all cell types (Fig. 6e). CXCL12, which is expressed in the endothelial cell 1 subcluster and in fibroblasts, showed similar chromatin-accessible patterns in all cell types except for T cells (Fig. 6f). Thus, unlike in the hematopoietic system20, the chromatin accessibility and gene expression of cell identity genes were not uniformly compatible.
Comparative analyses of healthy breast with breast cancer single-cell transcriptome
We previously reported the enrichment of genes expressed in LHS cells in most breast cancers16. We updated this analysis by overlapping the gene expression signatures obtained from the multiome assay with previously described single-cell data of breast tumors29. This previous study identified five breast cancer gene modules (LumA, LumB, basal, HER2 and cycling). We observed the LumA, LumB and HER2 gene modules overlapping with the gene signatures of LHS cells, while the cycling module overlapped with the gene signature of LASP cells (Extended Data Fig. 9a). The basal breast cancer module overlapped with BM cells. These results further support the theory that LHS cells are the cells of origin of the LumA, LumB and HER2 subtypes of breast cancers.
Extended Data Fig. 9. Relationship between breast epithelial gene signatures derived from this study with gene signatures derived from single cell analysis of breast tumors.
a) Gene signatures of LHS cells overlap with gene expression modules of LumA, LumB and HER2+ breast cancers, whereas gene signatures of LASP and BM cells overlap with the cancer cycling and cancer basal gene expression modules, respectively. b) Expression patterns of genes that identify myCAFs, iCAFs, dPVLs and iPVLs among fibroblast subclusters.
Recent studies described four major stromal cells in breast cancer with fibroblast and pericyte-like properties30,31. These include myofibroblasts, inflammatory fibroblasts, differentiated pericytes and immature pericytes. Myofibroblasts are enriched for COL1A1 and PDPN, whereas inflammatory fibroblasts express higher levels of CD34 and CXCL12. MYH11 and MYLK are enriched in differentiated pericytes, whereas CD36 and RGS5 are enriched in immature pericytes. Based on the expression patterns of these genes, at least three subtypes of fibroblasts exist in the healthy breast (Extended Data Fig. 9b). Additionally, based on the expression patterns of MYH11 and MYLK, clusters 5 and 11 shown in Fig. 5g are probably enriched for differentiated pericytes.
Discussion
Based on reports highlighting genetic ancestry-dependent variability in the utility of disease biomarkers, mutation patterns in cancer, response to treatment and outcomes, there has been an increased recognition of the importance of including tissues from people of different genetic ancestry to make research representative of all people8,32–35. By sequencing 88,005 nuclei from the breast tissues of healthy women of diverse genetic ancestry, we developed a global breast single-nucleus atlas. Our results differ from those of similar studies that used reduction mammoplasty or healthy tissues adjacent to tumors, which we and others showed to be histologically abnormal23,36,37. Based on our current results, we propose two major genetic ancestry-dependent differences in the healthy breast: one in alveolar progenitor cells in Indigenous Americans and one in stromal fibroblasts in women of African ancestry. Alveolar progenitor cells enriched in Indigenous Americans expressed higher levels of ESR1, which potentially impacts estrogen signaling in the breast tissues of these women. The fibroblast subtype enriched in women of African ancestry could influence the microenvironment in the breast.
With regard to cancer outcome disparity, previous studies highlighted differences in genomic aberrations in cancer cells and in immune cells that infiltrate tumors based on genetic ancestry6,8,38. Our results suggest genetic ancestry-dependent differences in stromal cells as a potential contributor to this disparity. The fibroblast subcluster described in this study showed distinct cell states in women of African ancestry compared to others. The fibroblast cell state in women of African ancestry was predominantly prematrix compared to others; this cell state is associated with vasculogenesis12. How this fibroblast state impacts healthy biology and breast tumor biology remains to be investigated.
Several of the previously reported cell identity genes of BM, LASP and LHS cells, including TP63, KRT14, EHF, ELF5 and FOXA1 showed similar expression patterns in our study12,13. In addition, several other genes were identified to be uniquely expressed in these epithelial subtypes. Among these genes, deregulated activity or expression of the LHS-enriched cell fate factor DACH1, negative regulator of PIK3CA–Akt pathway gene INPP4B, and the p53 activator NEK10 were linked to breast cancer39–41. We also identified the enrichment of two new transcription factor DNA binding motifs in breast epithelial cells. One is TCFL5, a testis-specific basic helix-loop-helix transcription factor expressed at low levels in the healthy breast but undergoing genomic aberration, mostly amplification, in 7% of breast cancers42,43. Another is ZBTB14 (also known as ZFP161); chromosomal loss involving this gene (chr18p11.31) is linked to poor prognosis in LumB breast cancers44. Additional studies are needed to determine the contribution of these factors in healthy and breast cancer biology.
Breast cancers originating from epithelial cells of the duct and lobules show distinct biology, clinical features and response to treatment24. Therefore, there has been considerable interest in deciphering biological differences in epithelial cells of the ducts and lobules. The results of our spatial transcriptomics study identified few major differences between these two epithelial cell types. Predominant expression of ALDH1A3 in ductal epithelial cells is interesting in the context of stem-progenitor-mature cell hierarchy of the breast45. ALDH1A3 is expressed predominantly in LASP cells from which most breast cancer in BRCA1 mutation carriers is suggested to originate46,47. Ductal epithelial cells also expressed higher levels of KRT14 and KRT17, which are expressed mostly in BM cells. We also noted that three genes (DUSP1, DPM3 and RPL36) expressed at higher levels in lobular epithelial cells compared to ductal epithelial cells were elevated in lobular carcinoma compared to ductal carcinoma. Lobular epithelial cells that intrinsically express higher levels of DUSP1, DPM3 and RPL36 compared to ductal cells could be cells of origin of lobular carcinomas.
Identifying age-dependent changes in breast epithelial cell gene expression has been of interest to several groups because any deviation in this process may predispose breast epithelial cells to breast tumorigenesis. Previous studies, however, used breast tissues from different individuals for such studies13,48. Despite the small sample size, our study using breast tissues from the same donor enabled mapping of gene expression changes in ductal and lobular epithelial cells with age. PTBP1 expression was reduced at time point 2 compared to time point 1 in both ductal and lobular epithelial cells. PTBP1 is involved in mRNA biosynthesis and alternative splicing and its downregulation can have a major impact on the cellular proteome26. Pathway analysis of genes commonly differentially expressed in lobular and ductal epithelial cells revealed downregulation of the PKA pathway but upregulation of the eIF2 pathway with aging. Studies in yeast models showed that attenuating the PKA pathway extends the lifespan, while translational inhibition through phosphorylation of eIF2 increases the lifespan49,50. As the PKA and eIF2 pathways are also involved in breast tumorigenesis51,52, whether deregulation of natural aging-dependent changes in these pathways contributes to breast tumorigenesis is an unanswered question. Collectively, the data presented in this study provide an important resource generated from healthy breast tissues to derive cell-type-specific chromatin accessibility and age-dependent gene expression signatures to study several diseases of the breast.
Limitations of the study
This study used breast tissues from healthy donors and all tissues were collected using biopsies of the upper outer quadrant of the breast. Therefore, potential breast region-specific differences were not captured. Because biopsies yield a limited number of nuclei, often owing to adiposity, the number of nuclei analyzed per genetic ancestry group was somewhat limited. While this study provides a framework for future studies, analyses of additional samples from donors with information on genetic ancestry estimates, including region-specific variations (for example, West and East Africa) and groups not covered in this study, would help to develop a much more robust and globally informative human cell atlas.
Methods
Healthy breast tissues and genetic ancestry mapping
Breast tissue biopsies from healthy donors were cryopreserved until use as described previously53. All tissues used in this study were from female donors, who were selected because breast cancer is more common in women than men. Tissues from men currently available in our tissue bank are insufficient for the types of studies performed. DNA from the blood of tissue donors was used for genetic ancestry mapping. The procedures for genetic ancestry mapping are described in our recent publication7. Samples for the multiome assay were initially grouped based on self-reported ethnicity and subsequently analyzed for genetic ancestry estimates using the previously described highly discriminative ancestry informative marker panel54. Supplementary Table 1 provides details of the donors, including menopausal status, age, BMI, self-reported ethnicity and days since ovulation. Average age, BMI and the number of childbirths in each subgroup are also provided in this table. The average ages of European, Indigenous American, Asian and Hispanic White donors were 38, 41, 42 and 41, respectively. The average ages of Ashkenazi Jewish-European and African American donors were higher (average of 53 and 54, respectively). Most self-reported White donors were enriched for European ancestry, whereas most self-reported Black donors were enriched for African ancestry markers. Sixty percent of self-reported Hispanic women were enriched for European ancestry markers. Fifty percent of Asian donors were enriched for East Asian ancestry while the other 50% were enriched for Southeast Asian ancestry markers. Indigenous Americans had a higher proportion of ‘Americana’ ancestry markers compared to others; the Americana ancestry proportion in these donors was similar to the Indigenous American ancestry proportions described in Indigenous American breast cancer patients in Peru55. Ashkenazi Jewish Americans included a mixture of European, Middle Eastern and African ancestry.
We also included six samples from five BRCA1 mutation donors (two breasts of the same donor in one case) and five samples from four BRCA2 mutation donors. Although limited in number, the purpose of including these samples was to allow comparison between noncarriers and mutation carriers under conditions where standard operating procedures, assay and analysis platforms are the same. The clinical information and BRCA mutation status of two BRCA1 and three BRCA2 donors are provided in our recent publications56. Mutations in the other BRCA1 samples include heterozygous c.1687C>T (p.Gln563*) and heterozygous c.4065_4068delTCAA (p.Asn1355LysfsX10). The mutation status of one donor was not available. The BRCA2 sample carried a pathogenic heterozygous c.5042_5043delTG (p.Val1681GlufsX7) mutation. Unlike tissues from clinically healthy donors, which were collected at tissue collection events, tissues from BRCA1 and BRCA2 mutation carriers were surgical specimens (tumor-adjacent or contralateral breast). Despite these tissues being graded as histologically normal by pathologists, they may exhibit abnormalities at the molecular level, as reported by us and others in the studies of tumor-adjacent normal tissues23,36,37,57.
Single-nucleus multiome assay
Tissues from two cores each of five donors were rapidly thawed and subjected to multiome assay reactions and sequencing. Assays were done in multiple batches for each group. Nuclei were isolated from these tissues using the protocol suggested for use with the 10X Genomics Chromium Next GEM Single Cell Multiome ATAC+Gene Expression protocol (CG000338). Thawed tissues were washed in PBS, minced with a scalpel and transferred to 1.5-ml microcentrifuge tubes. Then, 300 µl of NP-40 lysis buffer was added and tissues were homogenized 15× using a pellet pestle (catalog number 749625-0010, Thermo Fisher Scientific). After homogenization, 1 ml of NP-40 lysis buffer was added and incubated on ice for 3 min. Wide-bore pipettes were used to mix tissues intermittently during incubation to allow better disintegration. The lysed tissue suspension was first filtered through a 70-µm strainer followed by a 40-µm strainer into a 50-ml conical tube. After centrifugation for 5 min at 500g at 4 °C, most of the supernatant was removed leaving behind 50 µl; 1 ml of PBS + 1% BSA + 1 U µl−1 of RNase inhibitor was added and kept on ice for 5 min. After resuspension with a pipette, nuclei were centrifuged at 500g for 5 min at 4 °C. After supernatant removal, nuclei were resuspended in 500 µl of PBS + 1% BSA + RNase inhibitor. Then, 5 µl of 7-aminoactinomycin D (7-AAD) ready-made solution (catalog number SML1633-1ML, Sigma-Aldrich) was added to the nucleus suspension. 7-AAD+ nuclei were separated based on size and granularity using the BD FACSMelody cell sorter (or equivalent). The flow cytometry sorting patterns are provided in the source data file. After estimating the concentration of nuclei through manual counting under a microscope, the next steps were performed immediately according to the 10X Genomics Chromium Next GEM Single Cell Multiome ATAC + Gene Expression User Guide (CG000338). ATAC and complementary DNA libraries were prepared using the 10X Genomics protocol (CG000338).
The final ATAC and gene expression libraries were sequenced on an Illumina NovaSeq 6000 sequencer, with index reads of 10 + 24 bp, and 100 bp paired-end reads.
Similar studies were performed with breast tissues from 35 donors of African ancestry, but high-quality and reliable results could not be obtained, probably because of debris that could not be removed from the nucleus preparation even after sorting nuclei using flow cytometry. This issue is unique to tissues from African donors, despite using the same operating procedure for collecting tissues from all donors. The failure rate with tissues from donors of other genetic ancestry was less than 20%. However, we were able to perform snRNA-seq of ten African ancestry donors. Additional tissues from donors of European ancestry were subjected to snRNA-seq alone to allow comparison between donors of African and European ancestry.
Single-nucleus multiome data analysis
Cell Ranger ARC v.2.0 (http://support.10xgenomics.com/) was used to process the raw sequence data derived from the single-nucleus multiome libraries. Both the ATAC and gene expression FASTQ files were processed with the cellranger-arc count algorithm. The paired information of the gene expression unique molecular identifiers and the count of transposition events in the peaks for each barcode was used to identify cells from the non-cell populations. The final filtered gene–cell barcode matrices and fragment files were used for further analysis with Signac19 and Seurat (v.4)58–60. As each sample was a pool of cells from multiple individuals, souporcell61 was used for genotype-free demultiplexing and cells were assigned to their origins. Analysis with Signac and Seurat started with a quality check of the cells identified. From the gene expression data, low-quality cells or cells with extremely high or low numbers of detected genes and unique molecular identifiers were excluded. For the ATAC-seq data, cells with low signal enrichment around transcriptional start sites or with extremely high or low numbers of reads falling in the detected peaks were discarded. The gene expression data were normalized using sctransform62. The chromatin accessibility data were normalized by applying the term frequency-inverse document frequency (TF-IDF) procedure. On completion of preprocessing and dimensionality reduction independently on the gene expression and chromatin accessibility data, the closest neighbors of each cell in the data were calculated based on a weighted combination of gene expression and chromatin accessibility similarities. The weighted nearest neighbor graph was used for cell clustering and visualization.
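As a rough illustration of the term frequency-inverse document frequency normalization mentioned above, the following Python sketch transforms a small cell-by-peak count matrix. It assumes the log(1 + TF × IDF × 10,000) form; the scale factor, the function name tfidf and the toy counts are illustrative assumptions and not values from this study, in which the step was run with Signac.

```python
import math

def tfidf(counts, scale=1e4):
    """Normalize a cell-by-peak count matrix (one row per cell).

    TF  = count of a peak in a cell / total counts in that cell
    IDF = number of cells / total counts of that peak across all cells
    Returns log(1 + TF * IDF * scale); the scale factor is an assumption.
    """
    n_cells = len(counts)
    n_peaks = len(counts[0])
    cell_totals = [sum(row) for row in counts]
    peak_totals = [sum(row[j] for row in counts) for j in range(n_peaks)]
    normalized = []
    for row, total in zip(counts, cell_totals):
        new_row = []
        for j, count in enumerate(row):
            if total == 0 or peak_totals[j] == 0:
                new_row.append(0.0)  # empty cell or never-observed peak
                continue
            tf = count / total
            idf = n_cells / peak_totals[j]
            new_row.append(math.log1p(tf * idf * scale))
        normalized.append(new_row)
    return normalized

# Hypothetical toy matrix: 3 cells x 3 peaks
toy_counts = [[2, 0, 1],
              [0, 3, 1],
              [1, 1, 0]]
normalized = tfidf(toy_counts)
```

Peaks with zero counts in a cell stay at zero, while peaks that are rare across cells receive a larger weight than peaks that are open in most cells.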
To annotate each cell population from the analysis, automatic annotation using SingleR together with manual annotation with known marker genes were used. The CoveragePlot function from Signac was used to plot chromatin accessibility for specific genomic regions. The accessibility map included in the motif analyses spanned 10-kb regions upstream of the transcription start site to cover the promoter regions and 10 kb downstream of the transcription end site to cover potential 3′ enhancer regions.
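The window used for the accessibility maps can be sketched as follows, assuming 1-based coordinates and a gene described only by its transcription start site (TSS) and transcription end site (TES). The function name and the example coordinates are hypothetical and serve only to show how the 10-kb flanks are applied on either strand.

```python
def accessibility_window(tss, tes, upstream=10_000, downstream=10_000):
    """Return (start, end) spanning `upstream` bp before the TSS and
    `downstream` bp after the TES, in genomic (left-to-right) order.

    A plus-strand gene has tss <= tes; a minus-strand gene has tss > tes,
    so its upstream flank lies to the right of the gene body.
    """
    if tss <= tes:  # plus strand
        start, end = tss - upstream, tes + downstream
    else:           # minus strand
        start, end = tes - downstream, tss + upstream
    return max(start, 1), end  # clamp at the chromosome start

# Hypothetical coordinates, not taken from any gene in the study
plus_window = accessibility_window(50_000, 80_000)
minus_window = accessibility_window(80_000, 50_000)
```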
Single-nucleus data analysis
snRNA-seq data were first processed using CellRanger v.7.0.1. The feature-cell barcode matrices generated from CellRanger were used for further analysis with Seurat (v.4)58–60. The integrated single-nucleus multiome data were used as a reference to annotate the snRNA-seq data. The FeaturePlot_scCustom function in scCustomize63 was used to generate the gene expression plots. Regulon and DNA motif enrichments in each cell type were determined using the SCENIC18 and Signac19 tools. The position weight matrices of the human transcription factor binding motifs were obtained from the CORE collection of JASPAR 2020 (ref. 64). Signac was used to determine motif positions in accessible chromatin regions using the FindMotifs() function19. The enrichment scores of the motifs in the accessible regions were calculated for each metacell using chromVAR65. This was done with the function RunChromVAR() from Signac. Signac’s MotifPlot function with default parameters was used to visualize the different motif sequences.
The transcription factor footprint analysis was performed using the Footprint function from Signac (v.1.12.0)19. Briefly, we first added the motif information, including the exact positions of each motif, to the peak matrix. Then, the normalized observed and expected Tn5 insertion frequency was calculated for each position surrounding a set of motif instances in the whole genome using the Footprint function. The footprinted motifs were plotted using the PlotFootprint function. For the footprint analysis, all footprint-supported motifs from the JASPAR 2020 database were used66.
Spatial transcriptomics analyses
Formalin-fixed paraffin-embedded sections from donors with two-time tissue donations were selected for the study. Each slide contained two 5-μm sections from the formalin-fixed paraffin-embedded blocks. Each slide represented one donor with two barcodes representing two-time donations. Each donor had three repeats (three slides per donor). The sections were cut with Leica DB80 LS blades (catalog number 14035843488, Leica Biosystems) on a rotary microtome instrument (catalog number RM2125 RTS, Leica Biosystems) and placed in the center of a Superfrost Plus Microscope slide (catalog number 1255015, Thermo Fisher Scientific). Tissue sections were no larger than 35.3 mm × 14.1 mm.
Regions of interest (ROIs) were selected after staining the slides with pan-keratin, alpha-smooth muscle actin and FABP4 antibodies. All ROIs passed a sequencing quality control assessment. Next, negative control probes were used to estimate background and downstream gene detection and to remove outliers. The limit of quantification (LOQ) of each ROI was calculated using the geometric mean and geometric standard deviation of the negative control probes to identify genes detected above background in the experiment. All ROIs passed the LOQ-based filtering with more than 1% of genes detected. Gene filtering was also performed, retaining targets that were detected above the LOQ in 10% or more ROIs. A total of 10,270 genes from 48 ROIs remained for data analysis. Upper quartile (Q3) normalization was performed for the genes in each segment. Quality control and normalization were performed using GeoMxTools v.3.0.1.
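The LOQ-based filtering and Q3 normalization described above can be sketched in Python as follows. The sketch assumes LOQ = geometric mean × (geometric standard deviation)^2 of the negative probe counts; the exponent of 2 is a common default and is an assumption here, because the text does not state it. All function names, gene labels and counts are hypothetical, and the study itself used GeoMxTools.

```python
import math
import statistics

def loq(neg_counts, n_sd=2.0):
    """LOQ of one ROI from its negative probe counts (all counts > 0)."""
    logs = [math.log(c) for c in neg_counts]
    geo_mean = math.exp(statistics.fmean(logs))
    geo_sd = math.exp(statistics.stdev(logs))
    return geo_mean * geo_sd ** n_sd  # n_sd = 2 is an assumed default

def genes_above_loq(expr, loqs, min_fraction=0.10):
    """Keep genes detected above the LOQ in at least min_fraction of ROIs.

    expr maps gene -> list of counts (one per ROI); loqs has one LOQ per ROI.
    """
    kept = []
    for gene, counts in expr.items():
        detected = sum(c > l for c, l in zip(counts, loqs))
        if detected / len(loqs) >= min_fraction:
            kept.append(gene)
    return kept

def q3_normalize(expr):
    """Scale each ROI so its upper quartile equals the geometric mean of
    the upper quartiles of all ROIs (assumes every ROI has a Q3 > 0)."""
    genes = list(expr)
    n_roi = len(expr[genes[0]])
    q3 = [statistics.quantiles([expr[g][j] for g in genes],
                               n=4, method="inclusive")[2]
          for j in range(n_roi)]
    target = math.exp(statistics.fmean([math.log(q) for q in q3]))
    return {g: [expr[g][j] * target / q3[j] for j in range(n_roi)]
            for g in genes}

# Hypothetical toy data: 3 genes x 3 ROIs
toy_expr = {"GENE_A": [20, 30, 5], "GENE_B": [1, 2, 3], "GENE_C": [10, 10, 10]}
toy_loq = loq([2, 4, 8])
kept_genes = genes_above_loq(toy_expr, [toy_loq] * 3)
normalized_expr = q3_normalize(toy_expr)
```

In this toy example GENE_B never exceeds the LOQ and is dropped, while the other two genes pass the 10% threshold.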
Statistical analyses of the spatial transcriptomics data
Dimension reduction analysis was performed in R v.4.2.1 using the following packages: FactoMineR v.2.6, Rtsne v.0.16, R v.1.6.0, GAVA v.1.6.0 and UMAP v.0.2.9.0 (software v.1.0.0). Differential gene expression analysis was performed on a per-gene basis, modeling log-transformed, normalized gene expression using either a linear mixed-effect model (LMM) for study-wide comparisons or a linear model for donor-specific comparisons with GeoMxTools v.3.0.1. LMMs were used to account for the sampling of multiple ROI and area of interest segments per tissue and non-independence of the data. For the study-wide pairwise comparisons between the ducts and lobules, the following LMM was used: gene ∼ ROIType + (1|tissue). To compare ducts versus lobules within the same donor tissues, the following linear model was used: gene ∼ ROIType. A false discovery rate (FDR) correction was applied to P values. Statistical significance was calculated using the R package lmerTest using the least squares means method. Spatial deconvolution was performed using the SpatialDecon package in R (v.1.6.0). Spatial deconvolution requires the use of a cell profile matrix derived from scRNA-seq. For this analysis, we used gene signatures derived from the multiome data presented in Supplementary Tables 2 and 3. Differential abundance analysis was performed on the results of spatial deconvolution using the same approach as differential gene expression.
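To make the FDR correction step concrete, the following Python sketch adjusts a list of per-gene P values. It assumes the Benjamini-Hochberg procedure, which is a common choice but is not named in the text; the function name and the P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical P values for four genes
raw_p = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw_p)
```

Each adjusted value is the smallest FDR threshold at which that gene would be called significant, so genes can be filtered by comparing the adjusted values with the chosen cutoff.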
Pathway analysis was performed using the GSVA v.1.44.5 R package with the Kyoto Encyclopedia of Genes and Genomes Brite database; 796 gene sets were scored, where each gene set contained between 5 and 500 genes. Differential gene set enrichment analysis was performed on the results using the same approach as differential gene expression. Ingenuity Pathway Analysis was used to determine the pathways altered due to aging.
IHC
IHC was performed for FOXA1 (1:100 dilution, catalog number sc-6553, Santa Cruz Biotechnology), GATA3 (1:100 dilution, catalog number sc-268, Santa Cruz Biotechnology) and the ER clone EP1 (IR084, 15 µg ml−1, Dako) at the Indiana University Health Pathology Laboratory. Tissue microarray containing breast tissues of approximately 50 donors each of African and European ancestry and staining of this microarray with antibodies against the aforementioned markers was described previously23. The staining patterns in tissues of 20 of these donors were reanalyzed for the number of nuclei within ducts or lobules that expressed the protein of interest and excluded positive cells in the stroma. The slides were imaged using the Aperio ScanScope CS system. A board-certified pathologist evaluated the intensity and localization of the antibody staining using both light microscopy and whole-slide digital pathology images. Computer-assisted morphometric analysis of the digital images was performed using the Aperio Image Analysis software.
Statistical analyses of the IHC data
Statistical analysis of the IHC positivity scores was conducted using Prism v.10.1.0 (GraphPad Software). Outlier identification and removal was done once using the ROUT method recommended in the software, with Q set to 0.5% (ref. 67). Using the ROUT method, we identified and removed three and two outliers from the ERα and FOXA1 datasets, respectively, and performed tests for statistical significance on the cleaned data. A one-way ANOVA was used to determine statistical significance across all three datasets, followed by a Tukey’s post hoc test.
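For readers less familiar with the test, the following Python sketch computes the F statistic of a one-way ANOVA from its definition. It covers only the F statistic and its degrees of freedom; the ROUT outlier step, the P value and the Tukey post hoc test performed in Prism are not reproduced, and the group values are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical positivity scores for three groups
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
```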
Ethical approval
All breast tissues from women clinically free of breast cancer were collected by the Komen Normal Tissue Bank (KTB) with written informed consent. None of the donors were prisoners. Tissues used in this study were from self-reported female donors. The study received ethical approval from the Indiana University institutional review board. International Ethical Guidelines for Biomedical Research Involving Human Participants were followed. The KTB website (https://komentissuebank.iu.edu) describes the standard operating procedure; breast biopsies were always collected from the upper outer quadrant of the breasts.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41591-024-03011-9.
Supplementary information
Supplementary files containing Excel spreadsheets with data on genes differentially expressed in several cell types of breast lobular and ductal epithelial cells and details of breast tissue donors.
Source data
Flow cytometry gating strategy for the isolation of nuclei from the breast tissues of different genetic ancestry groups.
Acknowledgements
We thank the countless number of women who donated normal breast tissues for research. We also thank the volunteers who facilitated this tissue collection. We offer special thanks to members of the KTB, including J. Henry, E. Nelson, M. Huynh, V. Rodriguez, A. Hughes, P. Rockey and J. Rose von Arx, as well as the Indiana University Simon Comprehensive Cancer Center (IUSCCC) tissue procurement facility for providing tissues and related data. We thank the flow cytometry core of IUSCCC for timely sorting of nuclei. We also thank D. Scoville of NanoString Technologies for processing the GeoMx data. H.N. acknowledges support for the research from the funders, the Catherine Peachey Fund and the Chan Zuckerberg Initiative Human Atlas Project. A.M.S. acknowledges funding from the Susan G. Komen Foundation to support the Susan G. Komen Tissue Bank at IUSCCC. The breast cancer research infrastructure at Indiana University School of Medicine is supported by the Vera Bradley Foundation for Breast Cancer Research.
Author contributions
H.N. conceived and designed the study. P.B.-N., D.C., A.S.K., A.K.A., G.J., P.C.M., H.G., C.E., R.G., F.N., Y.L. and H.N. developed the methodology. P.B.-N., F.N., A.K.A., H.G., L.E. and G.S. acquired the data. P.B.-N., A.S.K., A.K.A., C.E., G.J., F.N., H.G., Y.L., G.S., A.M.S. and H.N. analyzed and interpreted the data. P.B.-N., D.C., R.G., H.G., F.N., A.K.A., A.M.S. and H.N. wrote, reviewed or revised the paper. A.M.S., Y.L. and H.N. provided administrative, technical or material support. H.N. and Y.L. supervised the study.
Peer review
Peer review information
Nature Medicine thanks Andrey Krokhotin and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Sonia Muliyil, in collaboration with the Nature Medicine team.
Data availability
Most of the data are included in the paper. High-throughput data are available through the NCBI database with SuperSeries accession no. GSE244594. In addition, these data are publicly available through the CellXGene database of the Chan Zuckerberg Initiative. Source data are provided with this paper.
Code availability
No unique code was used in the study.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Change history
4/7/2025
A Correction to this paper has been published: 10.1038/s41591-025-03681-z
Extended data is available for this paper at 10.1038/s41591-024-03011-9.
Supplementary information
The online version contains supplementary material available at 10.1038/s41591-024-03011-9.
Supplementary Materials
Supplementary files containing Excel spreadsheets with data on genes differentially expressed in several cell types of breast lobular and ductal epithelial cells and details of breast tissue donors.
Flow cytometry gating strategy for the isolation of nuclei from the breast tissues of different genetic ancestry groups.