This is a preprint.

It has not yet been peer reviewed by a journal.

[Preprint]. 2023 Feb 7:rs.3.rs-2533579. [Version 1] doi: 10.21203/rs.3.rs-2533579/v1

Cell-specific and shared enhancers control a high-density multi-gene locus active in mammary and salivary glands

Lothar Hennighausen 1, Hye Kyung Lee 2, Michaela Willi 3, Chengyu Liu 4
PMCID: PMC9928059  PMID: 36789414

Abstract

Regulation of high-density loci harboring genes with different cell-specificities remains a puzzle. Here we investigate a locus that evolved through gene duplication1 and contains eight genes and 20 candidate regulatory elements, including a super-enhancer. Five genes are expressed in mammary glands and account for 50% of all mRNAs during lactation, two are salivary-specific and one has dual specificity. We probed the function of eight candidate enhancers through experimental mouse genetics. Deletion of the super-enhancer led to a 98% reduced expression of Csn3 and Fdcsp in mammary and salivary glands, respectively, and Odam expression was abolished in both tissues. The other three Casein genes were only marginally affected. Notably, super-enhancer activity requires the additional presence of a distal Csn3-specific enhancer. Our work identifies an evolutionary playground on which regulatory duality of a multigene locus was attained through an ancestral super-enhancer active in mammary and salivary tissue and gene-specific mammary enhancers.


The expansion of the secretory calcium-binding phosphoprotein (SCPP) gene family was key to the success of milk and saliva2,3, and its expression occurs uniquely in mineralized tissues and in mammary and salivary glands3. Yet the enhancer structures, their specificities and their interplay in complex genetic loci, which enable distinct gene activities in mammary and salivary glands, remain to be understood. We addressed this question experimentally in a locus harboring eight genes uniquely expressed in mammary or salivary glands or both (Fig. 1a)1,3,4. Duplication of Odam, the founding member of this family, followed by gene expansion led to a locus conserved in the mammalian lineage.

Figure 1. Characteristics of the Casein locus.

a, Diagram of the gene structure within the Casein locus and the preferential expression of each gene. b, mRNA levels of genes in the Casein locus were measured by RNA-seq in lactating mammary gland and salivary gland (n = 4 and 8, respectively). c, mRNA levels of Csn genes were measured by RNA-seq at day six of pregnancy (p6) and lactation day one (L1) (n = 3 and 4, respectively). d, Genomic features of the Casein locus were identified from ChIP-seq and DNase-seq data in lactating mammary gland and salivary gland. e, ChIP-seq data for TFs and histone marks provided structural information on the Casein locus at L1. Red, yellow and purple circles mark putative enhancers, promoters and CTCF binding sites, respectively. The super-enhancer is indicated by a yellow rectangle.

Gene activity was measured through RNA-seq conducted on lactating mammary tissue and salivary tissue (Fig. 1b). Combined, the mRNA levels of the five Caseins account for more than 50% of mRNA in mammary tissue, and up to 10⁶ reads were recorded for individual Casein genes. Casein expression in salivary tissue was four orders of magnitude lower. While expression of Fdcsp is confined to salivary tissue, Odam mRNA levels are equivalent in mammary and salivary tissue. Casein mRNA levels increase approximately 100-fold between day 6 of pregnancy (p6) and day 1 of lactation (L1) (Fig. 1c, Supplementary Table 1) and at least 10⁴-fold between non-parous (virgin) and day 10 lactating tissue (L10) (Supplementary Fig. 1a) (Lee et al., accompanying manuscript), making the respective genes ideal targets to understand gene activation by pregnancy hormones. The shared mammary-salivary locus also provides a unique opportunity to gain insight into control mechanisms operative in different organs. Candidate regulatory elements were identified based on H3K27ac patterns and transcription factor (TF) binding (Fig. 1d,e). As anticipated, H3K27ac marks at the five Casein genes were restricted to mammary tissue. However, H3K27ac coverage upstream of Odam, the ancestral gene of the locus, was found in both mammary and salivary tissue, pointing to shared regulatory elements operative in both tissues. This region also harbors a 147 bp long evolutionarily conserved region (ECR) identified in mammals4.

Digging deeper, we explored binding of TFs known to activate other mammary genes5, such as the cytokine-inducible transcription factor Signal Transducer and Activator of Transcription 5 (STAT5), the glucocorticoid receptor (GR), Nuclear Factor IB (NFIB) and mediator complex subunit 1 (MED1) (Fig. 1e). TF binding coincided with H3K27ac and H3K4me1 marks, and a total of 20 putative regulatory elements were identified within the 330 kb locus. The H3K27ac-marked region upstream of Odam contains four STAT5-bound regions and has all hallmarks of a super-enhancer (Fig. 1e, area highlighted in red). RNA polymerase II (Pol II) coverage was most prominent at the Casein genes. Integration of chromatin structures and gene expression data suggests that the mammary-salivary locus harbors the highest density of candidate enhancers and highly regulated genes among all multi-gene loci in mammary tissue (Supplementary Fig. 1b-d).

Although all 20 candidate regulatory elements were bound by STAT5, a principal TF controlling mammary development and function6, only 12 STAT5 peaks coincided with genuine DNA binding motifs (GAS, interferon-Gamma Activated Sequence) (Supplementary Fig. 2), suggesting that STAT5 binds at the other sites through alternative TFs, such as NFIB or GR (Fig. 1e). An inherent diversity in anchor proteins could lead to seemingly identical enhancers with possibly different and unique activities. STAT5 binding occurred at the promoter regions of Csn1s1, Csn2, Csn1s2a and Csn1s2b and coincided with bona fide H3K4me1 enhancer marks7,8 (Supplementary Fig. 2). As expected, H3K4me3 marks were exclusively associated with promoter regions and Pol II coverage was preferentially over gene bodies (Supplementary Fig. 2).

To identify dual regulatory elements controlling genes in mammary and salivary glands, we focused on the candidate SE located between Csn1s2b and Odam, the ancestral gene expressed in both tissues. We generated mice carrying individual and combinatorial deletions of the four constituent enhancers (E) (Fig. 2a). All analyses were conducted in mammary tissue after a full pregnancy, thus monitoring hormone-induced gene activation. Deletion of the entire 10 kbp SE (ΔSE) was confirmed by the absence of TF binding and H3K27ac marks in this region (Fig. 2b). Although we had hypothesized that the entire shared locus would be under SE control, we observed distinct gene-specific differences. While Odam mRNA levels were reduced by more than 99%, Csn3 by 98% and Csn1s2b by 93%, Csn1s1 was reduced by a mere 50% (Fig. 2c, Supplementary Table 2) which coincided with a decline of H3K27ac, H3K4me3, and Pol II coverage at these genes (Fig. 2d, Supplementary Fig. 3). In contrast, Csn2 and Csn1s2a mRNA levels were reduced at a lower, yet statistically significant, level. This experiment demonstrates that the SE dictates expression of three genes but has a limited impact on the other three mammary genes in the shared locus, whose regulation might be controlled by gene-specific enhancers.

Figure 2. Differential activation of selected Casein genes by the super-enhancer during pregnancy.

a, The putative super-enhancer was identified by ChIP-seq for TFs and activating histone marks at L1. Diagram shows the enhancer deletions introduced in mice using CRISPR/Cas9 genome editing. b, ChIP-seq analysis shows the genomic structure of the super-enhancer in lactating mammary tissue of WT and ΔSE mice. The orange shades indicate enhancers. c, Expression of Casein genes was measured in pregnancy day 18 (p18) mammary tissue from WT and ΔSE mice by RNA-seq (n = 4). d, The STAT5, GR, H3K27ac and Pol II landscape at the Casein locus in WT and ΔSE tissue during pregnancy was identified by ChIP-seq.

To understand whether the SE is required for the establishment of gene-specific enhancers, we conducted additional ChIP-seq experiments. STAT5, GR and NFIB binding at the two candidate Csn3 enhancers remained intact in the absence of the SE (Fig. 2d, Supplementary Fig. 3), suggesting the absence of compensatory activity. Notably, H3K27ac at the Csn3 candidate enhancers depends on the SE and not the gene-specific enhancers (Fig. 2d). In contrast, STAT5 binding at the Csn1s2b proximal enhancer is lost in ΔSE mammary tissue suggesting that the SE activates this secondary gene-specific enhancer.

While the 10 kbp SE region harbors four individual TF peaks, their ability to function individually or in combination and to contribute to the overall SE activity is not clear. To gain insight into the complexity of this SE, we introduced individual and combined deletions (Supplementary Fig. 4). Deletion of E1 (ΔE1), E2 (ΔE2) or E4 (ΔE4) resulted in the loss of STAT5 binding and H3K27ac at their respective sites. Notably, the establishment of E3 and E4 is dependent on the presence of E2. While loss of E4 (ΔE4) had no discernible consequence on any of the Casein genes, Csn1s1, Csn1s2b and Csn3 mRNA levels were reduced by 40–70% in both ΔE1 and ΔE2 tissues. Combined deletion of both E1 and E2 (ΔE1/2) silenced the entire SE and mimicked the ΔSE mutation, suggesting redundancy between E1 and E2.

A defining feature of milk protein genes is their exceptional response to pregnancy hormones, in particular prolactin. While the SE differentially affects Casein genes after a full pregnancy, it might have a more extended function in early pregnancy, prior to the prolactin surges that activate milk protein genes. Expression data obtained at p6 indicate an expanded SE function extending throughout the entire shared locus. In the absence of the SE (ΔSE), expression of Odam and all five Casein genes was reduced by more than 96% (Supplementary Fig. 5, Supplementary Table 3). This suggests that the SE initiates the establishment of the Casein enhancers, and that the three genes that were less affected in lactating ΔSE tissue have their own enhancers, which are established as hormone levels rise during pregnancy.

Having identified the physiological significance of the SE in mammary tissue, we addressed its regulatory significance in salivary glands. While loss of SE activity led to a complete silencing of Odam expression, Fdcsp and Csn3 mRNA levels declined by 99% and 88%, respectively (Fig. 3, Supplementary Table 4), demonstrating its dual specificity.

Figure 3. Salivary-specific activation of selected genes in the Casein locus by the super-enhancer.

a, Expression of Casein, Odam and Fdcsp genes was measured by RNA-seq in salivary tissue from ΔSE mice (n = 3). b, ChIP-seq analysis shows the salivary landscape in WT and ΔSE mice. The red shade indicates the super-enhancer.

The absence of Fdcsp expression in mammary tissue might be the result of Odam blocking the SE from efficiently activating the Fdcsp promoter. To test this hypothesis, we deleted the Odam gene, thus bringing the SE within a few kbp of the Fdcsp gene (Supplementary Fig. 6, Supplementary Table 5). Despite being in the physical orbit of the SE, Fdcsp remained silent in lactating mammary tissue and its expression in salivary tissue was unaltered. These findings suggest that the Fdcsp promoter is unresponsive to the SE in mammary tissue. In contrast, Csn3 mRNA levels increased approximately 2-fold in mammary tissue, suggesting a distance-dependency of SE activity. No expression changes were observed in salivary glands.

Removal of the SE ablates Csn3 expression without impacting TF binding at the two Csn3 enhancers (Fig. 2), thus questioning their physiological roles. We addressed this issue by introducing deletions within the distal (E1) and proximal (E2) candidate enhancers (Fig. 4a). Strong STAT5 binding coinciding with two GAS motifs and an NFIB site occurred at E2, and binding at E1 was weaker. GR binding was stronger at E1 than at E2, suggesting distinct molecular structures and possibly functions of the two enhancers. While deletion of the distal candidate enhancer (ΔE1) resulted in the loss of TF binding and H3K27ac at this site, Csn3 gene expression remained at 55% (Fig. 4b,c). In contrast, deletion of the two GAS motifs in E2 (ΔE2-S) reduced Csn3 mRNA levels by 98%. While STAT5 binding was completely abolished, residual binding of NFIB was detected. Deletion of the two GAS motifs and the NFIB site (ΔE2-S/N) further reduced Csn3 mRNA levels, coinciding with a complete absence of TF binding, H3K27ac and Pol II loading (Fig. 4b,c). Loss of the Csn3 enhancer did not adversely affect other Casein genes (Supplementary Fig. 7, Supplementary Table 6). The combined deletion of both enhancers, E1 and E2 (ΔCsn3-E1/2), did not further reduce gene activity (data not shown), suggesting no redundancy between them. Interactions between the Csn3 enhancers and SE had been confirmed by 3C, further supporting their crosstalk9.

Figure 4. Super-enhancer-dependent gene-specific enhancers activate Csn3 expression.

a, The presence of H3K27ac and H3K4me1 marks indicated a distal candidate enhancer (E1) at −7 kbp and a proximal one (E2) at −0.6 kbp. Diagram shows the enhancer deletions introduced in mice using CRISPR/Cas9 genome editing. b, Csn3 mRNA levels were measured by qRT-PCR in lactating mammary tissue from WT mice and mice lacking the Csn3 distal enhancer (ΔE1) and Csn3 proximal enhancer (ΔE2) and normalized to Gapdh levels. Results are shown as the means ± s.e.m. of independent biological replicates (n = 3). One-way ANOVA followed by Dunnett’s multiple comparisons test was used to evaluate the statistical significance of differences between WT and each mutant mouse line. c, Genomic features of the Csn3 locus were investigated by ChIP-seq in lactating mammary tissue of WT, ΔE1, ΔE2-S and ΔE2-S/N mice. The highlighted orange shades indicate enhancers.

Expression of Csn1s1, positioned at the 5’ border of the Casein locus, is only marginally influenced by the distant SE, suggesting the existence of independent regulatory elements. H3K27ac marks located candidate regulatory elements at −11.5 kb (E1) and −3.5 kb (E2) (Fig. 5a). Strong STAT5, GR and NFIB occupancy was detected at site E2 but less so at E1, which also had reduced H3K27ac coverage. STAT5 binding was also observed at the Csn1s1 promoter and coincided with a GAS motif at −100 bp. While deletion of the GR motif in E1 (ΔE1) resulted in the loss of GR and STAT5 binding at this site and reduced STAT5 binding at the promoter site, Csn1s1 expression at lactation was unimpaired (Fig. 5b,c). In contrast, deletion of the NFIB sites in E2 (ΔE2) resulted in a 65% reduction of Csn1s1 expression and coincided with the loss of TF binding and H3K27ac at both enhancers. These findings demonstrate that the Csn1s1 enhancers have a very limited biological activity compared to the Csn3 enhancer described in this study. We propose that the promoter with the STAT5 site might be the principal regulator activating Csn1s1 expression during pregnancy.

Figure 5. Redundant and non-redundant functions of the super-enhancer and putative Csn1s1 enhancers in Csn1s1 expression.

a, The putative Csn1s1 enhancers were identified by ChIP-seq for TFs and activating histone marks at L1. Diagram shows the deletions introduced in the mouse genome using CRISPR/Cas9 genome editing. b, Csn1s1 mRNA levels in lactating mammary tissues from WT and mutant mice were measured by qRT–PCR and normalized to Gapdh levels. Results are shown as the means ± s.e.m. of independent biological replicates (n = 5). One-way ANOVA followed by Dunnett’s multiple comparisons test was used to evaluate the statistical significance of differences between WT and each mutant mouse line. c, The Csn1s1 locus was profiled using ChIP-seq in WT and mutant tissue.

Despite a wealth of studies5,10–12, key questions pertaining to the contribution of enhancers and SEs to gene regulation remain to be answered. Specifically, an understanding of the regulation of complex multi-gene loci harboring genes expressed in one or more distinct cell types is lacking. Our study provides insight into regulatory mechanisms operative in salivary and mammary glands, tissues that share morphological and molecular features during embryogenesis13–15. Specifically, we identified a SE exclusively active in mammary and salivary tissue.

The shared locus with its eight genes linked to lactation, saliva and immune response1 is an evolutionary playground that fostered regulatory innovation and yielded 20 enhancer and promoter elements. Odam and its associated SE likely constitute the ancestral unit of the shared locus, and regulatory activity expanded from salivary tissue to mammary tissue. However, as this locus expanded, the five newly formed Casein genes acquired their own regulatory elements and three gained independence from the SE. Although Csn1s2b9 and Csn3 acquired distal enhancers essential for their expression in lactating mammary tissue, they still retained their dependence on the SE. The enhancers linked to the five Casein genes display equivalent structures and TF occupancies, suggesting the presence of additional elements that facilitate SE independence of three Casein genes. ChIP-seq analyses identified the presence of cytokine-response elements within the promoter regions of these three Casein genes, and mutational analyses determined these as critical elements in at least the Csn2 gene (Lee et al., accompanying manuscript). Unlike the shared mammary-salivary locus, expression of the five globin genes in the α-globin locus is dependent on one SE composed of five individual erythroid enhancers12,16. Moreover, the two SEs exhibit mechanistic differences, with enhancer redundancy in the mammary-salivary locus and additive activity in the α-globin locus.

While genetic studies have identified key transcription factors controlling mammary-specific gene expression through enhancers and SEs, there is limited knowledge about regulatory mechanisms operative in salivary tissue. Genome-wide histone modification studies in mouse submandibular glands have pointed to putative regulatory regions17, but no defined salivary-specific enhancers have been described. Key TFs controlling mammary function, such as STAT56,18 and ELF519, are dispensable in the salivary gland20. At this point the molecular backbone of the salivary enhancer, as defined by H3K27ac marks, extends over 10 kbp, but no associated salivary-specific TFs have been identified. Also, SE-induced gene activation in salivary tissue is lower than in mammary tissue.

The Fdcsp gene is silent in mammary tissue and uniquely activated in salivary glands by the SE suggesting a cell-preferential response of its promoter. This specificity is not influenced by distance of the SE or the presence of the intervening Odam gene, suggesting the presence of unique promoter elements that permit enhancer sensing. Alternatively, differential promoter accessibility21 in salivary and mammary tissue could account for the cell specificity.

Here we report on regulatory innovation in an evolutionary playground with genes acquiring mammary and salivary specificity (Fig. 6). The elaborate enhancer structure developed in the mammary lineage permits an exceptional expression of five genes accounting for 80% of milk proteins, an essential requirement for the sustained success of mammals. We propose that the concentration of enhancers and their high-density occupation with TFs and co-activators provide an optimal regulatory environment. The co-existence and interdependence of a SE and gene-specific enhancers provide opportunities for the Casein locus to rapidly develop and produce milks with vastly different properties.

Figure 6. Proposed model outlining regulation of the shared mammary-salivary locus by a dual-specific super-enhancer and gene-specific enhancers.

The super-enhancer preferentially activates the Csn1s2b, Csn3 and Odam genes and marginally the Csn1s1, Csn2 and Csn1s2a genes during pregnancy in the mammary gland, and regulates the promoters of the Odam, Fdcsp and Csn3 genes in the salivary gland.

Materials And Methods

Mice

All animals were housed and handled according to the Guide for the Care and Use of Laboratory Animals (8th edition) and all animal experiments were approved by the Animal Care and Use Committee (ACUC) of National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK, MD) and performed under the NIDDK animal protocol K089-LGP-17. CRISPR/Cas9 targeted mice were generated using C57BL/6N mice (Charles River) by the transgenic core of the National Heart, Lung, and Blood Institute (NHLBI). Single-guide RNAs (sgRNA) were obtained from either OriGene (Rockville, MD) or Thermo Fisher Scientific (Supplementary Table 7). Target-specific sgRNAs and in vitro transcribed Cas9 mRNA were co-microinjected into the cytoplasm of fertilized eggs for founder mouse production. The ΔE1/2 mutant mouse was generated by injecting a sgRNA for E2 into zygotes collected from ΔE1 mutant mice. All mice were genotyped by PCR amplification and Sanger sequencing (Macrogen and Quintara Biosciences) with genomic DNA from mouse tails (Supplementary Table 8).

Chromatin immunoprecipitation sequencing (ChIP-seq) and data analysis

Mammary tissues from specific stages during pregnancy and lactation were harvested and stored at −80°C. The frozen tissues were ground into powder in liquid nitrogen. Chromatin was fixed with formaldehyde (1% final concentration) for 15 min at room temperature, and then quenched with glycine (0.125 M final concentration). Samples were processed as previously described22. The following antibodies were used for ChIP-seq: STAT5A (Santa Cruz Biotechnology, sc-1081 and sc-271542), GR (Thermo Fisher Scientific, PA1–511A), MED1 (Bethyl Laboratory, A300–793A), H3K27ac (Abcam, ab4729), RNA polymerase II (Abcam, ab5408), H3K4me1 (Active Motif, 39297) and H3K4me3 (Millipore, 07–473). Libraries for next-generation sequencing were prepared and sequenced with a HiSeq 2500 or 3000 instrument (Illumina). Quality filtering and alignment of the raw reads were done using Trimmomatic23 (version 0.36) and Bowtie24 (version 1.1.2), with the parameter ‘-m 1’ to keep only uniquely mapped reads, using the reference genome mm10. Picard tools (Broad Institute. Picard, http://broadinstitute.github.io/picard/. 2016) was used to remove duplicates and subsequently, Homer25 (version 4.8.2) and deepTools26 (version 3.1.3) software were applied separately to generate bedGraph files. Integrative Genomics Viewer27 (version 2.3.81) was used for visualization. Coverage plots were generated using Homer25 software with the bedGraph from deepTools as input. R and the packages dplyr (https://CRAN.R-project.org/package=dplyr) and ggplot228 were used for visualization. Each ChIP-seq experiment was conducted in two replicates. Sequence read numbers were calculated using Samtools29 software with sorted bam files. The correlation between the ChIP-seq replicates was computed with deepTools using Spearman correlation.
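To illustrate the replicate comparison described above, the following is a minimal sketch of a Spearman rank correlation between two coverage vectors. It is not the deepTools implementation; the function names and the idea of passing per-bin read counts as plain lists are assumptions made for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5
```

Because only ranks enter the calculation, two replicates whose per-bin coverage rises and falls together score close to 1 even when their sequencing depths differ.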

Total RNA sequencing (Total RNA-seq) and data analysis

Total RNA was extracted from frozen mammary tissue from wild-type mice at day six of pregnancy and purified with RNeasy Plus Mini Kit (Qiagen, 74134). Ribosomal RNA was removed from 1 μg of total RNAs and cDNA was synthesized using SuperScript III (Invitrogen). Libraries for sequencing were prepared according to the manufacturer’s instructions with TruSeq Stranded Total RNA Library Prep Kit with Ribo-Zero Gold (Illumina, RS-122–2301) and paired-end sequencing was done with a HiSeq 2500 instrument (Illumina).

Total RNA-seq read quality control was done using Trimmomatic23 (version 0.36), and STAR RNA-seq30 (version STAR 2.5.3a) in paired-end mode was used to align the reads (mm10). HTSeq31 was used to retrieve the raw counts and subsequently, R (https://www.R-project.org/), Bioconductor32 and DESeq228 were used. Additionally, the RUVSeq33 package was applied to remove confounding factors. The data were pre-filtered, keeping only those genes that have at least ten reads in total. Genes were categorized as significantly differentially expressed with an adjusted p-value below 0.05 and a fold change > 2 for up-regulated genes and a fold change of < −2 for down-regulated ones. The visualization was done using dplyr (https://CRAN.R-project.org/package=dplyr) and ggplot234.
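The pre-filter and the significance thresholds stated above can be sketched as follows. This is an illustration in Python rather than the R/DESeq2 workflow that was used; the function names are hypothetical, and it assumes the signed fold-change cutoffs of 2 and −2 correspond to log2 fold changes above 1 and below −1.

```python
import math


def prefilter(counts, min_total=10):
    """Keep genes whose raw counts summed over all samples reach min_total.

    counts maps a gene name to a list of per-sample raw read counts.
    """
    return {gene: c for gene, c in counts.items() if sum(c) >= min_total}


def classify(log2_fold_change, padj, padj_cutoff=0.05, fold_cutoff=2.0):
    """Label a gene 'up', 'down' or 'ns' (not significant).

    Assumption: a signed fold change > 2 (or < -2) is equivalent to a
    log2 fold change > 1 (or < -1).
    """
    if padj is None or math.isnan(padj) or padj >= padj_cutoff:
        return "ns"
    threshold = math.log2(fold_cutoff)
    if log2_fold_change > threshold:
        return "up"
    if log2_fold_change < -threshold:
        return "down"
    return "ns"
```

A gene with a missing adjusted p-value, as can occur when a gene is excluded from multiple-testing correction, is treated as not significant in this sketch.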

RNA isolation and quantitative real-time PCR (qRT–PCR)

Total RNA was extracted from frozen mammary tissue of wild type and mutant mice using a homogenizer and the PureLink RNA Mini kit according to the manufacturer’s instructions (Thermo Fisher Scientific). Total RNA (1 μg) was reverse transcribed for 50 min at 50°C using 50 μM oligo dT and 2 μl of SuperScript III (Thermo Fisher Scientific) in a 20 μl reaction. Quantitative real-time PCR (qRT-PCR) was performed using TaqMan probes (Csn1s1, Mm01160593_m1; Csn2, Mm04207885_m1; Csn1s2a, Mm00839343_m1; Csn1s2b, Mm00839674_m1; Odam, Mm02581573_m1; Csn3, Mm02581554_m1; mouse Gapdh, Mm99999915_g1, Thermo Fisher Scientific) on the CFX384 Real-Time PCR Detection System (Bio-Rad) according to the manufacturer’s instructions. PCR conditions were 95°C for 30s, 95°C for 15s, and 60°C for 30s for 40 cycles. All reactions were done in triplicate and normalized to the housekeeping gene Gapdh. Relative differences in PCR results were calculated using the comparative cycle threshold (CT) method and normalized to Gapdh levels.
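As a worked illustration of the comparative CT calculation, the sketch below computes expression of a target gene relative to a control sample after normalization to a reference gene such as Gapdh. The function names and the CT values in the usage example are hypothetical, and the formula assumes that each PCR cycle doubles the product.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean CT of the target gene minus mean CT of the reference gene."""
    return mean(target_cts) - mean(reference_cts)


def relative_expression(sample_target, sample_reference,
                        control_target, control_reference):
    """Fold expression in the sample relative to the control (2^-ddCT).

    Each argument is a list of technical-replicate CT values.
    Assumption: amplification efficiency is close to 100%.
    """
    ddct = (delta_ct(sample_target, sample_reference)
            - delta_ct(control_target, control_reference))
    return 2.0 ** (-ddct)


# Hypothetical triplicate CT values, for illustration only.
mutant = relative_expression([20.0, 20.0, 20.0], [18.0, 18.0, 18.0],
                             [15.0, 15.0, 15.0], [18.0, 18.0, 18.0])
print(mutant)  # 0.03125, i.e. about 3% of the control level
```

In this example the target gene crosses the threshold five cycles later in the mutant than in the control at equal reference-gene CT, which corresponds to a 32-fold lower mRNA level.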

Identification of regulatory elements

The MACS235 peak finding algorithm was used to identify regions of ChIP-seq enrichment over the background in order to define regulatory elements at L1 and L10. Peak calling was done on both STAT5A replicates and broad peak calling on H3K27ac. Only peaks that were identified in both replicates and had H3K27ac coverage underneath were used.
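The filtering rule described above can be sketched as an interval-overlap test. This is a simplified illustration, not the pipeline that was run: the function names are hypothetical, intervals are assumed to be half-open (chromosome, start, end) tuples, and reporting the coordinates of the first replicate is an assumption, since how overlapping peaks were merged is not stated.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def reproducible_marked_peaks(rep1, rep2, h3k27ac):
    """Keep replicate-1 peaks found in replicate 2 and covered by H3K27ac."""
    kept = []
    for peak in rep1:
        in_rep2 = any(overlaps(peak, other) for other in rep2)
        marked = any(overlaps(peak, region) for region in h3k27ac)
        if in_rep2 and marked:
            kept.append(peak)
    return kept
```

The nested scans are quadratic in the number of peaks, which is adequate for a single locus; genome-wide use would call for sorted intervals or a dedicated tool such as bedtools.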

Identification of complex mammary loci

Mammary-specific genes were identified using RNA-seq data from pregnancy day six (p6), lactation day one (L1) and ten (L10). Genes were considered if they were induced more than two-fold with an adjusted p-value below 0.05 between p6 and L1 or p6 and L10. The next step comprised the stitching of neighboring genes, considering only protein-coding genes. The stitched loci were subsequently compared to the contact domains (Hi-C data), and only those loci that did not overlap with any border of the contact domains passed the validation. Loci that overlapped a border were treated in the following way: (i) if the locus contained only two genes it was discarded, as it would not meet the prerequisite that a complex locus comprises at least three genes; (ii) all other loci were split up at the border and only those that comprised more than two genes were kept; (iii) the loci were shrunk to the size of the remaining genes. As possible regulatory elements (in our analysis STAT5A) are also part of complex loci, we expanded the borders of each locus to comprise STAT5A binding sites if they were located within the adjacent intergenic region. Those new loci were finally checked again to ensure that they did not overlap with contact domain boundaries; otherwise they were shrunk down to the last element not overlapping with the contact domain; (iv) the final list of complex loci comprises loci with at least three genes.

The analysis was done using bedtools36, bedops, R (https://www.R-project.org/) and Bioconductor32 as well as the R packages dplyr (https://CRAN.R-project.org/package=dplyr) and ggplot237.
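The core of the locus definition can be sketched as grouping consecutive induced genes that fall within the same contact domain. This is a simplified illustration under stated assumptions, not the bedtools/R workflow that was used: the function name and the input layout are hypothetical, each gene is assumed to be assigned to exactly one contact domain, "neighboring" is taken to mean no intervening non-induced protein-coding gene, and the extension of locus borders to adjacent STAT5A sites is omitted.

```python
from itertools import groupby


def complex_loci(genes, min_genes=3):
    """Group consecutive induced genes within one contact domain.

    genes: protein-coding genes of one chromosome ordered by position,
    each a dict with keys 'name', 'induced' (bool) and 'domain' (an
    identifier of the Hi-C contact domain containing the gene).
    Returns a list of loci, each a list of gene names.
    """
    loci = []
    # A run ends when induction status or contact domain changes, so a
    # stitched locus is split wherever it would cross a domain border.
    for (induced, _domain), run in groupby(
            genes, key=lambda g: (g["induced"], g["domain"])):
        names = [g["name"] for g in run]
        if induced and len(names) >= min_genes:
            loci.append(names)
    return loci
```

Runs of fewer than three induced genes are dropped, which mirrors the requirement that the final list contains only loci with at least three genes.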

Statistical analyses

All samples that were used for qRT–PCR and RNA-seq were randomly selected, and blinding was not applied. For comparison of samples, data were presented with the standard deviation of each group and were evaluated with a t-test and 2-way ANOVA multiple comparisons using GraphPad Prism. Statistical significance was obtained by comparing the measures from the wild-type or control group with those of each mutant group. A value of *P < 0.05, **P < 0.001, ***P < 0.0001, ****P < 0.00001 was considered statistically significant. ns, not significant.

Acknowledgments

We thank Ilhan Akan, Sijung Yun and Harold Smith from the NIDDK genomics core for NGS. This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov). This work was supported by the Intramural Research Programs (IRPs) of National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) and National Heart, Lung, and Blood Institute (NHLBI).

Footnotes

Competing interests

The authors have no competing interests.

Contributor Information

Lothar Hennighausen, National Institute of Diabetes and Digestive and Kidney Diseases.

Hye Kyung Lee, National Institute of Diabetes and Digestive and Kidney Diseases.

Michaela Willi, Laboratory of Genetics and Physiology, NIDDK, NIH, Bethesda.

Chengyu Liu, National Institutes of Health.

Data availability

All data were obtained or uploaded to Gene Expression Omnibus (GEO). ChIP-seq data of wild-type tissue at L1 and L10 were obtained under GSE74826, GSE119657 and GSE115370. RNA-seq data for WT at L1 as well as L10 were downloaded from GSE115370. The RNA-seq data for WT and mutant mice at p6 and p18, Hi-C and 4C-seq data for WT and mutant mice at L1 and ChIP-seq data for WT and mutant mice were uploaded to GSE127144 (ChIP-seq in GSE127139, RNA-seq in GSE127140). Reviewer link will be shared upon request.

References

1. Braasch I. et al. The spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons. Nat Genet 48, 427–37 (2016).
2. Kawasaki K., Buchanan A.V. & Weiss K.M. Biomineralization in humans: making the hard choices in life. Annu Rev Genet 43, 119–42 (2009).
3. Kawasaki K. The SCPP gene family and the complexity of hard tissues in vertebrates. Cells Tissues Organs 194, 108–12 (2011).
4. Rijnkels M., Elnitski L., Miller W. & Rosen J.M. Multispecies comparative analysis of a mammalian-specific genomic domain encoding secretory proteins. Genomics 82, 417–32 (2003).
5. Shin H.Y. et al. Hierarchy within the mammary STAT5-driven Wap super-enhancer. Nat Genet 48, 904–911 (2016).
6. Liu X. et al. Stat5a is mandatory for adult mammary gland development and lactogenesis. Genes Dev 11, 179–86 (1997).
7. Rada-Iglesias A. Is H3K4me1 at enhancers correlative or causative? Nat Genet 50, 4–5 (2018).
8. Local A. et al. Identification of H3K4me1-associated proteins at mammalian enhancers. Nat Genet 50, 73–82 (2018).
9. Lee H.K., Willi M., Kuhns T., Liu C. & Hennighausen L. Redundant and non-redundant cytokine-activated enhancers control Csn1s2b expression in the lactating mouse mammary gland. Nat Commun 12, 2239 (2021).
10. Oudelaar A.M. & Higgs D.R. The relationship between genome structure and function. Nat Rev Genet 22, 154–168 (2021).
11. Furlong E.E.M. & Levine M. Developmental enhancers and chromosome topology. Science 361, 1341–1345 (2018).
12. Hay D. et al. Genetic dissection of the alpha-globin super-enhancer in vivo. Nat Genet 48, 895–903 (2016).
13. Santosh A.B. & Jones T.J. The epithelial-mesenchymal interactions: insights into physiological and pathological aspects of oral tissues. Oncol Rev 8, 239 (2014).
14. Macias H. & Hinck L. Mammary gland development. Wiley Interdiscip Rev Dev Biol 1, 533–57 (2012).
15. Jimenez-Rojo L., Granchi Z., Graf D. & Mitsiadis T.A. Stem Cell Fate Determination during Development and Regeneration of Ectodermal Organs. Front Physiol 3, 107 (2012).
16. Oudelaar A.M., Beagrie R.A., Kassouf M.T. & Higgs D.R. The mouse alpha-globin cluster: a paradigm for studying genome regulation and organization. Curr Opin Genet Dev 67, 18–24 (2021).
17. Gluck C. et al. A Global Vista of the Epigenomic State of the Mouse Submandibular Gland. J Dent Res 100, 1492–1500 (2021).
18. Cui Y. et al. Inactivation of Stat5 in mouse mammary epithelium during pregnancy reveals distinct functions in cell proliferation, survival, and differentiation. Mol Cell Biol 24, 8037–47 (2004).
19. Zhou J. et al. Elf5 is essential for early embryogenesis and mammary gland development during pregnancy and lactation. EMBO J 24, 635–44 (2005).
20. Song E.A.C. et al. Genetic Study of Elf5 and Ehf in the Mouse Salivary Gland. J Dent Res, 220345221130258 (2022).
21. Galouzis C.C. & Furlong E.E.M. Regulating specificity in enhancer-promoter communication. Curr Opin Cell Biol 75, 102065 (2022).
22. Metser G. et al. An autoregulatory enhancer controls mammary-specific STAT5 functions. Nucleic Acids Res 44, 1052–63 (2016).
  • 23.Bolger A.M., Lohse M. & Usadel B. Trimmomatic: a exible trimmer for Illumina sequence data. Bioinformatics 30, 2114–20 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Langmead B., Trapnell C., Pop M. & Salzberg S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10, R25 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Heinz S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38, 576–89 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Ramirez F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res 44, W160–5 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Thorvaldsdottir H., Robinson J.T. & Mesirov J.P. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform 14, 178–92 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Love M.I., Huber W. & Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Masella A.P. et al. BAMQL: a query language for extracting reads from BAM files. BMC Bioinformatics 17, 305 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Dobin A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Anders S., Pyl P.T. & Huber W. HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–9 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Huber W. et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods 12, 115–21 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Risso D., Ngai J., Speed T.P. & Dudoit S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nat Biotechnol 32, 896–902 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Wickham H. Ggplot2 : elegant graphics for data analysis, viii, 212 p. (Springer, New York, 2009). [Google Scholar]
  • 35.Zhang Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol 9, R137 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Quinlan A.R. & Hall I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–2 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Wickham H. Ggplot2 : Elegant Graphics for Data Analysis., (Springer, 2009). [Google Scholar]

Associated Data


Data Availability Statement

All data were obtained from or uploaded to the Gene Expression Omnibus (GEO). ChIP-seq data from wild-type (WT) tissue at L1 and L10 were obtained under accessions GSE74826, GSE119657 and GSE115370. RNA-seq data for WT at L1 and L10 were downloaded from GSE115370. RNA-seq data for WT and mutant mice at p6 and p18, Hi-C and 4C-seq data for WT and mutant mice at L1, and ChIP-seq data for WT and mutant mice were uploaded to GSE127144 (ChIP-seq in GSE127139, RNA-seq in GSE127140). A reviewer link will be shared upon request.


Articles from Research Square are provided here courtesy of American Journal Experts
