Abstract
The kidney medulla is a specialized region with important homeostatic functions. It has been implicated in genetic and developmental disorders along with ischemic and drug-induced injuries. Despite its role in kidney function and disease, the medulla’s baseline gene expression and epigenomic signatures have not been well described in the adult human kidney. Here we generated and analyzed gene expression (RNA-seq), chromatin accessibility (ATAC-seq), chromatin conformation (Hi-C) and spatial transcriptomic data from the adult human kidney cortex and medulla. Tissue samples were obtained from macroscopically dissected cortex and medulla of tumor-adjacent normal material in nephrectomy specimens from five male patients. We used these carefully annotated specimens to reassign incorrectly labeled samples in the larger public Genotype-Tissue Expression (GTEx) Project, and to extract meaningful medullary gene expression signatures. Integrated analysis of gene expression, chromatin accessibility and conformation profiles yielded insights into medulla development and function, which we validated by spatial transcriptomics and immunohistochemistry. Our datasets provide a valuable resource for functional annotation of variants from genome-wide association studies and are freely accessible through an epigenome browser portal.
Keywords: Kidney medulla, Gene expression, Transcriptional regulation, Kidney disease, Epigenetics, Kidney development, Spatial transcriptomics
Graphical Abstract
Translational Statement
The renal medulla is a specialized region of the kidney with important homeostatic functions, which has been implicated in genetic and developmental disorders, as well as ischemic and drug-induced injuries. Our study identifies medulla-specific genes and transcription factors, shedding light on regulatory programs in this region, which so far has not been well characterized in the adult human kidney. We present the first reference-quality functional genomic maps of the kidney medulla encompassing gene expression (RNA-seq), chromatin accessibility (ATAC-seq) and chromatin conformation (Hi-C). These maps will enable in-depth, genome-wide, multi-omic analyses of any gene locus relevant to kidney biology and disease.
Introduction
The kidney medulla has important homeostatic functions as it contains the portions of the nephron that fine tune salt and water balance in the circulation. It is also a target of genetic kidney diseases such as autosomal-dominant tubulointerstitial kidney disease (ADTKD-UMOD) and certain congenital abnormalities of the renal and urinary tract. The kidney medulla has a specialized hypoxic and high salt milieu1,2 and represents a specialized immune environment that resists infection3. The medulla is particularly sensitive to ischemic injury and certain nephrotoxins. Despite these important roles, the expression patterns and regulation of genes important to human kidney medulla function are relatively understudied. The large Genotype-Tissue Expression (GTEx) project4, which is widely used for the study of tissue-specific gene expression, contains 85 samples of kidney cortex, but only 4 samples of kidney medulla. Another widely used resource, the Human Protein Atlas5, does not contain any expression data for human kidney medulla. Furthermore, epigenomic data such as chromatin accessibility and conformation are not available for the kidney medulla. Generation of such data sets from primary tissue samples is important since cultured cells alter their epigenomic features and can show an injured phenotype 6.
Together, these deficits have hampered our understanding of the medulla’s role in numerous kidney diseases, including developmental defects but also acute kidney injury. To address this knowledge gap, here we generate human kidney cortex- and medulla-specific gene expression (RNA-seq), chromatin accessibility (ATAC-seq) and genome organization (Hi-C) datasets. Through integrative and comparative analyses between the cortex and medulla, we provide new insights into medulla-specific gene expression and regulation. We demonstrate an approach of using a small set of carefully curated human tissue samples to detect and correct for tissue-misassignment in publicly available gene expression data, which has implications for the use of GTEx tissue samples beyond the kidney. We also validate our findings using spatial transcriptomics and immunohistochemistry. The datasets resulting from our study are freely available through an epigenome browser portal and represent a resource for experimental researchers working on medullary genes as well as the GWAS community for functional annotation of variants.
Methods
Tissue dissection and RNA sequencing
Kidney samples were obtained from macroscopically dissected cortex and medulla of tumor-adjacent normal tissue in nephrectomy specimens from three donors as described previously7. Tissues from an additional two donors were processed separately for Hi-C (see below). All samples were procured in deidentified fashion and with informed consent. The study was approved by the University of Washington’s Institutional Review Board. RNA extraction and RNA-seq were performed by GeneWiz. Trimming and alignment of paired-end fastq files to the human reference genome sequence hg38 were done with STAR 2.7.5b8 with parameters --outFilterIntronMotifs RemoveNoncanonical --outFilterMismatchNoverReadLmax 0.04. The number of reads aligned to each exon (feature) was counted using featureCounts9. Donor characteristics, clinical parameters and sequencing information are available in Supplementary Table 1.
Cellular deconvolution
In order to estimate the cell-type compositions in the cortex and medulla samples, we applied the tool BisqueRNA10 to deconvolute our bulk RNA-seq data using reference snRNA-seq data from the Kidney Precision Medicine Project (KPMP). The snRNA-seq data were retrieved from the KPMP Kidney Tissue Atlas at https://atlas.kpmp.org/explorer/dataviz11 and processed using the Seurat package (v4.1.0)12. Genes were clustered based on average expression (Euclidean distance). Within the R package BisqueRNA (version 1.0.5), we used the function ReferenceBasedDecomposition with default parameters for deconvolution.
Differential gene expression analysis of own samples
Differentially expressed genes were computed with DESeq213 with default parameters. Only transcripts with a count of at least 5 in at least two samples were selected for the analysis. Shrunken log2-fold changes were generated with the lfcShrink function (“apeglm” method) to make log2-fold changes comparable across a wide range of counts. All genes with an absolute log2-fold change > 1 and an adjusted p-value < 0.01 were considered differentially expressed. Absolute count differences for each transcript were derived from the respective base mean and log2-fold changes. GO term over-representation analyses were performed using the enrichGO function from clusterProfiler (3.18.1)14 with all transcripts included in the DESeq2 analysis as background. GO term enrichment analyses for the GTEx data were carried out identically.
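The filtering and calling rules above can be written out as a short Python sketch. It is an illustration only and not the code used in the study, which ran DESeq2 in R. The function names and toy values are hypothetical, and reading the log2-fold change cutoff as an absolute value (with fold changes expressed as medulla/cortex, as in the figure legends) is an assumption.

```python
def passes_prefilter(counts, min_count=5, min_samples=2):
    """Keep a transcript if at least `min_samples` samples have >= `min_count` reads."""
    return sum(c >= min_count for c in counts) >= min_samples


def classify_deg(log2fc, padj, lfc_cutoff=1.0, padj_cutoff=0.01):
    """Return 'medulla', 'cortex' or None for one gene.

    Assumes fold changes are medulla/cortex, so a positive log2fc means
    higher expression in medulla. `padj` may be None when no adjusted
    p-value is available for the gene.
    """
    if padj is None or padj >= padj_cutoff or abs(log2fc) <= lfc_cutoff:
        return None
    return "medulla" if log2fc > 0 else "cortex"


# Toy example: raw counts for one transcript across six samples.
print(passes_prefilter([0, 5, 7, 0, 1, 2]))   # two samples reach the minimum count
print(classify_deg(2.3, 1e-5))                # strong, significant medullary gene
```

Keeping the prefilter and the significance call as separate functions mirrors the order of the steps in the text: low-count transcripts are removed before testing, and thresholds are applied afterwards to shrunken fold changes.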
Reassignment and differential gene expression analysis of GTEx V8 samples
The gene read counts data (GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_reads.gct.gz) and sample attributes (GTEx_Analysis_v8_Annotations_SampleAttributesDD.xlsx, including pathology notes) were retrieved from https://www.gtexportal.org/home/datasets, and the RNA-seq data from kidney samples was extracted by filtering for SMTS == “Kidney” & SMAFRZE == “RNASEQ”. Transcripts with a count of at least 5 in at least 2 samples were selected, then the counts were variance-stabilized and normalized for library size using the DESeq2 vst function. The principal component analysis (PCA) was computed using the DESeq2 plotPCA function with default parameters. For the combined PCA with our own samples, we first selected transcripts with a raw count of at least 5 in at least 2 of our own samples and then performed an inner join with the GTEx data based on Ensembl IDs in order to select transcripts which were detected in both datasets. The combined count matrix was variance-stabilized and normalized for library size using the DESeq2 vst function. The combined PCA was computed with the DESeq2 plotPCA function using the top 1000 most variable features. GTEx samples were reassigned based upon proximity to our own medulla and cortex reference samples in the combined PCA. Specifically, all GTEx samples that 1) mapped to the left of our own right-most sample scored as 100% medulla by a trained pathologist on PC1, and 2) were located on the same side of 0 as all of our own medullary samples on PC2 were assigned as “medulla”. The remaining GTEx samples were assigned “cortex”. The differential gene expression analysis with GTEx samples before and after reassignment was carried out identically to that done with our samples.
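The two-part reassignment rule can be expressed compactly once PCA scores are available. The following Python sketch assumes the scores have already been computed (for example with plotPCA) and are passed in as plain tuples. The function name, sample identifiers and coordinates are hypothetical, and "to the left of" is interpreted as a smaller PC1 value, which depends on the arbitrary sign of the principal component.

```python
def reassign_gtex(gtex_scores, reference_medulla_scores):
    """Label GTEx samples as 'medulla' or 'cortex' from combined-PCA scores.

    gtex_scores: {sample_id: (pc1, pc2)}
    reference_medulla_scores: list of (pc1, pc2) for reference samples that
        were scored as 100% medulla.
    """
    # Rule 1: PC1 must lie left of the right-most reference medulla sample.
    pc1_limit = max(pc1 for pc1, _ in reference_medulla_scores)

    # Rule 2: PC2 must fall on the same side of 0 as all reference medulla samples.
    sides = {pc2 > 0 for _, pc2 in reference_medulla_scores}
    if len(sides) != 1:
        raise ValueError("reference medulla samples straddle 0 on PC2")
    medulla_side = sides.pop()

    return {
        sample: "medulla"
        if pc1 < pc1_limit and (pc2 > 0) == medulla_side
        else "cortex"
        for sample, (pc1, pc2) in gtex_scores.items()
    }


# Toy coordinates, not values from the study.
reference = [(-30.0, 4.0), (-25.0, 6.0), (-28.0, 2.0)]
gtex = {"S1": (-40.0, 5.0), "S2": (10.0, 3.0), "S3": (-35.0, -1.0)}
print(reassign_gtex(gtex, reference))
```

Because the rule only uses the reference samples to set the boundaries, the original GTEx labels never enter the decision, which is what makes the reassignment agnostic to them.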
ATAC sequencing and bioinformatics workflow
ATAC-seq was carried out on snap-frozen human kidney samples by ActiveMotif (Carlsbad, CA) and aligned to the hg38 reference genome (BWA default settings). Additional sequencing information for the individual samples can be found in Supplementary Table 1. Only properly aligned read pairs with a mapping quality >= 40 aligning to the same chromosome were selected with samtools 1.1315. Read duplicates were removed with picard 2.25.5, and reads located within the ENCODE blacklist regions were removed with deepTools alignmentSieve 3.5.116. Accessible regions (ARs, peaks) for each sample were called with MACS 2.1.017 at a cutoff q-value of 0.01 without control file and with the --nomodel, --shift -100 and --extsize 200 options. A master list of ARs across all samples was created using bedops -m (bedops 2.4.35)18. Read depth was then normalized by random down-sampling to the sample with the lowest coverage. The counts for all master list ARs for all samples were determined with bedops bedmap --count. Differentially accessible regions (DARs) and log2-fold changes (l2fc) in counts for accessible regions between cortex and medulla were identified with DESeq2 (default settings). All regions with an adjusted p-value < 0.05 and an l2fc > 0 were considered to be differentially accessible. PCA plots with all samples were generated from the rlog-transformed count data for each AR in the master list with the DESeq2 plotPCA function. ARs and DARs were assigned to genes and the respective genomic features based on proximity with the ChIPseeker 1.28.3 annotatePeak function19. Promoters were defined as the region −5000 to 1000 bp from the transcription start site (TSS); downstream was defined as the region less than 5000 bp downstream of the gene body. The genomicAnnotationPriority parameter was set to the following order: “Promoter”, “Exon”, “Intron”, “5UTR”, “3UTR”, “Downstream”, “Intergenic”.
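The read-depth normalization step, random down-sampling of every sample to the depth of the shallowest one, can be illustrated as follows. This is a minimal sketch that assumes the fragments of each sample fit in memory as Python lists; the function name and the fixed seed are assumptions and do not describe the tool actually used.

```python
import random


def downsample_to_minimum(fragments_by_sample, seed=0):
    """Randomly subsample every sample to the size of the smallest sample.

    fragments_by_sample: {sample_name: list of fragments (any objects)}
    A fixed seed keeps the subsampling reproducible.
    """
    rng = random.Random(seed)
    target = min(len(frags) for frags in fragments_by_sample.values())
    return {
        name: rng.sample(frags, target)  # sampling without replacement
        for name, frags in fragments_by_sample.items()
    }


# Toy input: integers stand in for aligned read pairs.
toy = {
    "cortex_1": list(range(10)),
    "medulla_1": list(range(6)),
    "medulla_2": list(range(8)),
}
equalized = downsample_to_minimum(toy)
print({name: len(frags) for name, frags in equalized.items()})
```

Sampling without replacement guarantees that no read pair is counted twice, so per-region counts after down-sampling remain directly comparable across samples.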
The ChIPseeker based assignment of ARs and DARs to genes was used to test for enrichment of DARs within DEG promoters (Fisher’s exact test, fisher.test function from R 4.1.0 stats package)19. The Pearson correlation coefficients for log2-fold changes in AR counts and gene expression were computed with the cor.test function from the R stats package.
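To make the enrichment test concrete, the sketch below implements a two-sided Fisher’s exact test for a 2x2 table in pure Python. The study used the fisher.test function in R; this version is only an illustration of the underlying hypergeometric calculation, and the function name, the layout of the table and the toy counts are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Example layout (assumed): rows = DEG promoter / other promoter,
    columns = contains a DAR / contains no DAR.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum all tables that are at most as likely as the observed one
    # (a small relative tolerance guards against floating-point ties).
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


print(fisher_exact_two_sided(3, 1, 1, 3))
```

Exact integer binomial coefficients keep the sketch numerically simple, at the cost of speed for very large tables, where a log-space implementation would be preferable.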
Identification of DEGs with DAR enrichment
Genes with enrichment of DARs (FDR < 0.05) were identified with GREAT 4.0.420 (Species assembly: hg38; Association rule: Basal+extension: 5000 bp upstream, 1000 bp downstream, 0 bp max extension, curated regulatory domains included) using either the cortical or medullary DARs as test regions and the non-differential ARs as background. DEGs with DAR enrichment for cortex and medulla were then identified by overlapping the respective lists of DEGs and genes with DAR enrichment for cortex and medulla. The subset of DEGs with DAR enrichment in proximity of differential Hi-C loops was identified by extending the Hi-C loop anchors by 25 kb in both directions and overlapping them with the respective gene bodies (overlapsAny function, GenomicRanges version 1.44.021).
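The anchor extension and overlap step can be sketched in Python as shown below. The study used overlapsAny from GenomicRanges; this version is a simplified stand-in that assumes half-open (chromosome, start, end) coordinates, and the function names, gene labels and coordinates are hypothetical.

```python
def extend(interval, flank=25_000):
    """Extend a (chrom, start, end) interval by `flank` bp on both sides."""
    chrom, start, end = interval
    return (chrom, max(0, start - flank), end + flank)


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share >= 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def genes_near_anchors(gene_bodies, loop_anchors, flank=25_000):
    """Return names of genes whose body overlaps any extended loop anchor."""
    extended = [extend(anchor, flank) for anchor in loop_anchors]
    return sorted(
        name
        for name, body in gene_bodies.items()
        if any(overlaps(body, anchor) for anchor in extended)
    )


# Toy coordinates, not taken from the data.
genes = {
    "GENE_A": ("chr1", 100_000, 120_000),
    "GENE_B": ("chr1", 400_000, 450_000),
    "GENE_C": ("chr2", 100_000, 120_000),
}
anchors = [("chr1", 130_000, 135_000)]
print(genes_near_anchors(genes, anchors))
```

The pairwise loop is quadratic in the number of genes and anchors, which is adequate for a few thousand intervals; an interval tree or sorted sweep would be the natural replacement at larger scale.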
Regional plots for selected genes
Read-normalized density tracks (.bw) from the RNA-seq and ATAC-seq experiments, published snATAC-seq (.bw, see below), DAR regions (.bed), differential loops (.bedpe, interact), GWAS variants (.bed, see below) and Hi-C contact matrices (.hic, see below) were visualized in a stable University of California, Santa Cruz (UCSC) browser instance available through: http://faculty.washington.edu/shreeram/genomebrowser.html.
Single nucleus ATAC-seq (snATAC-seq) data
Previously published snATAC-seq data from tumor-adjacent normal tissue in nephrectomy specimens (5 donors) were kindly provided by the authors of the publication22. The provided filtered peak barcode matrices, metadata and fragment files were used to create a Seurat object with a chromatin assay using Seurat v4.2.1 and its companion package Signac v1.8.0. Cells were filtered and assigned to cell types identically to the original publication, yielding 27,034 cells assigned to 15 different cell types. Cells belonging to the proximal straight tubule (PST) and proximal convoluted tubule (PCT) were combined into a single cell type. bigWig coverage tracks were then exported for all cell types (20 bp binning, signal normalized by cells per cell type).
GWAS variants
GWAS variants for eGFR23 and UACR24 were filtered for p-values below 5e-8 and then exported in UCSC compatible .bed custom track format.
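A genome-wide significance filter followed by BED export can be sketched as below. The input record layout (keys chrom, pos, rsid and p, with 1-based positions), the function name and the track name are assumptions made for illustration; the summary statistics files used in the study may be organized differently.

```python
def gwas_to_bed(variants, p_threshold=5e-8, track_name="gwas_hits"):
    """Return a UCSC custom-track BED string of genome-wide significant variants.

    variants: iterable of dicts with keys 'chrom', 'pos' (1-based), 'rsid', 'p'.
    BED uses 0-based, half-open coordinates, so a single-base variant at
    1-based position `pos` becomes the interval [pos - 1, pos).
    """
    lines = [f'track name="{track_name}"']
    for v in variants:
        if v["p"] < p_threshold:
            lines.append(f'{v["chrom"]}\t{v["pos"] - 1}\t{v["pos"]}\t{v["rsid"]}')
    return "\n".join(lines) + "\n"


# Toy records with placeholder identifiers.
toy_variants = [
    {"chrom": "chr1", "pos": 1000, "rsid": "rs_example_1", "p": 1e-9},
    {"chrom": "chr2", "pos": 500, "rsid": "rs_example_2", "p": 1e-5},
]
print(gwas_to_bed(toy_variants), end="")
```

The coordinate shift is the step most easily overlooked: GWAS summary statistics typically report 1-based positions, whereas BED intervals start at 0.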
ENCODE ChIP-seq data
For epigenome browser visualizations, chromatin immunoprecipitation and sequencing (ChIP-seq) data for CTCF (ENCFF011TYF), H3K27Ac (ENCFF829DEJ), H3K4Me1 (ENCFF408FAL), H3K4Me3 (ENCFF493NDW) and H3K9Me3 (ENCFF477UKT) were downloaded as normalized density (.bw) files from the ENCODE project website (www.encodeproject.org).
Chromosome conformation (Hi-C) data generation
Hi-C was performed on macro-dissected kidney cortex and medulla from two donors (donor 4, male, 74 years and donor 5, male, 67 years – see Supplementary Table 1 for additional clinical information). Approximately 0.5 cm3 of tissue was minced with a razor blade, then fixed and nutated for 20 minutes in 10 ml of a 1:10 dilution of 10% neutral buffered formalin (Fisher). The formalin was then quenched by direct addition of 0.1 g of glycine (Sigma) and nutation for another 15 minutes. The tissue fragments were submitted to Phase Genomics (Seattle, WA) for Hi-C library preparation using their Human Hi-C Kit and sequencing on an Illumina HiSeq 4000. The sample from donor 4 was processed using restriction enzyme Sau3AI and the sample from donor 5 was processed using a mixture consisting of DpnII, DdeI, HinfI, and MseI per Phase Genomics’ updated protocol at the time of procurement. Following alignment, downsampling to equal read numbers and Knight-Ruiz (KR) normalization, contact matrices (.hic files) were generated and visualized using the Juicebox application (https://www.aidenlab.org/juicebox/). Hi-C sequencing parameters and additional sequencing information for the individual samples are included in Supplementary Table 1.
Hi-C loop calling and visualization, A/B compartment specification and ChIP-seq enrichment analysis
Intrachromosomal interactions (loops) in cortex and medulla were inferred using Mustache25 from the .hic files. We used a p-value threshold of 0.1 with a sparsity threshold of 0.7 for detecting intrachromosomal interactions (loops), and an FDR threshold of 0.2 to identify differential intrachromosomal interactions (differential loops) between cortex and medulla. Differential loops were visualized using KR-normalized contact matrices (.cool files) with pyGenomeTracks version 3.7.126. A/B compartments were inferred as the first principal component of the Pearson correlation matrices of each chromosome, generated from the Hi-C contact matrices, using the hicPCA tool from the HiCExplorer package27. The A/B compartments were plotted using the hicPlotMatrix function of the same package. The total numbers of DARs and DEGs in the genomic windows of loop anchors plus 25 kb flanking regions were also calculated. The results were normalized to the density of DARs and DEGs per Mb of genomic region. The densities of DARs and DEGs adjacent to differential loops were compared against the average density of DARs and DEGs in the total human transcriptome. Hi-C enrichment analysis used the ChIP-seq peaks for CTCF and chromatin modifications specified above. The proportions of loop anchors overlapping CTCF, H3K9Me3, and H3K27Ac were calculated by counting the number of loop anchors plus 25 kb flanking regions with at least one overlapping ChIP-seq peak of interest.
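The per-Mb normalization described above amounts to dividing a feature count by the total length of the windows in megabases. The sketch below illustrates this under the assumption that the windows do not overlap one another; the function names, the background value and the toy numbers are hypothetical.

```python
def density_per_mb(n_features, windows):
    """Features per megabase across a set of (chrom, start, end) windows.

    Assumes the windows are non-overlapping; overlapping windows would need
    to be merged first so that shared bases are not counted twice.
    """
    total_bp = sum(end - start for _, start, end in windows)
    if total_bp == 0:
        raise ValueError("windows have zero total length")
    return n_features / (total_bp / 1_000_000)


def fold_over_background(window_density, background_density):
    """Ratio of the density near loop anchors to a background density."""
    return window_density / background_density


# Toy example: 3 DARs in two 500 kb windows, background of 1.5 DARs per Mb.
windows = [("chr1", 0, 500_000), ("chr1", 1_000_000, 1_500_000)]
local = density_per_mb(3, windows)
print(local, fold_over_background(local, 1.5))
```

Expressing both the local and the background values per Mb makes the comparison independent of how many anchors were called in each tissue.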
Analysis of cis-coaccessibility networks (CCANs) and overlap with topologically associated domains (TADs)
CCANs were predicted with Cicero v1.3.928 using the snATAC-seq library and respective cell types described above (TAL and PCT-PST combined)22. Cell datasets (CDS) were created for each cell type with “make_cicero_cds” and Cicero connections were predicted with “run_cicero” using the default settings (500,000 bp window). CCANs were created with “generate_ccans” using only connections with a coaccessibility score above 0.2. Hierarchical topologically associated domains (TADs) were inferred using OnTAD29 with default parameters from KR-normalized Hi-C contact matrices of individual chromosomes. For all CCANs, the maximum percentage overlap with any TAD was computed. CCANs from TAL were overlapped with TADs from medulla (healthy donor 4 only) and CCANs from PCT-PST with TADs from cortex (donor 4). This was performed for the TAD hierarchy levels 1–3 separately.
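One way to compute the maximum percentage overlap of a CCAN with any TAD is sketched below. The text does not specify how a CCAN is reduced to a genomic interval, so this example assumes that each CCAN is represented by the span from its first to its last peak and that the overlap is expressed as a percentage of that span. The function name and coordinates are hypothetical.

```python
def max_percent_overlap(ccan, tads):
    """Largest percentage of a CCAN span that is covered by a single TAD.

    ccan: (chrom, start, end) span of the network (assumed representation).
    tads: iterable of (chrom, start, end) domains, half-open coordinates.
    """
    chrom, start, end = ccan
    length = end - start
    if length <= 0:
        raise ValueError("CCAN span must have positive length")
    best = 0
    for tad_chrom, tad_start, tad_end in tads:
        if tad_chrom != chrom:
            continue
        shared = min(end, tad_end) - max(start, tad_start)
        best = max(best, shared)  # negative values mean no overlap
    return 100.0 * best / length


# Toy coordinates: the second TAD covers 80 of the 100 bp span.
toy_tads = [("chr1", 0, 150), ("chr1", 120, 300), ("chr2", 0, 1000)]
print(max_percent_overlap(("chr1", 100, 200), toy_tads))
```

Running the same function separately on the TAD lists of each hierarchy level reproduces the per-level comparison described in the text.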
TF motif enrichment in DARs
Transcription factors (TFs) with motif enrichment in cortical and medullary DARs were identified with HOMER v4.11 findMotifsGenome (options: -size given -mask)30. Known TFs with a q-value below 0.001 were considered significant. For motif enrichment in cortical DARs, the medullary DARs were used as background and vice versa. Both sets of regions had nearly identical size distributions based on mean, median and histogram, hence the option “-size given” could be chosen for findMotifsGenome.
TF motif enrichment in accessible promoter regions
The TSS for all DEGs based on their canonical transcript was retrieved from Ensembl (genomic build GRCh38.p13) via biomaRt 2.48.331. Promoters were defined as the region −5000 bp to +1000 bp from the TSS20. The merged set of ARs identified in the three cortical and the three medullary samples was screened for any ARs overlapping the promoters of cortical or medullary DEGs. HOMER v4.11 findMotifsGenome (options: -size given -mask) was used to identify TFs with motif enrichment in ARs of medullary and cortical DEG promoters. For cortex, the ARs in medullary DEG promoters were used as background, and vice versa. TFs with a q-value below 0.001 were considered significant.
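The promoter definition depends on gene orientation, because "upstream" lies at higher coordinates for genes on the minus strand. A minimal Python sketch of a strand-aware promoter window is given below; the function name and the clipping at coordinate 0 are assumptions.

```python
def promoter_window(tss, strand, upstream=5000, downstream=1000):
    """Return (start, end) of the promoter around a TSS, clipped at 0.

    For '+' genes upstream means lower coordinates; for '-' genes it means
    higher coordinates, so the window is mirrored.
    """
    if strand == "+":
        return (max(0, tss - upstream), tss + downstream)
    if strand == "-":
        return (max(0, tss - downstream), tss + upstream)
    raise ValueError(f"unknown strand: {strand!r}")


print(promoter_window(10_000, "+"))
print(promoter_window(10_000, "-"))
```

ARs can then be tested for overlap with these windows to obtain the promoter-associated regions used as input and background for motif enrichment.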
TFs with differential motif accessibility
TFs with differential motif accessibility were identified with chromVAR 1.14.032. The analysis was based on the merged set of ARs identified in the three cortical and the three medullary samples. All ARs were resized to an equal width of 500 bp around the AR center. Read depth for all samples was randomly down-sampled to match the sample with the lowest coverage. The chromVAR getCounts function was used to obtain the fragment counts per accessible region. Transcription factor motifs were retrieved with the JASPAR2020 R-package33 and limited to the JASPAR CORE vertebrates motif set. ARs containing the respective TF motifs were annotated with motifmatchR 1.14.0 and then used for computing the accessibility deviation between cortex and medulla for each TF with chromVAR. TFs with a deviation p-value < 0.05 were considered significant and were assigned to cortex or medulla according to which region showed the higher mean deviation.
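Resizing every accessible region to a fixed width around its center can be sketched as follows. The rounding of the center for regions of odd length and the clipping at the chromosome start are assumptions; the R tools used in the study may handle these edge cases differently.

```python
def resize_around_center(start, end, width=500):
    """Return a fixed-width (start, end) interval centered on the region.

    Uses integer division for the center and shifts the window right if it
    would otherwise start before coordinate 0.
    """
    center = (start + end) // 2
    new_start = max(0, center - width // 2)
    return (new_start, new_start + width)


print(resize_around_center(1000, 1300))  # region of 300 bp, widened to 500 bp
print(resize_around_center(0, 100))      # clipped at the chromosome start
```

Equal widths matter here because motif matching and background peak selection both scale with region length.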
Combined approach to identify medulla and cortex specific TFs
All TFs from the JASPAR CORE Vertebrates database were marked for expression (count > 5 in >= 2 samples), differential expression (p-adj < 0.01, l2fc >= 1), significant motif enrichment in DARs (q-value < 0.001) and differential accessibility (p-val < 0.05) in either cortex or medulla. The number of TFs meeting different subsets of criteria was computed and depicted with ComplexUpset 1.3.334.
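Counting how many TFs satisfy each combination of criteria is the tabulation that underlies an UpSet plot. The sketch below shows this step in Python with placeholder TF names and criterion labels, all of which are hypothetical; the plot itself was produced with ComplexUpset in R.

```python
from collections import Counter


def criteria_combinations(tf_flags):
    """Count TFs per exact combination of satisfied criteria.

    tf_flags: {tf_name: {criterion_name: bool}}
    Returns a Counter keyed by frozenset of the criteria that were met.
    """
    return Counter(
        frozenset(name for name, met in flags.items() if met)
        for flags in tf_flags.values()
    )


# Placeholder TFs and flags, not results from the study.
toy_flags = {
    "TF_A": {"expressed": True, "deg": True, "motif_enriched": True, "diff_accessible": True},
    "TF_B": {"expressed": True, "deg": False, "motif_enriched": True, "diff_accessible": False},
    "TF_C": {"expressed": True, "deg": False, "motif_enriched": True, "diff_accessible": False},
    "TF_D": {"expressed": False, "deg": False, "motif_enriched": False, "diff_accessible": False},
}
for combo, n in criteria_combinations(toy_flags).items():
    print(sorted(combo), n)
```

TFs that meet all criteria at once fall into a single key, which makes it straightforward to extract the highest-confidence candidates from the table.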
Identification of putative POU3F3 downstream target genes
All medullary DEGs were screened for POU3F3 motif occurrences within ARs located in their promoters with motifmatchR 1.14.0 using the TF motif MA0788.1 from the JASPAR 2020 CORE Vertebrates database (p.cutoff = 5e-5). The same screen was additionally performed with HOMER v4.11 annotatePeaks using the Brn1 (POU3F3) motif provided by HOMER. The putative POU3F3 downstream targets from both screens were then filtered for protein coding genes. Expression in selected kidney cell types is shown for genes identified in both motif searching approaches and for genes expressed in at least 10% of the cells in any of the cell types. The snRNA-seq data was retrieved from the KPMP Kidney Tissue Atlas at https://atlas.kpmp.org/explorer/dataviz11. Genes were clustered based on average expression (Euclidean distance).
POU3F3 immunohistochemistry
5 µm thick sections of formalin fixed paraffin embedded kidney cortex and medulla were affixed to charged microscope slides and baked at 60°C for 30 minutes. After deparaffinization and rehydration, sections were subjected to heat induced antigen retrieval using citrate buffer, pH 6 for 10 minutes at 110°C in a BioCare Medical decloaking chamber. After blocking, polyclonal rabbit primary antibodies reactive against POU3F3 (Atlas Antibodies, HPA067151) were applied to the tissue sections at 0.5 µg/ml overnight at 4°C. After washing, secondary antibody labeling and chromogenic development were performed using an ImmPRESS horseradish peroxidase Horse Anti-Rabbit IgG PLUS Polymer Kit (Vector Biolabs, MP-7800-15). After development, slides were coverslipped with EcoMount (BioCare Medical, EM897L).
Spatial transcriptomics
Digital spatial profiling (DSP) experiments were performed at the University of Washington’s DSP Core Facility. 5 µm sections of formalin-fixed paraffin embedded portions of kidney cortex and medulla from donors 1–3 were affixed to SuperFrost Plus charged slides. Following deparaffinization and rehydration, the sections were subjected to antigen retrieval per NanoString’s protocol for RNA detection (heat induced antigen retrieval in pH 9 Tris-EDTA for 20 minutes followed by 1 µg/ml proteinase K in PBS digestion for 15 minutes). After antigen retrieval, the sections were hybridized with human whole transcriptome atlas probes (Human NGS Whole Transcriptome Atlas RNA_1.0) overnight at 37°C. Following stringent washes, the slides were blocked and counterstained with fluorescently labeled antibodies targeting CD10 and pan-cytokeratin and nuclei were labeled using SYTO13. The slides were then loaded into a GeoMx digital spatial profiler (NanoString, Seattle, WA). After fluorescent slide scanning, a trained renal pathologist (S.A.) selected 2 regions of interest (ROIs) based on histology for each donor: glomeruli, proximal tubules (CD10+), distal tubules (pan-cytokeratin+), as well as anatomically distinct regions such as medullary rays and the outer and inner stripes of the outer medulla. Following capture of bound probes by the instrument, an Illumina next generation sequencing library was generated using SeqCode reagents obtained from NanoString. The library was sequenced on an Illumina NextSeq 2000 SP flowcell. The resulting raw reads were processed to FASTQ using Illumina BaseSpace and converted to digital count files (.dcc) using Nanostring GeoMx NGS Pipeline Version 2.3.3.10. The resulting .dcc files were uploaded into the GeoMx DSP instrument. 
Data processing was performed using the following parameters: raw reads threshold (1000), aligned reads threshold (80), stitched read threshold (80), sequencing saturation threshold (50%), negative probe geometric mean (2.3), no template count threshold (1000), surface area threshold (>10,000 µm2), nuclei count threshold (>50). The data were 3rd quartile (Q3) normalized and the resulting counts were input into R and used for generating dotplots as a function of ROI histology (ggplot2), unbiased clustering (pheatmap) and cellular deconvolution (SpatialDecon35) using the KPMP normal reference scRNA-seq dataset.
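Third-quartile normalization can be illustrated with a short Python sketch. The variant shown here rescales each ROI so that its Q3 equals the geometric mean of the Q3 values across all ROIs; this is a common formulation, but whether it matches the instrument software exactly is an assumption, as are the function name, the quantile method and the toy counts.

```python
from statistics import geometric_mean, quantiles


def q3_normalize(counts_by_roi):
    """Scale each ROI so that its third quartile matches a common reference.

    counts_by_roi: {roi_name: list of counts, one per gene}
    Each ROI needs at least two values and a non-zero third quartile.
    """
    q3 = {
        roi: quantiles(counts, n=4, method="inclusive")[2]
        for roi, counts in counts_by_roi.items()
    }
    reference = geometric_mean(list(q3.values()))
    return {
        roi: [value * reference / q3[roi] for value in counts]
        for roi, counts in counts_by_roi.items()
    }


# Toy ROIs: the second has exactly twice the depth of the first.
toy = {"glomerulus": [1, 2, 3, 4, 5], "outer_stripe": [2, 4, 6, 8, 10]}
normalized = q3_normalize(toy)
print(normalized["glomerulus"])
print(normalized["outer_stripe"])
```

In the toy example the two ROIs differ only by a depth factor, so their normalized profiles coincide, which is the behavior the normalization is meant to achieve.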
Data availability
RNA-seq, ATAC-seq and Hi-C data can be accessed under Gene Expression Omnibus (GEO) accession number GSE212910 (reviewer token: kfyfaokwdnwdpmj). A UCSC track hub of the functional genomic data generated in this study is located at https://akilesh-genome-browser.s3.us-west-2.amazonaws.com/hub.txt. These data can also be visualized via an epigenome browser portal located at: http://faculty.washington.edu/shreeram/genomebrowser.html. Digital spatial profiling data are available at GEO accession number GSE235841 (reviewer token: kjqvckgapxkrnan).
Results
Differential gene expression between kidney cortex and medulla.
We performed RNA-seq and ATAC-seq on kidney tissue from three male human kidney donors, which was macrodissected into cortical and medullary tissue samples (Fig. 1). Histologic evaluation at the time of sampling revealed good representation of the respective kidney regions (Fig. 2a, Supplementary Fig. 1). The medulla samples contained only outer medulla (outer and inner stripes), without significant representation of the inner medulla. All three patients (aged 57, 67 and 73) had preserved kidney function (eGFR >70 ml/min/1.73m2; Supplementary Table 1), and histologic review by a renal pathologist (S.A.) showed only mild arteriosclerosis and mild nephrosclerosis (global glomerulosclerosis, tubular atrophy and interstitial fibrosis). Consistent with the histology results, PCA based upon both gene expression (RNA-seq) and chromatin accessibility (ATAC-seq) data showed clear separation between cortical and medullary samples (Fig. 2b). Using snRNA-seq data from the KPMP Kidney Tissue Atlas11 as a reference, we estimated the abundance of different kidney cell types in our bulk RNA-seq data by deconvolution with BisqueRNA10 (Fig. 2c; Supplementary Table 2). As expected, there was a greater proportion of proximal tubule cells in our cortical samples (42%±3% vs. 22%±5% in medulla, p-value 0.002). Conversely, cells of the thick ascending limb were the most abundant cell type in our medullary samples (37%±4% vs. 17%±2% in cortex, p-value 0.004). These histologic and computational assessments therefore validated the correct assignments of our cortical and medullary tissue samples for subsequent comparative studies.
Figure 1|. Overview of sample types and analysis workflow.
A flowchart depicting the sample processing and analysis workflows used in this article. Our browsable genome-wide annotation resource can be found here: http://faculty.washington.edu/shreeram/genomebrowser.htm. RNA-seq, RNA sequencing; TF, transcription factor.
Figure 2|. RNA sequencing (RNA-seq) and differentially expressed gene (DEG) analysis.
(a) Representative sections of macrodissected cortex and medulla tissue from donor 1 (periodic acid-Schiff staining). Bar = 100 μm. (b) Principal component analysis (PCA) of all 6 samples from donors 1 to 3 based on RNA-seq data (top, using counts per transcript) and ATAC-seq data (bottom, using counts per accessible region). PC1 explains the largest proportion of variance between samples in both assays (79% for RNA-seq and 65% for ATAC-seq). (c) Deconvolution analysis of bulk RNA-seq data with BisqueRNA showing the estimated proportions of different cell types in our tissue samples. Cortical cell types are shaded in red, medullary cell types in blue, and cell types without specified localization are shown in gray. Cell type abbreviations include podocytes (PODs), parietal epithelial cells (PECs), descending and ascending thin limb cells (DTLs and ATLs, respectively), proximal tubules (PTs), thick ascending limb (TAL), distal convoluted tubule cells (DCTs), connecting tubule cells (CNTs), endothelial cells (ECs), fibroblasts (FIBs), intercalated cells (ICs), principal cells (PCs), immune cells (IMMs), neuronal cells (NEUs), and vascular smooth muscle cells/pericytes (VSM/Ps). (d) Volcano plot and number of DEGs from the differential gene expression analysis comparing our cortical and medullary samples (significance thresholds indicated with lines: log2 fold change > 1, adjusted P < 0.01). Fold changes were computed as medulla/cortex (i.e., positive log2 fold changes mean higher expression in medulla, and negative values mean higher expression in cortex). (e) Enriched Gene Ontology (GO) terms (biological processes) for medullary and cortical DEGs from (d) with adjusted P value for enrichment and number of genes belonging to the respective terms (count). To optimize viewing of this image, please see the online version of this article at www.kidney-international.org.
First, we performed a differential gene expression analysis between the cortical and medullary samples with DESeq213. Among 25,185 genes with at least minimal detectability, we found 2,372 (9.4%) to be differentially expressed between cortex and medulla (adjusted p-value < 0.01, log2-fold change > 1; Supplementary Table 3). There were 1,266 differentially expressed genes (DEGs) with higher expression in cortex and 1,106 DEGs with higher expression in medulla (Fig. 2d). Next, we ranked DEGs based on their expression differences both in terms of absolute counts as well as in terms of statistical significance (i.e., adjusted p-value) in order to identify genes with the strongest support for between-group differences. This revealed genes with highly differential expression in the medulla, including the canonical genes UMOD, SLC12A1 and AQP2 (Supplementary Fig. 2). In gene ontology over-representation analysis, cortical DEGs were enriched for transporter activity and metabolism-related biologic processes and medullary DEGs were enriched for terms related to angiogenesis, extracellular matrix and renal or urogenital system development (Fig. 2e, Supplementary Fig. 3a; Supplementary Tables 4 and 5).
Carefully annotated samples recalibrate and extend the utility of the larger GTEx dataset.
Next, we asked if we could correlate the expression differences detected in our bulk kidney RNA-seq data with those of the 85 cortical and 4 medullary samples available through the GTEx project. Surprisingly, DEG analysis of the GTEx cortex and medulla samples yielded only 76 DEGs (Fig. 3a) without any significantly enriched GO terms. To investigate this apparent discrepancy, we first performed a PCA based on the GTEx data. This showed a separation of the samples into two clusters along principal components (PC) 1 and 2, the smaller of which contained 9 samples. However, the clusters did not clearly correspond to the GTEx sample assignment as cortex or medulla (Fig. 3b). When we examined the published pathology descriptors of the GTEx kidney samples, we noted that the descriptors for several samples in the smaller medulla GTEx cluster mentioned contamination from the renal cortex (Supplementary Fig. 4). These findings suggested a potential misassignment of cortex samples as medulla and vice versa.
Figure 3|. Reassignment of Genotype-Tissue Expression (GTEx) samples using curated reference data set.
(a) Volcano plot and number of differentially expressed genes (DEGs) from the differential gene expression analysis comparing cortex and medulla samples in the GTEx version 8 database (significance thresholds indicated with dotted lines: log2 fold change > 1, adjusted P < 0.01). Fold changes were computed as medulla/cortex (i.e., positive log2 fold changes mean higher expression in medulla, and negative values mean higher expression in cortex). (b) The principal component analysis (PCA) based on GTEx RNA-sequencing (RNA-seq) data. Color coding highlights the original assignment of samples to cortex and medulla. Note the mixed tissue types in both sample clusters. (c) Combined PCA of our own and the GTEx samples. Color coding highlights the corrected assignment to cortex and medulla for the GTEx samples. GTEx samples from the top right cluster in panel (b) are now located close to our medullary samples. The remaining GTEx samples lie closer to our cortical samples. (d) Volcano plot and number of DEGs after correction of sample assignment shown with the same axes as in panel (a). (e) Overlap of DEGs from our own and the corrected GTEx samples. A total of 402 medullary and 653 cortical DEGs were identified in both analyses. Overlapping DEGs show highly correlating log2 fold changes (Spearman ρ = 0.92, red and blue dots in scatterplot), and only 4 DEGs (<1%) show log2 fold changes of opposing directions (black dots). (f) Enriched Gene Ontology (GO) terms (biological processes) for overlapping DEGs (top: medulla; bottom: cortex) with adjusted P value for enrichment and number of genes belonging to the respective terms (count).
To correctly reassign the GTEx samples, we adopted a PCA-based approach widely used in genetic epidemiology for the assignment of genetic ancestry of samples of unknown ancestry, when a reference dataset is available36,37. Using our own curated cortex and medulla samples as the reference dataset, we performed a joint PCA analysis together with the GTEx samples, which was agnostic to the assignment of the tissue samples in GTEx. Interestingly, we observed that the exact same 9 GTEx samples detected as a smaller cluster of the GTEx-only PCA clustered together with our medullary samples (Fig. 3c), supporting potential misassignment of several medullary samples as renal cortex. Indeed, 6 of these 9 GTEx samples had pathology information available that described contamination (Supplementary Fig. 4). Therefore, we reassigned the GTEx samples to medulla and cortex based on the clustering observed in the joint PCA analysis (Fig. 3c), yielding 9 re-assigned GTEx medulla and 80 GTEx cortex samples for downstream analyses (Supplementary Table 6). With this reassignment, the number of GTEx DEGs increased sharply from 76 to 3,926 (Fig. 3d, Supplementary Table 7).
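The reference-based reassignment idea can be illustrated with a minimal sketch. It is not the analysis pipeline used in this study: the expression values and sample names are made up, the first principal component is obtained with a simple power iteration rather than an established PCA routine, and the nearest-reference-centroid rule is an assumed stand-in for reading cluster membership off the joint PCA plot.

```python
import math

def first_pc_scores(samples, n_iter=200):
    """Return PC1 scores for a list of equal-length expression vectors."""
    n, p = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(p)]
    centered = [[s[j] - means[j] for j in range(p)] for s in samples]
    # Power iteration on X^T X: a toy stand-in for a real PCA implementation.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(n_iter):
        scores = [sum(row[j] * v[j] for j in range(p)) for row in centered]
        v = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return [sum(row[j] * v[j] for j in range(p)) for row in centered]

# Curated reference samples with trusted labels (hypothetical values, 4 genes).
reference = {
    "ref_cortex_1": ("cortex", [10, 9, 1, 2]),
    "ref_cortex_2": ("cortex", [9, 10, 2, 1]),
    "ref_medulla_1": ("medulla", [1, 2, 10, 9]),
    "ref_medulla_2": ("medulla", [2, 1, 9, 10]),
}
# Samples whose original labels are ignored during the joint PCA.
unlabeled = {"gtex_a": [9, 9, 2, 2], "gtex_b": [2, 2, 9, 9]}

names = list(reference) + list(unlabeled)
matrix = [reference[n][1] for n in reference] + [unlabeled[n] for n in unlabeled]
scores = dict(zip(names, first_pc_scores(matrix)))

# Mean PC1 score of the reference samples of each tissue.
centroids = {}
for tissue in ("cortex", "medulla"):
    vals = [scores[n] for n, (t, _) in reference.items() if t == tissue]
    centroids[tissue] = sum(vals) / len(vals)

# Assign each unlabeled sample to the tissue with the closest reference centroid.
reassigned = {
    n: min(centroids, key=lambda t: abs(scores[n] - centroids[t]))
    for n in unlabeled
}
print(reassigned)
```

Because the sign of a principal component is arbitrary, the sketch compares distances to centroids rather than the sign of the score.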
Next, we intersected the resulting GTEx DEGs with the 2,372 DEGs from our own samples. We identified 1,055 overlapping significant DEGs (653 with higher expression in the cortex, 402 with higher expression in the medulla; Supplementary Table 8). These overlapping DEGs exhibited highly correlated log2-fold changes across the two datasets (Spearman’s ρ = 0.92, Fig. 3e). Re-assignment of the GTEx samples was also supported by over-representation of similar gene ontology terms as were detected in our own reference samples (Fig. 3f, Supplementary Fig. 3b, Supplementary Tables 9, 10). Furthermore, the DEGs identified across both datasets were biologically plausible. For example, medullary DEGs with high average expression included genes with well-established roles in diseases originating in the medulla such as UMOD and SLC12A1. Conversely, cortical DEGs included members of the mitochondrial cytochrome c oxidase complex (MT-CO1, MT-CO3) and metabolic enzymes such as aldolase B (ALDOB), which is consistent with the high metabolic activity and energy demand of proximal tubule epithelial cells, the most abundant cell type in the renal cortex. Therefore, using only a small number of carefully annotated reference samples, we were able to reassign and extend the utility of a larger public dataset such as GTEx.
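As an illustration of the comparison described above, the following sketch intersects two DEG lists, computes a Spearman rank correlation of their log2 fold changes, and lists genes with opposing directions. The gene names and fold changes are hypothetical placeholders, and the rank correlation is written out by hand only to keep the example self-contained.

```python
def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical log2 fold changes (medulla/cortex) for significant DEGs.
ours = {"geneA": 3.0, "geneB": 2.5, "geneC": -3.5, "geneD": 1.2, "geneE": 1.8}
gtex = {"geneA": 2.0, "geneB": 2.8, "geneC": -3.0, "geneD": -1.1, "geneF": 2.2}

shared = sorted(set(ours) & set(gtex))          # DEGs found in both analyses
rho = spearman([ours[g] for g in shared], [gtex[g] for g in shared])
opposing = [g for g in shared if ours[g] * gtex[g] < 0]  # sign disagreement

print(shared, round(rho, 2), opposing)
```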
Integrative analysis of chromatin accessibility and gene expression data.
Next, we analyzed the chromatin accessibility (ATAC-seq) data that we had generated in parallel on these same samples. We first identified accessible chromatin regions (ARs) in all cortical and medullary samples and then merged these regions into a master list yielding a total of 196,278 ARs. For all samples, >75% of ARs were located >5kb from TSS, consistent with their role as distal regulatory elements and reproducing similar observations from our prior studies38–40 (Supplementary Fig. 5a). Of these ARs, 8,750 (4.5%) showed differential accessibility between cortex and medulla and were designated as differentially accessible regions (DARs). Of the DARs, 5,462 had higher accessibility in cortex, while 3,288 displayed higher accessibility in the medulla (Supplementary Fig. 5b).
Most ARs and DARs were found in promoter regions, exons, and introns (71–80%), with the remaining ARs located in downstream and intergenic regions (Supplementary Table 11). In both cortex and medulla, DEGs were much more likely than non-differentially expressed genes to have a DAR within their promoter. For example, 10% (130/1,266) of cortical and 7% (78/1,106) of medullary DEGs had at least one DAR within their promoter compared to 2% (408/22,753) of all non-DEGs (Supplementary Fig. 5c). Furthermore, DARs in proximity to DEGs showed a strong positive correlation between changes in accessibility and the DEG’s expression (Supplementary Figs. 6, 7). The correlation was most prominent for DARs located in promoter regions (r=0.82) and directly downstream of the gene body (r=0.85). This correlation was less pronounced for non-differential ARs and genes (r from 0.09 to 0.25 depending on AR location, Supplementary Table 12). We then identified the subset of DEGs with an enrichment of DARs in a broader regulatory region around genes including the promoter, the gene body and additional up- and downstream regions. We intersected these DEGs with those genes also showing significant expression differences in the GTEx data. This identified 39 medullary and 32 cortical DEGs with DAR enrichment (Supplementary Table 13), which represents the gene set with the highest level of concordance across gene expression and chromatin accessibility in both our data and in GTEx.
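The promoter DAR proportions quoted above can be recomputed directly from the reported counts. The short sketch below does only that arithmetic; the ratio to the non-DEG rate is a descriptive figure added here for illustration and is not a statistical test reported in the text.

```python
# (genes with >= 1 promoter DAR, total genes in group), as reported above.
counts = {
    "cortical DEGs": (130, 1266),
    "medullary DEGs": (78, 1106),
    "non-DEGs": (408, 22753),
}

fractions = {group: hit / total for group, (hit, total) in counts.items()}
baseline = fractions["non-DEGs"]

for group, frac in fractions.items():
    # Ratio relative to non-DEGs (descriptive only, no significance test).
    print(f"{group}: {frac:.1%} with a promoter DAR, "
          f"{frac / baseline:.1f}x the non-DEG rate")
```

With these counts, cortical and medullary DEGs carry a promoter DAR roughly 5.7 and 3.9 times as often as non-DEGs, respectively.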
Chromosome conformation data reveals the structural basis of gene regulation.
Accessible chromatin elements such as enhancers can make long-range (multi-kilobase) interactions with their target genes. To study such chromatin contacts, we generated high-resolution genome-wide chromatin conformation data (Hi-C) from two matched cortex-medulla pairs. We mapped >350 million contacts in each sample (Supplementary Table 1), the majority of which were short-range and intrachromosomal (Supplementary Figs. 8–12). The so-called A compartment is thought to be associated with large areas of open chromatin, whereas the B compartment is associated with closed chromatin regions. Eigenvector analysis of the normalized genome contact matrix revealed differences in A/B compartment distributions between cortex and medulla for some chromosomes (Fig. 4a). Longer-range chromatin interaction loops were identified using Mustache, which utilizes a computer vision strategy to identify blob-shaped objects in Hi-C contact matrices25. Across both donors, this identified 9,535 loops in the cortex and 9,407 loops in the medulla. Next, we used Mustache to identify intrachromosomal loops with significantly different contact density between the cortex and medulla. Across both donors, this identified 4,063 loops with stronger contact density in the cortex and 3,833 loops with stronger contact density in the medulla (Supplementary Tables 14 and 15). Exemplar differential loops in renal cortex and medulla are shown in Figure 4b. Consistent with previous reports25,41, a significant proportion of the loop termination points in the cortex and medulla demonstrated overlap with CTCF binding domains and the active chromatin mark H3K27Ac (Supplementary Fig. 13). Consistent with a role for loop termination points in actively maintaining chromatin architecture, a lower percentage of the loop termination points in the cortex and medulla demonstrated overlap with H3K9Me3, a heterochromatin mark42 (Supplementary Fig. 13).
Figure 4 |. Hi-C data.
(a) Hi-C sequencing: A/B compartments in kidney cortex (left) and kidney medulla (right) from the same kidney. The top row demonstrates chromosome 1 with significant differences in the A/B compartments between the cortex and the medulla. (b) Knight-Ruiz-normalized Hi-C contact matrices showing a region on chromosome 22 (left) with higher cortical and a region on chromosome 13 with higher medullary chromatin contact density. (c) Integrative view of our multi-omic data for medulla and a gene-rich region on chromosome 5. Bulk RNA sequencing (RNA-seq) and ATAC-seq tracks are an overlay from all 3 donors. The single-nucleus ATAC (snATAC) signal is derived from thick ascending limb cells (TALs), the most abundant cell type in the human kidney medulla, and is shown with its predicted cis-coaccessibility networks (CCANs), which link potentially coregulated accessible regions (see Methods). Furthermore, CTCF binding sites and topologically associated domains (TADs; different hierarchy levels; Methods) derived from Hi-C data of medullary kidney tissue (donor 4) are shown. CCANs are well contained within top-level (level 1) TADs for this genomic region. (d) Fraction of CCANs with a maximum TAD overlap ≥ 80% for different TAD hierarchy levels. Left: CCANs derived from proximal convoluted and straight tubule cells (PCTs/PSTs) overlapped with kidney cortex TADs. Right: respective analysis with CCANs from TAL and medullary TADs. Generally, >70% (71% cortex, 76% medulla) of CCANs are mostly (≥ 80%) contained within the boundaries of top-level TADs. (e) Density of differentially expressed genes (DEGs) and differentially accessible regions (DARs) in genomic regions close to differential chromatin loops. The 10-kb loop anchors were extended by 25 kb to both sides. Then, the density of DEGs and DARs per Mb was computed for these regions and compared with the average density of DARs and DEGs in the total human transcriptome (background). Chr, chromosome.
We next examined if these Hi-C chromosome conformation maps could be used to validate imputed chromosomal contacts derived from a previously generated snATAC-seq dataset of the adult human kidney22 using Cicero28. This algorithm leverages co-accessible pairs of DNA elements in single-cell chromatin accessibility data to predict connections between regulatory DNA elements and their putative target genes. Because of the different resolution of our observed Hi-C loops and the significantly shorter Cicero connections (Supplementary Fig. 14), we asked if clusters of co-occurring Cicero-predicted connections (a cis-coaccessibility network, CCAN) would be constrained by larger-scale chromosome organizational units such as topologically associated domains (TADs)43. In other words, we asked if CCANs would be wholly localized within TADs, as has been shown for most enhancer contacts to their target gene44. We used a hierarchical TAD-caller for our analysis, OnTAD29, which nests smaller TADs within larger TADs. Examining a gene-rich 9.5 Mb region of chromosome 5 in the medulla datasets (Fig. 4c), we recognized that CCANs predicted in thick ascending limb (TAL) cells appeared to wholly reside within the highest-level TADs (level 1) in medullary tissue. Lower-level (smaller) TADs (levels 2 and 3) often exhibited CCANs with connections that extended beyond their boundaries. The larger size distribution of level 1 TADs compared to CCAN connections (Supplementary Fig. 15) meant that as TAD levels increased (1→2→3), the proportion of CCANs wholly contained (as defined by >80% of their connections) within those TADs decreased (Fig. 4d). These analyses demonstrated that predicted enhancer-promoter interaction CCANs generated from snATAC-seq data could be validated by TAD-level chromosome structure delineated by our Hi-C data.
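The containment criterion can be made concrete with a small sketch. It assumes one possible reading of "maximum TAD overlap": for each CCAN, the largest fraction of its connections that fall with both ends inside a single TAD. The coordinates, CCAN names and this definition are illustrative assumptions, not the exact procedure from the Methods.

```python
def max_tad_overlap(ccan, tads):
    """Largest fraction of a CCAN's connections lying entirely inside one TAD."""
    best = 0.0
    for tad_start, tad_end in tads:
        inside = sum(
            1 for a, b in ccan
            if tad_start <= min(a, b) and max(a, b) <= tad_end
        )
        best = max(best, inside / len(ccan))
    return best

# Hypothetical TADs of one hierarchy level on a single chromosome (bp).
tads = [(0, 1_000_000), (1_000_000, 2_500_000)]

# Hypothetical CCANs: each is a list of (end1, end2) connection coordinates.
ccans = {
    "ccan_1": [(100_000, 300_000), (200_000, 800_000), (400_000, 900_000),
               (500_000, 950_000), (600_000, 700_000)],
    "ccan_2": [(900_000, 1_100_000), (1_200_000, 1_500_000),
               (1_300_000, 2_000_000)],
}

THRESHOLD = 0.8  # a CCAN counts as contained if its overlap is >= 80%
contained = {name: max_tad_overlap(c, tads) >= THRESHOLD
             for name, c in ccans.items()}
fraction_contained = sum(contained.values()) / len(contained)
print(contained, fraction_contained)
```

Repeating the calculation with the TAD sets of each hierarchy level would give one fraction per level, which is the kind of quantity summarized in Fig. 4d.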
Integrated analysis of gene expression, chromatin accessibility and chromosome conformation data.
To assess which of these long-range chromatin interactions/loops were connecting a DAR to a DEG, we extended the loop termination points by ±25 kb at both ends to create “loop anchors”. We found that the density of DARs adjacent to significantly differential loops was similar at 1.50/Mb in cortex and 1.55/Mb in medulla, whereas the transcriptome-wide DAR density was only 0.31/Mb. Similarly, the DEG density adjacent to significantly differential loops was 3.92/Mb for cortex and 4.17/Mb for medulla, with the transcriptome-wide DEG density being only 1.32/Mb (Fig. 4e). These results showed that both DARs and DEGs were highly enriched in the vicinity of cortex- and medulla-specific chromatin loops.
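A density of this kind can be sketched as follows: extend each anchor by a fixed flank, count the features that fall inside the extended regions, and divide by the covered length in Mb. The coordinates are hypothetical, features are reduced to single positions, and merging overlapping extended anchors before measuring length is an assumption made here to avoid double counting.

```python
FLANK = 25_000  # extension on each side of a loop anchor, in bp

def extend_and_merge(anchors, flank=FLANK):
    """Extend anchors by `flank` on both sides and merge overlapping regions."""
    extended = sorted((max(0, s - flank), e + flank) for s, e in anchors)
    merged = []
    for s, e in extended:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged

def density_per_mb(positions, regions):
    """Number of feature positions inside the regions, per Mb of region length."""
    total_bp = sum(e - s for s, e in regions)
    hits = sum(1 for p in positions if any(s <= p < e for s, e in regions))
    return hits / (total_bp / 1_000_000)

# Hypothetical 10-kb loop anchors and feature positions (e.g. DAR midpoints).
anchors = [(100_000, 110_000), (120_000, 130_000), (500_000, 510_000)]
features = [80_000, 150_000, 300_000, 505_000]

regions = extend_and_merge(anchors)
density = density_per_mb(features, regions)
print(regions, round(density, 2))
```

Dividing the densities reported above by their backgrounds gives ratios of about 4.8 (1.50/0.31) and 5.0 (1.55/0.31) for DARs, and about 3.0 (3.92/1.32) and 3.2 (4.17/1.32) for DEGs.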
We then asked which of the highly concordant DEGs with DARs also demonstrated a differential loop (Fig. 5a). Even with these stringent intersecting criteria, we identified 31 genes with concordant multi-omic support at the levels of differential expression, chromatin accessibility and differential loops between cortex and medulla shared between GTEx and our own data (Fig. 5b, Supplementary Table 16). This was confirmed by direct examination of the multi-omic landscape at the exemplar medulla gene locus, CLDN14 (Fig. 5c). Many kidney trait-related GWAS variants were concentrated in accessible chromatin regions close to differentially expressed genes in the medulla (e.g., for CLDN14 and WNT7B). These data confirmed that DARs and chromatin loops were correlated with DEGs and could serve to dissect the cortex- and medulla-specific regulatory programs driving gene expression and genetic susceptibility to complex kidney-related traits. To facilitate exploration of any locus of interest genome-wide, these data can be visualized via an epigenome browser portal located at: http://faculty.washington.edu/shreeram/genomebrowser.html.
Figure 5|. RNA sequencing (RNA-seq), ATAC-seq, and Hi-C-seq integration.
(a) Integrative strategy combining RNA-seq, ATAC-seq, and Hi-C-seq data. Differentially expressed genes (DEGs) from our own samples were screened for differentially accessible region (DAR) enrichment in their extended regulatory region and then overlapped with the loop anchors of differential loops (DLs). (b) The approach from (a) identified 71 DEGs with DAR enrichment, all of which showed consistent log2 fold changes in the Genotype-Tissue Expression (GTEx) data (r = 0.94). A total of 31 genes additionally overlapped extended regions around DL anchors. (c) Epigenomic landscape around the CLDN14 gene (chromosome 21). CLDN14 shows higher expression in medulla (RNA-seq data of 3 donors) and multiple medullary DARs within its gene body. Differential Hi-C loops specific for medulla show a potential link to a downstream region. Multiple variants around the terminal exon of CLDN14 are associated with estimated glomerular filtration rate. Other genomic regions can be visualized accordingly and interactively with our genome browser: http://faculty.washington.edu/shreeram/genomebrowser.html.
Identification of transcription factor drivers of the medullary gene expression program.
To identify transcription factors (TFs) driving medullary gene expression, we employed a stepwise approach. We used all 661 transcription factors from the JASPAR CORE vertebrates database33 as a starting point and found that 519 of these (78.5%) were at least minimally expressed in kidney tissue (Fig. 6a). A much smaller set of 35 TFs was differentially expressed between medulla and cortex in both our data and the GTEx data. TFs with known roles in cortical gene expression, such as HNF1A, WT1 and LMX1B, showed the expected higher expression in kidney cortex. Conversely, 23 TFs exhibited significantly higher expression in medulla (Fig. 6a and b). Mutations in TBX18, the TF with the most prominent medulla-specific expression pattern, have been linked to dysregulated ureter development and urinary tract malformations45. However, the roles of most of the remaining medulla-specific TFs have not been described.
Figure 6|. Identification of transcription factors (TFs) driving medullary gene expression.
(a) A total of 519 of all 661 TFs in the JASPAR 2022 CORE vertebrates database showed at least minimal expression in our and the Genotype-Tissue Expression (GTEx) version 8 kidney samples. The upset plot shows the number of TFs that are furthermore differentially expressed in medulla in our and/or the GTEx data (DEG_medulla, DEG_medulla_GTEx), which show significant motif enrichment in medullary differentially accessible regions (DARs) (Enr_in_medulla_DARs) and show significant medullary accessibility deviations at their motifs (Diff_accessibility_medulla). Only POU3F3 was identified across all analyses. (b) Volcano plot showing TFs that are differentially expressed between cortex and medulla in our samples (adjusted P < 0.01, log2-fold change > 1, indicated by black lines). TFs also showing consistent differential expression in the GTEx data are circled. (c) Comparison of P values from differential motif enrichment and accessibility deviations. Differential motif accessibility was computed on the basis of JASPAR motifs using chromVar, whereas motif enrichment was computed with HOMER. Some transcription factors have 1 HOMER motif but multiple JASPAR motifs and are, therefore, shown multiple times along the y axis of the figure. (d) TFs with significant motif enrichment in ARs inside of medullary DEG promoters (−5000 bp − 1000 bp from TSS) computed using HOMER. The middle panel shows the accessibility deviations for these same TFs computed with chromVar for each of our cortical and medullary tissue samples. The right panel displays the TFs’ normalized expression for each sample. Only POU3F3 is significantly and differentially expressed.
To identify which of these TFs might be playing a role in medullary gene expression, we compared differential accessibility across all instances of each TF motif between medulla and cortex samples using chromVar32. This identified 39 TFs with significant accessibility deviations for cortex, including HNF1A, HNF1B and VDR, TFs with well-known roles in cortex gene regulation (Supplementary Fig. 16a). Conversely, chromVar identified 64 TFs with higher motif accessibility in the medulla (Fig. 6a). Next, we asked if the binding motifs of JASPAR CORE vertebrates TFs were enriched in medulla over cortex DARs, or vice versa, using HOMER (Supplementary Table 17). The HOMER names of TFs with significant enrichment were mapped manually to the names used in the JASPAR CORE database for comparability with the chromVar results (Supplementary Table 18). HOMER identified 82 TFs with significant motif enrichment in cortex DARs and 73 TFs in medulla DARs (Fig. 6a, Supplementary Fig. 16). Overall, 29 TFs were identified by both approaches for medulla (Fig. 6c) and 17 for cortex (Supplementary Fig. 16b). Finally, POU3F3 was identified as the TF with the maximum level of support across multiple medulla multi-omic datasets and analysis approaches (Fig. 6a). This stringent intersectional approach was also supported by the identification of HNF1A and VDR as the two TFs with the maximum level of multi-omic evidence in cortex datasets (Supplementary Fig. 16), both of which have well-described roles in cortex development and function. We then assessed whether corticomedullary gene expression differences could be driven by POU3F3. Indeed, POU3F3 motifs were enriched in accessible chromatin regions within the promoter region of medullary DEGs (Fig. 6d, left). Other POU family members were also enriched within these regions, as expected due to redundancy of their preferred DNA binding motifs. This observation also held for the chromVar motif accessibility differences (Fig. 6d, middle). However, when we examined the gene expression of the other POU-family TFs with similar binding motifs, only POU3F3 was found to be significantly and differentially expressed between cortex and medulla (Fig. 6d, right). This made POU3F3 the most plausible POU family member potentially regulating differentially expressed genes of the medulla.
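The intersectional logic behind this ranking can be sketched with plain set operations. The four evidence categories use the labels from the Fig. 6a legend, but the TF names (TF_A to TF_D) and their memberships are hypothetical placeholders, not results from this study.

```python
from collections import Counter

# Hypothetical TF memberships per evidence category (labels as in Fig. 6a).
evidence = {
    "DEG_medulla": {"TF_A", "TF_B", "TF_C"},
    "DEG_medulla_GTEx": {"TF_A", "TF_B"},
    "Enr_in_medulla_DARs": {"TF_A", "TF_C", "TF_D"},
    "Diff_accessibility_medulla": {"TF_A", "TF_D"},
}

# Number of categories supporting each TF.
support = Counter(tf for members in evidence.values() for tf in members)

# TFs supported by every category (the most stringent intersection).
in_all = sorted(set.intersection(*evidence.values()))

for tf, n in sorted(support.items(), key=lambda item: (-item[1], item[0])):
    print(f"{tf}: supported by {n} of {len(evidence)} analyses")
print("In all analyses:", in_all)
```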
Validation of POU3F3 expression in the adult human kidney.
Our integrative search for medullary TFs converged on POU3F3 (also known as BRN1). It has previously been implicated in fetal kidney development in mouse and human46–50, but its expression in adult human kidney medulla has not been studied. As has been done successfully to study gene expression signatures of glomerular disease in human kidney biopsies51, we used spatial transcriptomics (digital spatial profiling) on cortex and medulla sections from donors 1–3 to confirm the compartment-specific localization of POU3F3 (Supplementary Fig. 17a). Unsupervised clustering of spatial profiling data showed clear separation of the region of interest (ROI) expression signatures according to their assigned histology (Supplementary Fig. 17b). To confirm the anatomic assignment of the ROIs, we also performed cellular deconvolution of the expression data and showed that the ROIs were composed of cell types specific to their assigned histology (Fig. 7a). Marker genes associated with cortical and medullary anatomic locations were also expressed in the expected pattern in these ROIs (Supplementary Fig. 18). As predicted by our integrated analysis, POU3F3 showed a strong proximal-to-distal gradient of expression along the nephron segments (Fig. 7b).
Figure 7|. Validation of POU3F3 expression.
(a) Cellular deconvolution of our spatial transcriptomics (digital spatial profiling) data from donor 1 to 3 shows cell types specific to the histology assigned to the selected regions of interest (ROIs). Each column of the heat map depicts 1 of the selected ROIs (grouped by assigned histology), and the rows show cell types from the single-nucleus RNA-sequencing (snRNA-seq) reference used for deconvolution. The scale bar shows z-score of inferred cell type proportion. (b) Upper quartile normalized counts for POU3F3 in each ROI grouped by assigned histology. POU3F3 shows higher expression in the outer and inner stripe of the medulla compared with other regions of the kidney. (c) Immunohistochemical staining for POU3F3 in adult human kidney cortex and medulla. Bar = 50 μm. (d) Expression of POU3F3 and its putative targets in the Kidney Precision Medicine Project (KPMP) snRNA-seq data set. Tubular cell types are ordered along the nephron (red shading: cortical cell types; blue shading: medullary cell types; white if not location specific). The downstream genes were clustered hierarchically by average expression (gray shading: genes with concordant expression patterns to POU3F3). Only protein-coding genes expressed in at least 10% of the cells in at least 1 cell type were considered for this analysis. To optimize viewing of this image, please see the online version of this article at www.kidney-international.org.
Next, we confirmed that POU3F3 was expressed at the protein level in human kidney tissues using immunohistochemistry (Fig. 7c, Supplementary Fig. 19). In the cortex, only distal tubules and collecting ducts expressed POU3F3 within their cytoplasm, with stronger staining of the nucleus; cells in the glomeruli and proximal tubules did not express POU3F3. In the medulla, all tubular components expressed POU3F3 including thick and thin loops of Henle and collecting ducts. Within the medulla, cells in the interstitium and endothelial cells lining medullary capillaries did not express POU3F3.
Finally, we examined the function of medulla DEGs with at least one accessible POU3F3 motif in their promoter (Supplementary Table 19) and found that this set of target genes was enriched for molecular functions associated with kidney development (Supplementary Fig. 20). Comparing the expression of POU3F3 and its target genes in the KPMP kidney single-nucleus gene expression atlas with our immunohistochemistry staining profile showed that POU3F3 was preferentially expressed in tubular components of the distal nephron and medulla (Fig. 7d). We then selected the putative POU3F3 target genes expressed in at least 10% of the cells of any tubular, endothelial or fibroblast cell type and clustered them by average expression per cell type. While some putative target genes were expressed in multiple cell types, a subset (grey shaded boxes) showed expression patterns that were concordant with POU3F3. For example, the transcription factor gene SIM1 has a similar pattern of expression as POU3F3 in the renal collecting system of the developing mouse kidney52 and pronephros of zebrafish53. Another potential POU3F3 target gene, RANBP3L, has been shown to play a role in BMP signaling pathways54 and ischemia reperfusion injury55, and has been linked to regulation of blood pressure via urinary potassium excretion56. CADM1, which has a distal nephron/medullary pattern of expression overlapping with POU3F357, appears to be shed in nephropathies58 and may serve as a urinary biomarker of medullary injury59. Therefore, the analysis of genes with POU3F3 motifs within their promoter identified several medullary genes with plausible roles in kidney development and disease.
Discussion
Here we present reference-quality functional genomic datasets for human kidney cortex and medulla that will provide a valuable resource for kidney investigators. Integrating all levels of information across our own and the GTEx samples, we identified a set of 71 genes characterizing the distinct epigenomic landscapes of the human kidney cortex and medulla. Genes related to angiogenesis and extracellular matrix were enriched in the medulla. While genes related to angiogenesis may reflect the hypoxic milieu of the renal medulla, identification of extracellular matrix genes is consistent with the medulla’s distinct histologic appearance and composition during development60. These medullary genes reveal the need for more detailed investigation of the medulla and its contribution to kidney disease. For example, the medullary gene CLDN14 encodes a tight junction protein involved in paracellular calcium reabsorption61. CLDN14 is highly expressed in cells of the medullary thick ascending limb of the loop of Henle in single-nucleus RNA-seq data from human kidney11. Since common genetic variants in CLDN14 have been associated with kidney stone disease62, understanding the mechanisms underlying this susceptibility awaits future studies.
Interestingly, the gene ontology analyses also highlighted genes associated with kidney development as enriched in the medulla. This suggests that these genes also play an important role in the adult medulla, beyond a role in development. Consistent with this idea, our integrative analysis identified WNT7B, a gene crucial for medullary zone formation during embryonic stages63 but also described to enhance renal tubule repair after ischemic injury64. Our investigation of transcription factor binding motif enrichment in medulla open chromatin regions identified POU3F3, a highly conserved transcription factor47 that plays a critical role in development of the excretory system in worms65, as well as in the distal nephron of frogs66 and mice46–48. Single-cell gene expression studies showed that the highest expression of POU3F3 is in the developing loop of Henle in mice49 and humans50. Consistent with this pattern of expression, a missense mutation in POU3F3 in mice reduces nephron number and interferes with normal development of the thick ascending limb of the loop of Henle48. In the adult human kidney, POU3F3 is expressed in the loop of Henle and distal tubule, but also in the cortical and medullary collecting ducts11, a localization pattern slightly different from mice. Using spatial transcriptomics, we confirmed the expression of POU3F3 in a proximal-distal gradient along the nephron. We were also able to confirm POU3F3 protein expression at these sites in adult human kidney tissues by immunohistochemistry. POU3F3 target genes are enriched in kidney and urogenital development gene ontologies, and our results suggest a role for them in the functioning of the adult kidney medulla. Based on our findings, the role of POU3F3 expression in the adult human kidney medulla and its contribution to kidney disease will be an exciting area for future investigation.
The Hi-C data generated in this study represent the highest-resolution architectural maps of the kidney genome available to date. Our analysis revealed differential chromatin contacts (loops) between cortex and medulla, which were enriched near active enhancer marks (H3K27Ac), differentially accessible chromatin regions (potential enhancers), and differentially expressed genes between cortex and medulla. These findings are consistent with a large body of literature in genome biology explaining how enhancers can influence the activity of their target genes even if they are hundreds of kilobases or more away: the model requires that the intervening DNA sequence is looped out, thereby permitting physical interaction between the gene promoter and its enhancer67–70. This has significant implications for the interpretation of variants identified by GWAS, which are often located in or near distal enhancers71. According to the chromatin looping model, the causal gene may not be the closest one to the index variant from GWAS. Integration of kidney-specific structural data such as those generated in this study will be instrumental in examining whether a GWAS variant is localized in a distal enhancer and what its target gene(s) may be72.
Chromosome organization and architecture also directly influence and constrain gene expression. Cell-specific loops are known to prime gene expression responses73. We find that the kidney-specific chromosomal architecture synergizes with chromatin accessibility and gene expression studies to increase our understanding of cortex- and medulla-specific gene regulation. Moreover, we find evidence for higher-order chromosome organizational principles: topologically associated domains (TADs) are megabase-scale organizational units of chromatin, within which DNA interactions are more likely to occur74. TADs appear to organize gene regulatory modules since genetic interventions that perturb TAD architecture also disrupt normal gene expression75,76. Our observation that cis-coaccessibility networks imputed from single-nucleus ATAC-seq data appear to be constrained by TAD boundaries based on our Hi-C data is therefore reassuring, because it provides orthogonal validation of the influence of kidney-specific chromosome structure on gene regulation.
Potential limitations deserve discussion. First, a large portion of kidney tissue was required to perform the parallel gene expression, chromatin accessibility, and chromosome conformation genomic studies. This necessitated the use of kidney nephrectomy tissues from patients undergoing removal of kidney masses. All donors in our study were male; male sex is a risk factor for kidney cancer. This limits our ability to study sexual dimorphism between the cortex and medulla, which has recently been reported from animal models of acute kidney injury and repair77,78. Another potential limitation is the use of bulk profiling rather than single-cell/single-nucleus approaches. For example, the reported Hi-C data represent the statistical average of thousands of cellular chromosome conformations79. Bulk tissue-based approaches could obscure the cellular heterogeneity of the cortex and medulla, and miss rare cell types whose signatures would be diluted by more abundant cell populations. However, even with these limitations, we show that with careful histologic curation and evaluation of bulk gene expression signatures, we could accurately reassign GTEx kidney samples to cortex and medulla, followed by meaningful extraction of medulla-specific gene expression signatures. This finding may have implications for other GTEx tissues, and could be applied to those that exhibit evidence of sample misassignment80. Furthermore, we show how predictions from bulk profiling methods can be validated by spatial transcriptomics. Using a single-cell reference dataset, we were able to deconvolute the histologically defined regions in our experiment into component cell types. Such approaches will allow for high-resolution molecular dissection of anatomically distinct regions of the medulla such as the outer medulla with its outer and inner stripes, as well as the inner medulla.
In summary, we present multi-omic data and approaches to study gene regulation and transcription in the medullary and cortical portions of the adult human kidney. We also provide access to these data through an epigenome browser portal to enable researchers from diverse fields to interrogate our data genome-wide (http://faculty.washington.edu/shreeram/genomebrowser.html). These maps will serve as a valuable reference for studies on kidney gene regulation, genetic susceptibility to disease including GWAS, and the contributions of the medulla to normal kidney function and disease.
Supplementary Material
Acknowledgements
S.A., G.S. and X.L. were funded in part by R01 DK130386. The University of Washington’s Digital Spatial Profiling core facility is supported by the Department of Laboratory Medicine and Pathology. The work of A.K. was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) Project-ID 192904750 – CRC 992 Medical Epigenetics and Project-ID 431984000 – CRC 1453 Nephrogenetics. The work of S.H. was funded by DFG Project-ID 431984000 – CRC 1453 Nephrogenetics. Y.L. was supported by DFG grant KO 3598/4-2 (to A.K.) and DFG project ID 390939984 - EXC-2189 CIBSS. We cordially thank Gabriele Greve for contributing valuable ideas and feedback to this project.
Disclosure
All authors have nothing to disclose.
Data sharing statement
The RNA-seq, ATAC-seq and Hi-C data supporting the findings of this study can be accessed under Gene Expression Omnibus (GEO) accession number GSE212910. A UCSC track hub of the functional genomic data generated in this study is located at https://akilesh-genome-browser.s3.us-west-2.amazonaws.com/hub.txt. These data can also be visualized via an epigenome browser portal located at: http://faculty.washington.edu/shreeram/genomebrowser.html. Digital spatial profiling data are available at GEO accession number GSE235841. We have indicated each software tool used, where applicable, and provided information on the chosen options (Methods). No proprietary software was used for this study.
References
- 1. Leonhardt KO, Landes RR. Oxygen tension of the urine and renal structures. Preliminary report of clinical findings. N Engl J Med 1963;269:115–121. doi: 10.1056/NEJM196307182690301
- 2. Brezis M, Rosen S. Hypoxia of the renal medulla--its implications for disease. N Engl J Med 1995;332(10):647–655. doi: 10.1056/NEJM199503093321006
- 3. Berry MR, Mathews RJ, Ferdinand JR, et al. Renal Sodium Gradient Orchestrates a Dynamic Antibacterial Defense Zone. Cell 2017;170(5):860–874.e19. doi: 10.1016/j.cell.2017.07.022
- 4. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat Genet 2013;45(6):580–585. doi: 10.1038/ng.2653
- 5. Uhlén M, Fagerberg L, Hallström BM, et al. Proteomics. Tissue-based map of the human proteome. Science 2015;347(6220):1260419. doi: 10.1126/science.1260419
- 6. Lidberg KA, Muthusamy S, Adil M, et al. Multi-Omic Characterization of Human Tubular Epithelial Cell Response to Serum. Cell Biology; 2021. doi: 10.1101/2021.01.29.428186
- 7. Li Y, Cheng Y, Consolato F, et al. Genome-wide studies reveal factors associated with circulating uromodulin and its relationships to complex diseases. JCI Insight 2022;7(10):e157035. doi: 10.1172/jci.insight.157035
- 8. Dobin A, Davis CA, Schlesinger F, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 2013;29(1):15–21. doi: 10.1093/bioinformatics/bts635
- 9. Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 2014;30(7):923–930. doi: 10.1093/bioinformatics/btt656
- 10. Jew B, Alvarez M, Rahmani E, et al. Accurate estimation of cell composition in bulk expression through robust integration of single-cell information. Nat Commun 2020;11(1):1971. doi: 10.1038/s41467-020-15816-6
- 11. Hansen J, Sealfon R, Menon R, et al. A reference tissue atlas for the human kidney. Sci Adv 2022;8(23):eabn4965. doi: 10.1126/sciadv.abn4965
- 12. Hao Y, Hao S, Andersen-Nissen E, et al. Integrated analysis of multimodal single-cell data. Cell 2021;184(13):3573–3587.e29. doi: 10.1016/j.cell.2021.04.048
- 13. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 2014;15(12):550. doi: 10.1186/s13059-014-0550-8
- 14. Yu G, Wang LG, Han Y, He QY. clusterProfiler: an R Package for Comparing Biological Themes Among Gene Clusters. OMICS: A Journal of Integrative Biology 2012;16(5):284–287. doi: 10.1089/omi.2011.0118
- 15. Li H, Handsaker B, Wysoker A, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009;25(16):2078–2079. doi: 10.1093/bioinformatics/btp352
- 16. Ramírez F, Ryan DP, Grüning B, et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res 2016;44(W1):W160–W165. doi: 10.1093/nar/gkw257
- 17. Zhang Y, Liu T, Meyer CA, et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biol 2008;9(9):R137. doi: 10.1186/gb-2008-9-9-r137
- 18. Neph S, Kuehn MS, Reynolds AP, et al. BEDOPS: high-performance genomic feature operations. Bioinformatics 2012;28(14):1919–1920. doi: 10.1093/bioinformatics/bts277
- 19. Yu G, Wang LG, He QY. ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics 2015;31(14):2382–2383. doi: 10.1093/bioinformatics/btv145
- 20. McLean CY, Bristor D, Hiller M, et al. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol 2010;28(5):495–501. doi: 10.1038/nbt.1630
- 21. Lawrence M, Huber W, Pagès H, et al. Software for Computing and Annotating Genomic Ranges. Prlic A, ed. PLoS Comput Biol 2013;9(8):e1003118. doi: 10.1371/journal.pcbi.1003118
- 22. Muto Y, Wilson PC, Ledru N, et al. Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human kidney. Nat Commun 2021;12(1):2190. doi: 10.1038/s41467-021-22368-w
- 23. Wuttke M, Li Y, Li M, et al. A catalog of genetic loci associated with kidney function from analyses of a million individuals. Nat Genet 2019;51(6):957–972. doi: 10.1038/s41588-019-0407-x
- 24. Teumer A, Li Y, Ghasemi S, et al. Genome-wide association meta-analyses and fine-mapping elucidate pathways influencing albuminuria. Nat Commun 2019;10(1):4130. doi: 10.1038/s41467-019-11576-0
- 25. Roayaei Ardakany A, Gezer HT, Lonardi S, Ay F. Mustache: multi-scale detection of chromatin loops from Hi-C and Micro-C maps using scale-space representation. Genome Biol 2020;21(1):256. doi: 10.1186/s13059-020-02167-0
- 26. Lopez-Delisle L, Rabbani L, Wolff J, et al. pyGenomeTracks: reproducible plots for multivariate genomic datasets. Bioinformatics 2021;37(3):422–423. doi: 10.1093/bioinformatics/btaa692
- 27. Ramírez F, Bhardwaj V, Arrigoni L, et al. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat Commun 2018;9(1):189. doi: 10.1038/s41467-017-02525-w
- 28. Pliner HA, Packer JS, McFaline-Figueroa JL, et al. Cicero Predicts cis-Regulatory DNA Interactions from Single-Cell Chromatin Accessibility Data. Mol Cell 2018;71(5):858–871.e8. doi: 10.1016/j.molcel.2018.06.044
- 29. An L, Yang T, Yang J, et al. OnTAD: hierarchical domain structure reveals the divergence of activity among TADs and boundaries. Genome Biol 2019;20(1):282. doi: 10.1186/s13059-019-1893-y
- 30. Heinz S, Benner C, Spann N, et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 2010;38(4):576–589. doi: 10.1016/j.molcel.2010.05.004
- 31. Durinck S, Spellman PT, Birney E, Huber W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat Protoc 2009;4(8):1184–1191. doi: 10.1038/nprot.2009.97
- 32. Schep AN, Wu B, Buenrostro JD, Greenleaf WJ. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat Methods 2017;14(10):975–978. doi: 10.1038/nmeth.4401
- 33. Fornes O, Castro-Mondragon JA, Khan A, et al. JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Research Published online November 8, 2019:gkz1001. doi: 10.1093/nar/gkz1001
- 34. Krassowski M, Arts M, Lagger C. Krassowski/Complex-Upset: V1.3.3. Zenodo; 2021. doi: 10.5281/ZENODO.3700590
- 35. Danaher P, Kim Y, Nelson B, et al. Advances in mixed cell deconvolution enable quantification of cell types in spatial transcriptomic data. Nat Commun 2022;13(1):385. doi: 10.1038/s41467-022-28020-5
- 36. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 2006;38(8):904–909. doi: 10.1038/ng1847
- 37. Anderson CA, Pettersson FH, Clarke GM, Cardon LR, Morris AP, Zondervan KT. Data quality control in genetic case-control association studies. Nat Protoc 2010;5(9):1564–1573. doi: 10.1038/nprot.2010.116
- 38. Sieber KB, Batorsky A, Siebenthall K, et al. Integrated Functional Genomic Analysis Enables Annotation of Kidney Genome-Wide Association Study Loci. JASN 2019;30(3):421–441. doi: 10.1681/ASN.2018030309
- 39. Siebenthall KT, Miller CP, Vierstra JD, et al. Integrated epigenomic profiling reveals endogenous retrovirus reactivation in renal cell carcinoma. EBioMedicine 2019;41:427–442. doi: 10.1016/j.ebiom.2019.01.063
- 40. Lidberg KA, Muthusamy S, Adil M, et al. Serum Protein Exposure Activates a Core Regulatory Program Driving Human Proximal Tubule Injury. J Am Soc Nephrol 2022;33(5):949–965. doi: 10.1681/ASN.2021060751
- 41. Rao SSP, Huntley MH, Durand NC, et al. A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping. Cell 2014;159(7):1665–1680. doi: 10.1016/j.cell.2014.11.021
- 42. Wang C, Liu X, Gao Y, et al. Reprogramming of H3K9me3-dependent heterochromatin during mammalian embryo development. Nat Cell Biol 2018;20(5):620–631. doi: 10.1038/s41556-018-0093-4
- 43. Lieberman-Aiden E, van Berkum NL, Williams L, et al. Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Science 2009;326(5950):289–293. doi: 10.1126/science.1181369
- 44. Phillips-Cremins JE, Sauria MEG, Sanyal A, et al. Architectural protein subclasses shape 3D organization of genomes during lineage commitment. Cell 2013;153(6):1281–1295. doi: 10.1016/j.cell.2013.04.053
- 45. Xu J, Nie X, Cai X, Cai CL, Xu PX. Tbx18 is essential for normal development of vasculature network and glomerular mesangium in the mammalian kidney. Developmental Biology 2014;391(1):17–31. doi: 10.1016/j.ydbio.2014.04.006
- 46. Nakai S, Sugitani Y, Sato H, et al. Crucial roles of Brn1 in distal tubule formation and function in mouse kidney. Development 2003;130(19):4751–4759. doi: 10.1242/dev.00666
- 47. Kumar S, Rathkolb B, Kemter E, et al. Generation and Standardized, Systemic Phenotypic Analysis of Pou3f3L423P Mutant Mice. PLoS One 2016;11(3):e0150472. doi: 10.1371/journal.pone.0150472
- 48. Rieger A, Kemter E, Kumar S, et al. Missense Mutation of POU Domain Class 3 Transcription Factor 3 in Pou3f3L423P Mice Causes Reduced Nephron Number and Impaired Development of the Thick Ascending Limb of the Loop of Henle. PLoS One 2016;11(7):e0158977. doi: 10.1371/journal.pone.0158977
- 49. Cao J, Spielmann M, Qiu X, et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 2019;566(7745):496–502. doi: 10.1038/s41586-019-0969-x
- 50. Cao J, O’Day DR, Pliner HA, et al. A human cell atlas of fetal gene expression. Science 2020;370(6518):eaba7721. doi: 10.1126/science.aba7721
- 51. Smith KD, Prince DK, Henriksen KJ, Nicosia RF, Alpers CE, Akilesh S. Digital spatial profiling of collapsing glomerulopathy. Kidney Int 2022;101(5):1017–1026. doi: 10.1016/j.kint.2022.01.033
- 52. Caruana G, Cullen-McEwen L, Nelson AL, et al. Spatial gene expression in the T-stage mouse metanephros. Gene Expr Patterns 2006;6(8):807–825. doi: 10.1016/j.modgep.2006.02.001
- 53. Serluca FC, Fishman MC. Pre-pattern in the pronephric kidney field of zebrafish. Development 2001;128(12):2233–2241. doi: 10.1242/dev.128.12.2233
- 54. Chen F, Lin X, Xu P, et al. Nuclear Export of Smads by RanBP3L Regulates Bone Morphogenetic Protein Signaling and Mesenchymal Stem Cell Differentiation. Mol Cell Biol 2015;35(10):1700–1711. doi: 10.1128/MCB.00121-15
- 55. Ke P, Qian L, Zhou Y, et al. Identification of hub genes and transcription factor-miRNA-mRNA pathways in mice and human renal ischemia-reperfusion injury. PeerJ 2021;9:e12375. doi: 10.7717/peerj.12375
- 56. Li C, He J, Chen J, et al. Genome-Wide Gene-Potassium Interaction Analyses on Blood Pressure: The GenSalt Study (Genetic Epidemiology Network of Salt Sensitivity). Circ Cardiovasc Genet 2017;10(6):e001811. doi: 10.1161/CIRCGENETICS.117.001811
- 57. Nagata M, Sakurai-Yageta M, Yamada D, et al. Aberrations of a cell adhesion molecule CADM4 in renal clear cell carcinoma. Int J Cancer 2012;130(6):1329–1337. doi: 10.1002/ijc.26160
- 58. Kato T, Hagiyama M, Takashima Y, Yoneshige A, Ito A. Cell adhesion molecule-1 shedding induces apoptosis of renal epithelial cells and exacerbates human nephropathies. Am J Physiol Renal Physiol 2018;314(3):F388–F398. doi: 10.1152/ajprenal.00385.2017
- 59. Hagiyama M, Nakatani Y, Takashima Y, et al. Urinary Cell Adhesion Molecule 1 Is a Novel Biomarker That Links Tubulointerstitial Damage to Glomerular Filtration Rates in Chronic Kidney Disease. Front Cell Dev Biol 2019;7:111. doi: 10.3389/fcell.2019.00111
- 60. Lipp SN, Jacobson KR, Hains DS, Schwarderer AL, Calve S. 3D Mapping Reveals a Complex and Transient Interstitial Matrix During Murine Kidney Development. J Am Soc Nephrol Published online April 19, 2021:ASN.2020081204. doi: 10.1681/ASN.2020081204
- 61. Gong Y, Himmerkus N, Plain A, Bleich M, Hou J. Epigenetic Regulation of MicroRNAs Controlling CLDN14 Expression as a Mechanism for Renal Calcium Handling. JASN 2015;26(3):663–676. doi: 10.1681/ASN.2014020129
- 62. Thorleifsson G, Holm H, Edvardsson V, et al. Sequence variants in the CLDN14 gene associate with kidney stones and bone mineral density. Nat Genet 2009;41(8):926–930. doi: 10.1038/ng.404
- 63. Yu J, Carroll TJ, Rajagopal J, Kobayashi A, Ren Q, McMahon AP. A Wnt7b-dependent pathway regulates the orientation of epithelial cell division and establishes the cortico-medullary axis of the mammalian kidney. Development 2009;136(1):161–171. doi: 10.1242/dev.022087
- 64. Kawakami T, Ren S, Duffield JS. Wnt signalling in kidney diseases: dual roles in renal injury and repair. J Pathol 2013;229(2):221–231. doi: 10.1002/path.4121
- 65. Mah AK, Armstrong KR, Chew DS, et al. Transcriptional regulation of AQP-8, a Caenorhabditis elegans aquaporin exclusively expressed in the excretory system, by the POU homeobox transcription factor CEH-6. J Biol Chem 2007;282(38):28074–28086. doi: 10.1074/jbc.M703305200
- 66. Cosse-Etchepare C, Gervi I, Buisson I, et al. Pou3f transcription factor expression during embryonic development highlights distinct pou3f3 and pou3f4 localization in the Xenopus laevis kidney. Int J Dev Biol 2018;62(4–5):325–333. doi: 10.1387/ijdb.170260RL
- 67. Bulger M, Groudine M. Looping versus linking: toward a model for long-distance gene activation. Genes Dev 1999;13(19):2465–2477. doi: 10.1101/gad.13.19.2465
- 68. Blackwood EM, Kadonaga JT. Going the distance: a current view of enhancer action. Science 1998;281(5373):60–63. doi: 10.1126/science.281.5373.60
- 69. Bulger M, Groudine M. Functional and mechanistic diversity of distal transcription enhancers. Cell 2011;144(3):327–339. doi: 10.1016/j.cell.2011.01.024
- 70. Chen H, Levo M, Barinov L, Fujioka M, Jaynes JB, Gregor T. Dynamic interplay between enhancer-promoter topology and gene activity. Nat Genet 2018;50(9):1296–1303. doi: 10.1038/s41588-018-0175-z
- 71. Maurano MT, Humbert R, Rynes E, et al. Systematic Localization of Common Disease-Associated Variation in Regulatory DNA. Science 2012;337(6099):1190–1195. doi: 10.1126/science.1222794
- 72. Mumbach MR, Satpathy AT, Boyle EA, et al. Enhancer connectome in primary human cells identifies target genes of disease-associated DNA elements. Nat Genet 2017;49(11):1602–1612. doi: 10.1038/ng.3963
- 73. Jin F, Li Y, Dixon JR, et al. A high-resolution map of the three-dimensional chromatin interactome in human cells. Nature 2013;503(7475):290–294. doi: 10.1038/nature12644
- 74. Dixon JR, Selvaraj S, Yue F, et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 2012;485(7398):376–380. doi: 10.1038/nature11082
- 75. Lupiáñez DG, Kraft K, Heinrich V, et al. Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell 2015;161(5):1012–1025. doi: 10.1016/j.cell.2015.04.004
- 76. Hafner A, Park M, Berger SE, Murphy SE, Nora EP, Boettiger AN. Loop stacking organizes genome folding from TADs to chromosomes. Mol Cell 2023;83(9):1377–1392.e6. doi: 10.1016/j.molcel.2023.04.008
- 77. Chen L, Chou CL, Yang CR, Knepper MA. Multiomics Analyses Reveal Sex Differences in Mouse Renal Proximal Subsegments. J Am Soc Nephrol 2023;34(5):829–845. doi: 10.1681/ASN.0000000000000089
- 78. Dixon EE, Wu H, Muto Y, Wilson PC, Humphreys BD. Spatially Resolved Transcriptomic Analysis of Acute Kidney Injury in a Female Murine Model. JASN 2022;33(2):279–289. doi: 10.1681/ASN.2021081150
- 79. Finn EH, Pegoraro G, Brandão HB, et al. Extensive Heterogeneity and Intrinsic Variation in Spatial Genome Organization. Cell 2019;176(6):1502–1515.e10. doi: 10.1016/j.cell.2019.01.020
- 80. Nieuwenhuis TO, Yang SY, Verma RX, et al. Consistent RNA sequencing contamination in GTEx and other data sets. Nat Commun 2020;11(1):1933. doi: 10.1038/s41467-020-15821-9
- 81. Inker LA, Eneanya ND, Coresh J, et al. New Creatinine- and Cystatin C-Based Equations to Estimate GFR without Race. N Engl J Med 2021;385(19):1737–1749. doi: 10.1056/NEJMoa2102953