Author manuscript; available in PMC: 2020 Sep 2.
Published in final edited form as: Kidney Int. 2019 Feb 27;95(4):787–796. doi: 10.1016/j.kint.2018.11.028

Representation and relative abundance of cell-type selective markers in whole-kidney RNA-Seq data

Jevin Z Clark 1, Lihe Chen 1, Chung-Lin Chou 1, Hyun Jun Jung 1, Jae Wook Lee 2, Mark A Knepper 1
PMCID: PMC7466803  NIHMSID: NIHMS1622653  PMID: 30826016

Abstract

Bulk-tissue RNA-Seq is increasingly being used in the study of physiological and pathophysiological processes in the kidney; however, the presence of multiple cell types in kidney tissue complicates data interpretation. We addressed the question of which cell types are represented in whole-kidney RNA-Seq data in order to identify circumstances in which bulk-kidney RNA-Seq can be successfully interpreted. We carried out RNA-Seq in mouse whole kidneys and in microdissected renal tubule segments. To aid in the interpretation of the data, we compiled a database of cell-type selective protein markers for 43 cell types believed to be present in kidney tissue. The whole-kidney RNA-Seq analysis identified transcripts corresponding to 17,742 genes, distributed over 5 orders of magnitude of expression level. Markers for all 43 curated cell types were detectable. Analysis of the cellular makeup of mouse and rat kidney, calculated from published literature, suggests that proximal tubule cells account for more than half of the mRNA in a kidney. Comparison of RNA-Seq data from microdissected proximal tubules with data from whole kidney supports this view. RNA-Seq data for cell-type selective markers in bulk-kidney samples provide a valid means to identify changes in minority-cell abundances in kidney tissue. Because proximal tubules make up a substantial fraction of whole-kidney samples, changes in proximal tubule gene expression can be assessed presumptively by bulk-kidney RNA-Seq, although results could potentially be complicated by the presence of mRNA from other cell types.

Keywords: bulk tissue, proximal tubule, transcriptome


RNA-Seq is a method for identifying and quantifying all mRNA species in a sample, as well as many noncoding RNA species.1–3 As in reverse transcriptase-polymerase chain reaction, the first step of RNA-Seq is reverse transcription of all mRNAs to give corresponding cDNAs. However, unlike reverse transcriptase-polymerase chain reaction, which amplifies only one cDNA target, RNA-Seq amplifies all cDNAs in the sample through use of adaptors that are ligated to the ends of each cDNA.4 The readout for RNA-Seq uses next-generation DNA sequencers to identify specific sequences that map to each mRNA transcript coded by the genome of a particular species (the “transcriptome”). This process allows counting of the number of “reads” for each transcript as a measure of the total amount of each transcript in the original sample. Thus RNA-Seq can be viewed simplistically as similar to quantitative reverse transcriptase-polymerase chain reaction but more expansive and unbiased.1 The abundance of a given transcript is assumed to be proportional to the number of independent sequence “reads” normalized to the annotated exon length of each individual gene and to the total reads obtained for a sample. This calculation yields transcripts per million (TPM).5
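The TPM calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in this study: the gene names, read counts, and exon lengths are hypothetical, and the function follows the commonly used TPM definition, in which length-normalized read rates are rescaled so that they sum to one million within a sample.

```python
def tpm(read_counts, exon_lengths_bp):
    """Convert per-gene read counts to transcripts per million (TPM)."""
    # Step 1: divide each gene's reads by its annotated exon length (in kilobases).
    rates = {
        gene: read_counts[gene] / (exon_lengths_bp[gene] / 1000.0)
        for gene in read_counts
    }
    # Step 2: rescale so that the length-normalized rates sum to one million.
    total_rate = sum(rates.values())
    return {gene: 1e6 * rate / total_rate for gene, rate in rates.items()}


# Hypothetical toy sample (names and numbers are illustrative only).
counts = {"GeneA": 1000, "GeneB": 500, "GeneC": 10}
lengths = {"GeneA": 2000, "GeneB": 500, "GeneC": 1000}

values = tpm(counts, lengths)
for gene in sorted(values, key=values.get, reverse=True):
    print(f"{gene}: {values[gene]:.1f} TPM")
```

In this toy sample GeneB has fewer reads than GeneA but a higher TPM, because its reads come from a transcript one-quarter as long.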

RNA-Seq has been used increasingly in recent years, in part because of the ease of execution and the availability of next-generation DNA sequencers.6 Because of the existence of private-sector biotechnology companies, even small laboratories can successfully carry out RNA-Seq studies in lieu of quantitative reverse transcriptase-polymerase chain reaction. Many recent reports using RNA-Seq use “bulk-tissue RNA-Seq” in which complex tissues containing multiple cell types are analyzed. The limitation of this approach is that it is usually impossible to determine which cell types in the mixture are responsible for observed changes in mRNA abundances. Furthermore, strong responses in minority cell types may be masked by a lack of response in more abundant cell types.7 Similar limitations apply to other analytical modalities, such as proteomics.

A solution to this problem in kidney is to isolate specific cell types using renal tubule microdissection prior to small-sample RNA-Seq as described by Lee et al.8,9 All 14 renal tubule segments plus glomeruli have been profiled in this way. In structures that contain more than one cell type, transcriptomes of each cell type can be determined using single-cell RNA-Seq (scRNA-Seq).10–17 However, RNA-Seq in single tubules or single cells is not always feasible, for example, in pathophysiological models or biopsy samples when inflammation or fibrosis limits tissue dissection or single-cell dissociation. In this context, we ask the question, “Despite the existence of multiple cell types in bulk-kidney samples, what information about specific cell types can be gleaned from whole-kidney RNA-Seq?”

RESULTS

What mRNA species are detectable in whole-kidney (WK) RNA-Seq analysis?

We carried out RNA-Seq analysis in 3 WK samples from untreated 2-month-old male C57BL/6 mice. Supplementary Figure S1 shows that the percentage of uniquely mapped reads exceeded 85% of the total reads, indicating high data quality for all 3 samples. Total reads for each of the 3 samples exceeded 66 million. Figure 1 shows the reads that mapped to selected genes expressed over a broad range of TPM levels. It can be seen that faithful, selective mapping to exons was obtained down to a TPM value of about 0.15 in this study, or an expression rank of 17,742. For example, the reads for Oxtr, coding for the oxytocin receptor (TPM = 0.15), thought to be expressed selectively in macula densa cells,18 are clearly mapped only to exons of the Oxtr gene, indicating the specificity of the measurement for spliced Oxtr mRNA (see Supplementary Dataset S1 for mapping of reads for other transcripts with TPM around 0.15). In contrast, exon-specific mapping is ambiguous for Epo, the transcript that codes for erythropoietin (TPM = 0.09). Overall, we conclude that 17,742 transcripts out of approximately 21,000 protein-coding genes in the mouse genome can be detected and quantified in WK samples with the technical approach used here. The WK TPM values for all transcripts down to rank 17,742 are presented at a publicly accessible Web page (https://hpcwebapps.cit.nih.gov/ESBL/Database/MouseWK/) and as Supplementary Dataset S2. Mapping of whole-kidney RNA-Seq reads on a genome browser can be viewed by clicking on “UCSC Genome Browser” on the Web page. Because the data in this article were obtained exclusively from 2-month-old male C57BL/6 mice, the reader is cautioned about possible differences that may occur on the basis of sex, age, mouse strain, animal species, and food intake, for example. Further studies will be needed to identify the effects of these variables.

Figure 1 | Visualization of the RNA-Seq reads for representative transcripts.

Cell-type selective genes from indicated cell types with their mRNA length, transcripts per million (TPM), and rank values. Genes with TPM greater than 0.15 are within a confident detectable range. Data were visualized in the University of California, Santa Cruz, Genome Browser. Vertical axis shows read counts. Map of exon/intron organization of each gene is shown on top of individual panels.

What cell types are represented in whole-kidney RNA-Seq data?

Based on a variety of data types (see the Methods section), we curated a list of 43 cell types that are thought to exist in the kidney and representative protein markers that have been claimed to be specific to or selective for these cell types. The cell types, the markers, and WK TPM values for mRNAs corresponding to the markers are presented in Supplementary Dataset S3 and at a permanent, publicly available Web page (https://hpcwebapps.cit.nih.gov/ESBL/Database/MouseWK/WKMarkers.html). Selected values are presented in Tables 1 and 2. Table 1 shows TPM values for selected markers of epithelial cell types, and Table 2 shows TPM values for selected markers of nonepithelial cell types. As seen in Table 1, markers for each epithelial cell type are highly expressed, with the exception of the macula densa marker. The TPM values for many nonepithelial cell type markers are above the TPM = 0.15 threshold previously defined (Table 2 and Supplementary Dataset S3). Overall, based on the markers that we have curated, we conclude that mRNAs from at least 43 cell types, including various blood-borne cells, stromal cells, and endothelial cells, are detectable in WK RNA-Seq samples from mouse.

Table 1 | Selected markers for renal epithelial cells in mouse whole kidney with corresponding TPM and rank values^a

| Cell type | Gene symbol | Common name | TPM | Rank |
|---|---|---|---|---|
| Podocyte | Nphs2 | Podocin | 53.3 | 2768 |
| Proximal (S1) | Slc5a2 | Type 2 Na-glucose cotransporter (SGLT2) | 621.2 | 230 |
| Thin ascending limb | Clcnka | Chloride channel, voltage sensitive, kidney type A | 60.4 | 2511 |
| Thick ascending limb | Slc12a1 | Type 2 Na-K-2Cl cotransporter (NKCC2) | 333.8 | 470 |
| Macula densa | Ptgs2 | Prostaglandin-endoperoxide synthase 2 (COX2) | 0.3 | 16278 |
| Distal convoluted tubule | Slc12a3 | Thiazide-sensitive Na-Cl cotransporter (NCC) | 179.1 | 892 |
| Connecting tubule | Calb1 | Calbindin 1 | 316.1 | 499 |
| Principal cell | Aqp2 | Aquaporin-2 | 464.1 | 317 |
| Intercalated cell, type A | Slc4a1 | Chloride-bicarbonate transporter 1 (AE1) | 17.4 | 5905 |
| Intercalated cell, type B | Slc26a4 | Pendrin | 39.6 | 3484 |
| Inner medullary collecting duct cell | Slc14a2 | Urea channel, epithelial | 20.1 | 5484 |
| Transitional epithelium | Upk1a | Uroplakin 1a | 7.4 | 8472 |

TPM, transcripts per million.

^a The full marker dataset values are listed in Supplementary Dataset S3.

Table 2 | Selected markers for renal nonepithelial cells in mouse whole kidney with corresponding TPM and rank values^a

| Cell type | Gene symbol | Common name | TPM | Rank |
|---|---|---|---|---|
| Basophil | Cd69 | Cd69 antigen | 0.2 | 16835 |
| B lymphocyte (follicular) | Cd22 | B-cell receptor | 0.2 | 17615 |
| Dendritic cell | Adgre1 | Adhesion G protein-coupled receptor E1 (F4/80) | 2.7 | 11123 |
| Endothelial cell | Pecam1 | Platelet/endothelial cell adhesion molecule 1 | 16.8 | 6021 |
| Fibroblast | Pdgfrb | Platelet-derived growth factor receptor, beta | 7.8 | 8347 |
| Granular cell of afferent arteriole | Ren1 | Renin 1 | 111.3 | 1454 |
| Macrophage | Cd68 | Macrosialin | 4.9 | 9629 |
| Monocyte | Cd14 | Monocyte differentiation antigen CD14 | 5.3 | 9436 |
| Neuronal cell (axon only) | Stx1a | Syntaxin 1A (brain) | 0.5 | 14797 |
| Smooth muscle cell | Acta2 | Actin, alpha 2, smooth muscle | 40.5 | 3418 |
| Polymorphonuclear leukocyte | Csf3r | Colony-stimulating factor 3 receptor (granulocyte) | 0.2 | 16597 |
| T lymphocyte | Cd4 | T-cell surface glycoprotein CD4 | 0.5 | 14893 |

TPM, transcripts per million.

^a The full marker dataset values are listed in Supplementary Dataset S3.

How much do various kidney tubule cell types contribute to TPM values?

Table 3 shows an accounting of the relative contributions of various renal epithelial cell types to the total makeup of the rat and mouse renal tubule in terms of cell number and protein mass. The estimates for rat and mouse were established by integrating several data sources relevant to quantitative renal anatomy.19–22 Full calculations and data sources are available in Supplementary Dataset S4. Values for percentages of cells and protein mass for individual cell types are very similar for mice and rats, and we concentrate on rat values here. Proximal tubule cells account for roughly 52% of the estimated 206 million tubule epithelial cells per kidney. However, they account for approximately 69% of total tubule protein mass by virtue of their large size compared with other renal tubule cells (Table 3). The second largest contribution is from the thick ascending limb of Henle, contributing 17% of cells and 12% of total protein (Table 3). If mRNA levels parallel protein levels, the contribution of proximal tubules to total mRNA in the renal tubule is also likely to be considerably greater than 50%.

Table 3 | Contributions of epithelial cell types to whole kidney cell count and mass in rat and mouse

| Segment/cell type^a | Total cells per kidney, millions (rat) | Total cells per kidney, millions (mouse) | % of total cells (rat) | % of total cells (mouse) | Total protein mass, μg (rat) | Total protein mass, μg (mouse) | % of total protein mass (rat) | % of total protein mass (mouse) |
|---|---|---|---|---|---|---|---|---|
| S1 proximal | 48.36 | 10.19 | 23.52 | 20.08 | 33031 | 8189 | 31.23 | 29.80 |
| S2 proximal | 48.36 | 10.19 | 23.52 | 20.08 | 33031 | 8189 | 31.23 | 29.80 |
| S3 proximal | 10.75 | 2.26 | 5.23 | 4.46 | 7340 | 1820 | 6.94 | 6.62 |
| tDL type 1 | 4.05 | 1.17 | 1.97 | 2.30 | 1497 | 432 | 1.42 | 1.57 |
| tDL type 2 | 3.31 | 0.44 | 1.61 | 0.86 | 815 | 108 | 0.77 | 0.39 |
| tDL type 3 | 1.81 | 0.73 | 0.88 | 1.43 | 537 | 215 | 0.51 | 0.78 |
| tAL | 3.00 | 0.87 | 1.46 | 1.72 | 741 | 215 | 0.70 | 0.78 |
| mTAL | 17.73 | 7.55 | 8.62 | 14.87 | 7125 | 3033 | 6.74 | 11.04 |
| cTAL | 17.77 | 3.61 | 8.64 | 7.12 | 5103 | 1038 | 4.82 | 3.78 |
| DCT | 19.90 | 3.78 | 9.68 | 7.45 | 8459 | 1607 | 8.00 | 5.85 |
| CNT cell | 6.99 | 2.66 | 3.40 | 5.24 | 2011 | 764 | 1.90 | 2.78 |
| CNT A-IC | 1.17 | 0.44 | 0.57 | 0.87 | 335 | 127 | 0.32 | 0.46 |
| CNT B-IC | 3.50 | 1.33 | 1.70 | 2.62 | 1005 | 382 | 0.95 | 1.39 |
| CCD PC | 4.15 | 1.31 | 2.02 | 2.57 | 881 | 277 | 0.83 | 1.01 |
| CCD B-IC | 1.58 | 0.50 | 0.77 | 0.98 | 334 | 105 | 0.32 | 0.38 |
| CCD A-IC | 1.43 | 0.45 | 0.70 | 0.89 | 304 | 96 | 0.29 | 0.35 |
| OMCD PC | 3.99 | 1.21 | 1.94 | 2.39 | 1013 | 308 | 0.96 | 1.12 |
| OMCD A-IC | 2.50 | 0.76 | 1.22 | 1.50 | 635 | 193 | 0.60 | 0.70 |
| OMCD B-IC | 0.16 | 0.05 | 0.08 | 0.10 | 41 | 12 | 0.04 | 0.04 |
| IMCD | 5.15 | 1.25 | 2.50 | 2.46 | 1544 | 374 | 1.46 | 1.36 |
| Sum | 205.63 | 50.74 | - | - | 105782 | 27484 | - | - |

A-IC, type A intercalated cells; B-IC, type B intercalated cells; CCD, cortical collecting duct; CNT, connecting tubule; cTAL, cortical thick ascending limb of the loop of Henle; DCT, distal convoluted tubule; IMCD, inner medullary collecting duct; mTAL, medullary thick ascending limb of the loop of Henle; OMCD, outer medullary collecting duct; PC, principal cell; PT, proximal tubule; tDL, thin descending limb; tAL, thin ascending limb of the loop of Henle.

^a The following sources were used for the calculations: Sperber,22 Murawski et al.,19 Knepper et al.,34 Garg et al.,20 and Vandewalle et al.21 All of the calculations and sources are also available in Supplementary Dataset S4.
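The percentages quoted in the text for the proximal tubule and the thick ascending limb can be checked directly from the rat columns of Table 3. The short Python sketch below sums the relevant rows; the values are transcribed from the table, and the function and variable names are illustrative only.

```python
# Rat values transcribed from Table 3:
# segment -> (cells per kidney in millions, protein mass in micrograms)
RAT = {
    "S1 proximal": (48.36, 33031),
    "S2 proximal": (48.36, 33031),
    "S3 proximal": (10.75, 7340),
    "tDL type 1": (4.05, 1497),
    "tDL type 2": (3.31, 815),
    "tDL type 3": (1.81, 537),
    "tAL": (3.00, 741),
    "mTAL": (17.73, 7125),
    "cTAL": (17.77, 5103),
    "DCT": (19.90, 8459),
    "CNT cell": (6.99, 2011),
    "CNT A-IC": (1.17, 335),
    "CNT B-IC": (3.50, 1005),
    "CCD PC": (4.15, 881),
    "CCD B-IC": (1.58, 334),
    "CCD A-IC": (1.43, 304),
    "OMCD PC": (3.99, 1013),
    "OMCD A-IC": (2.50, 635),
    "OMCD B-IC": (0.16, 41),
    "IMCD": (5.15, 1544),
}


def percent_share(table, segments):
    """Return (% of total cells, % of total protein mass) for the given segments."""
    total_cells = sum(cells for cells, _ in table.values())
    total_protein = sum(protein for _, protein in table.values())
    cells = sum(table[s][0] for s in segments)
    protein = sum(table[s][1] for s in segments)
    return 100 * cells / total_cells, 100 * protein / total_protein


proximal = percent_share(RAT, ["S1 proximal", "S2 proximal", "S3 proximal"])
thick_limb = percent_share(RAT, ["mTAL", "cTAL"])
print("Proximal tubule: %.1f%% of cells, %.1f%% of protein" % proximal)
print("Thick ascending limb: %.1f%% of cells, %.1f%% of protein" % thick_limb)
```

The results round to the figures given in the text: about 52% of cells and 69% of protein mass for the proximal tubule, and about 17% of cells and 12% of protein mass for the thick ascending limb.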

Nishizono et al.23 have quantified the cell types that make up the glomerulus in rats, yielding a median value of 133 podocytes per glomerulus. With 38,000 glomeruli per rat kidney, this gives 38,000 × 133 = 5.1 × 10^6 podocytes per rat kidney. This value is about 2.4% of the total number of epithelial cells in the rat (Table 3). In Bertram et al.24 a somewhat larger estimate of the number of podocytes per rat glomerulus was obtained (about 181 per glomerulus), which would predict that podocytes make up 3.4% of total epithelial cells (Table 3). The number of podocytes per mouse glomerulus is smaller (about 75),25 which would give 20,220 glomeruli per mouse kidney × 75 podocytes per glomerulus = 1.5 × 10^6 podocytes per mouse kidney. This comes out to 3.0% of total epithelial cells in mouse kidney (Table 3). Thus changes in podocyte transcripts are unlikely to be readily detectable or quantifiable in WK samples unless they are specific to the glomerulus. Qiu et al.26 have described an effective means of obviating this limitation via separate analysis of glomeruli microdissected from kidney samples.
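The podocyte arithmetic can be reproduced with a brief Python sketch. The glomerular and podocyte counts are those quoted in the text, and the tubule cell totals are the sums from Table 3. The text does not state whether the podocytes themselves are included in the denominator, so both conventions are computed here as an assumption-labeled comparison; the function names are illustrative only.

```python
def podocytes_per_kidney(glomeruli_per_kidney, podocytes_per_glomerulus):
    """Total podocytes per kidney from per-glomerulus counts."""
    return glomeruli_per_kidney * podocytes_per_glomerulus


def percent_of_epithelial_cells(podocytes, tubule_cells, include_podocytes):
    """Podocytes as a percentage of epithelial cells.

    Assumption: the denominator is either the tubule cell total alone or the
    tubule total plus podocytes; the text does not specify which was used.
    """
    denominator = tubule_cells + podocytes if include_podocytes else tubule_cells
    return 100 * podocytes / denominator


RAT_TUBULE_CELLS = 205.63e6    # Table 3 sum, rat
MOUSE_TUBULE_CELLS = 50.74e6   # Table 3 sum, mouse

rat_podocytes = podocytes_per_kidney(38_000, 133)    # 5,054,000
mouse_podocytes = podocytes_per_kidney(20_220, 75)   # 1,516,500

for label, podocytes, tubule_cells in [
    ("rat", rat_podocytes, RAT_TUBULE_CELLS),
    ("mouse", mouse_podocytes, MOUSE_TUBULE_CELLS),
]:
    for include in (False, True):
        pct = percent_of_epithelial_cells(podocytes, tubule_cells, include)
        print(f"{label}: {podocytes:,} podocytes, {pct:.2f}% "
              f"(podocytes in denominator: {include})")
```

Under either convention the podocyte share stays in the range of roughly 2% to 3% of epithelial cells, which is the basis for the conclusion that podocyte transcripts are strongly diluted in WK samples.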

What fraction of mouse WK mRNA is derived from proximal tubule cells, thick ascending limb cells, and collecting duct principal cells?

Because the proximal tubule makes such a large contribution to total epithelial cell number and protein mass (Table 3), it seems possible that WK RNA-Seq measurements could be used as a surrogate for measurements of transcript levels in the proximal tubule. To compare the mouse WK transcriptome with that of the mouse proximal tubule, we carried out RNA-Seq in microdissected S2 proximal tubules, manually dissected from the opposite (left) kidney from the one used for WK RNA-Seq analysis. The S2 segment was chosen, rather than S1 or S3, because it is rapidly dissectible without collagenase treatment and clearly identifiable because of its presence in the cortical medullary rays. The S2 proximal data mapped to a total of 18,767 genes with mean TPM values greater than 0.1 among the 3 animals. All of the 12 S2 proximal samples (4 replicates per kidney) had a percentage of uniquely mapped reads greater than 85%, consistent with high data quality (Supplementary Figure S2). The mean TPM values are provided as a publicly accessible Web page at https://hpcwebapps.cit.nih.gov/ESBL/Database/MusRNA-Seq/index.html. Figure 2a and b show plots of the base 2 logarithms of the WK versus proximal S2 TPM values for housekeeping and nonhousekeeping genes, respectively. The list of housekeeping genes was taken from Lee et al.8 The ratios for all genes were normalized such that the average WK/S2 TPM ratio is 1 for housekeeping genes that have TPM greater than 1. A tight correlation was seen for housekeeping transcripts (Figure 2a). As expected, WK/S2 ratios varied over a broad range for nonhousekeeping transcripts. The lower bound is seen at a ratio of about 0.25 and coincides with the location of S2-specific transcripts, for example, Slc22a7 and Slc22a13, which mediate organic anion and organic cation secretion, respectively, both key functions of the S2 segment.27 This finding suggests that the S2 segment accounts for approximately one-quarter of WK mRNA. Kap, a proximal tubule marker expressed in all 3 subsegments (S1, S2, and S3), is found near the 0.5 ratio line, suggesting that the proximal tubule may account for roughly 50% of WK mRNA.
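The housekeeping normalization and the lower-bound reasoning can be illustrated with a minimal Python sketch. The gene names and TPM values are hypothetical, and two details are assumptions not stated in the text: the scale factor is taken as the arithmetic mean of the raw housekeeping ratios, and the TPM > 1 filter is applied in both datasets. The underlying idea is that a transcript expressed only in the S2 segment is diluted in whole kidney by the fraction of WK mRNA that S2 contributes, so its normalized WK/S2 ratio approximates that fraction.

```python
from statistics import mean


def normalized_ratios(wk_tpm, s2_tpm, housekeeping, min_tpm=1.0):
    """WK/S2 TPM ratios scaled so that housekeeping genes average to 1."""
    # Raw ratios for genes quantified in both datasets.
    raw = {
        gene: wk_tpm[gene] / s2_tpm[gene]
        for gene in wk_tpm
        if gene in s2_tpm and s2_tpm[gene] > 0
    }
    # Scale factor from housekeeping genes above the TPM cutoff
    # (assumption: cutoff applied to both WK and S2 values).
    hk_ratios = [
        raw[gene]
        for gene in housekeeping
        if gene in raw and wk_tpm[gene] > min_tpm and s2_tpm[gene] > min_tpm
    ]
    scale = mean(hk_ratios)
    return {gene: ratio / scale for gene, ratio in raw.items()}


# Hypothetical TPM values for illustration only.
wk = {"HK_A": 100.0, "HK_B": 50.0, "S2_marker": 25.0}
s2 = {"HK_A": 200.0, "HK_B": 100.0, "S2_marker": 200.0}

ratios = normalized_ratios(wk, s2, housekeeping={"HK_A", "HK_B"})
print(ratios["S2_marker"])  # estimated S2 share of WK mRNA in this toy example
```

In this toy example the S2-specific transcript lands at a normalized ratio of 0.25, the same position at which Slc22a7 and Slc22a13 are reported in Figure 2b.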

Figure 2 | Correlation between whole-kidney RNA-Seq and microdissected single-tubule RNA-Seq.

(a) Housekeeping genes were plotted for whole-kidney RNA-Seq versus microdissected proximal tubule S2 RNA-Seq. (b–d) Nonhousekeeping genes were plotted for whole-kidney RNA-Seq versus the indicated microdissected single tubule RNA-Seq. The dashed lines represent the whole-kidney versus respective tubule RNA-Seq ratios. For (b), each dot is an individual transcript with transcripts per million (TPM) greater than 0.15. Data are log2-transformed before plotting. CCD, cortical collecting duct; cTAL, cortical thick ascending limb of the loop of Henle.

TPM values for microdissected mouse cortical thick ascending limbs and cortical collecting ducts were mined from a prior study10 and compared with the WK RNA-Seq data from this article (Figures 2c and d). The lower bound of values for cortical thick ascending limbs corresponds to known thick ascending limb markers (Umod, Slc12a1, and Ppp1r1b) just below the 1:8 ratio line. The specific ratios for these markers give an estimate that thick ascending limbs account for roughly 8.8% of the total kidney mRNA. This estimate contrasts with a value of about 15% based on morphometric analysis in mouse (medullary thick ascending limb of the loop of Henle plus cortical thick ascending limbs; Table 3), possibly due to dilution of the WK values by nonepithelial cells not accounted for in the morphometric analysis. The lower bound for cortical collecting duct cells corresponds to known principal cell markers (Aqp2, Aqp3, and Fxyd4) at a ratio of around 1:32, suggesting that principal cells account for around 3% of the WK transcriptome.

What is the contribution of nonepithelial cell types to the overall bulk kidney transcriptome?

Given the estimates of the percent contribution of each epithelial cell type in Table 3 and RNA-Seq data from microdissected tubules from rat kidney,8 it is possible to calculate a “reconstructed” bulk kidney transcriptome. This transcriptome can be compared with rat WK RNA-Seq data from our laboratory (Gene Expression Omnibus No. GSE70012). The difference between the two can be attributed to cell types other than renal tubule cells and is presented in Table 4 and Supplementary Dataset S5 in the form of measured:reconstructed ratios. As seen in Table 4, this analysis in rat confirms the presence of several non-tubule cell types in bulk kidney tissue and establishes the listed markers as detectable in normal rat kidneys.

Table 4 | Transcripts highly expressed in rat whole kidney but not in renal tubule epithelia^a

| Marker gene symbol | Annotation | Measured whole kidney (FPKM) | Reconstructed whole kidney (RPKM) | Measured:reconstructed ratio | Putative cell type |
|---|---|---|---|---|---|
| Cd84 | SLAM family member 5 | 1.71 | 0.00 | 22991.78 | B lymphocyte |
| Upk3a | Uroplakin 3A | 2.28 | 0.00 | 18854.78 | Transitional epithelium |
| Upk1b | Uroplakin 1B | 1.27 | 0.00 | 1636.54 | Transitional epithelium |
| Pdgfrb | Platelet-derived growth factor receptor, beta | 18.23 | 0.01 | 1339.82 | Fibroblast/mesangial cell |
| Cd34 | CD34 antigen | 20.97 | 0.03 | 608.42 | Endothelial cell |
| Col1a1 | Collagen, type I, alpha 1 | 34.85 | 0.16 | 211.25 | Fibroblast |
| Ngf | Nerve growth factor | 0.75 | 0.01 | 90.39 | Neuronal cell (axon only) |
| Thy1 | Thymus cell antigen 1, theta | 4.86 | 0.07 | 64.76 | Mesangial cell |
| Serpine2 | Serine (or cysteine) peptidase inhibitor, E2 | 8.35 | 0.17 | 50.07 | Mesangial cell |
| Mcam | Melanoma cell adhesion molecule | 10.93 | 0.34 | 31.93 | Pericyte |
| Nos1 | Nitric oxide synthase 1, neuronal | 0.85 | 0.03 | 28.30 | Macula densa cell |
| Acta2 | Actin, alpha 2, smooth muscle, aorta | 11.75 | 0.49 | 24.18 | Myofibroblast/pericyte/smooth muscle cell |
| Cxcr4 | Chemokine (C-X-C motif) receptor 4 | 2.02 | 0.09 | 23.22 | Megakaryocyte |
| Nphs1 | Nephrin | 8.97 | 0.47 | 19.21 | Podocyte |
| Upk1a | Uroplakin 1A | 1.97 | 0.10 | 19.07 | Transitional epithelium |
| Fcgr2b | Low-affinity Ig gamma Fc region receptor II-b | 0.65 | 0.09 | 7.03 | Monocyte |
| Cd200r1 | Cell surface glycoprotein CD200 receptor 1 | 0.56 | 0.09 | 6.39 | Macrophage |

FPKM, fragments per kilobase million; RPKM, reads per kilobase million.

^a “Reconstructed whole kidney” refers to whole kidney gene expression calculated from rat single tubule RNA-Seq and estimates of the percent contribution of each renal tubule cell type. “Measured whole kidney” refers to whole kidney RNA-Seq in rats. Cell types correspond to those annotated in Supplementary Dataset S3.
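The reconstruction behind Table 4 can be sketched as a weighted average of segment transcriptomes followed by a per-gene comparison with the measured bulk values. The Python example below is a simplified illustration under stated assumptions, not the calculation in Supplementary Dataset S5: the segment names, weights, gene names, and expression values are hypothetical, and the small `floor` used to avoid division by zero is an assumption introduced here.

```python
def reconstruct_bulk(segment_expr, segment_weights):
    """Weighted average of per-segment expression values for each gene."""
    total_weight = sum(segment_weights.values())
    genes = set().union(*(expr.keys() for expr in segment_expr.values()))
    return {
        gene: sum(
            segment_weights[seg] * segment_expr[seg].get(gene, 0.0)
            for seg in segment_weights
        ) / total_weight
        for gene in genes
    }


def measured_to_reconstructed(measured, reconstructed, floor=1e-4):
    """Per-gene measured:reconstructed ratio.

    `floor` is a hypothetical guard against division by zero for genes
    that are absent from every tubule segment.
    """
    return {
        gene: measured[gene] / max(reconstructed.get(gene, 0.0), floor)
        for gene in measured
    }


# Hypothetical two-segment kidney; weights stand in for Table 3 percentages.
weights = {"PT": 0.7, "TAL": 0.3}
segments = {
    "PT": {"TubuleGene": 100.0, "StromalGene": 0.0},
    "TAL": {"TubuleGene": 50.0, "StromalGene": 0.1},
}
measured_wk = {"TubuleGene": 80.0, "StromalGene": 30.0}

reconstructed = reconstruct_bulk(segments, weights)
ratios = measured_to_reconstructed(measured_wk, reconstructed)
for gene, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{gene}: measured:reconstructed = {ratio:.2f}")
```

A transcript that originates mainly outside the tubule epithelium (here the hypothetical StromalGene) shows a large measured:reconstructed ratio, whereas a tubule transcript stays near 1.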

Reconstructed RNA-Seq transcriptome of WK from scRNA-Seq data

Recently, several reports have provided scRNA-Seq data for many of the known renal tubule cell types.10–17 In theory, single-cell transcriptomes could be used to produce reconstructed bulk-kidney transcriptomes in a manner similar to that presented in the previous section using data from microdissected renal tubules. However, the calculation requires comprehensive transcriptomes in each cell, that is, a full accounting of the abundances of all expressed transcripts, which appears to correspond to 7000 to 8000 expressed genes in each cell type.8 Figure 3a shows the average number of transcripts quantified in selected individual cell types in a recent scRNA-Seq profiling study that used a state-of-the-art droplet-based method.15 Similar values (not shown) were obtained from another recent droplet-based scRNA-Seq study of kidney.11 As can be seen, the average number of transcripts quantified was in the range of 274 to 476. Thus, although the most abundant transcripts were found, the transcriptome list does not appear to be comprehensive despite the use of state-of-the-art methodology. Furthermore, information about gene expression that can identify a particular cell type is conveyed only in nonhousekeeping genes, which constituted less than one-third of the total. As shown in Figure 3b, expression levels from comprehensive transcriptomic data sets show that the percentage of nonhousekeeping transcripts increases at transcript ranks beyond that obtained in droplet-based scRNA-Seq of kidney (shaded region). Thus a goal for the future is to increase the depth of scRNA-Seq transcriptomic analysis for all major cell types in the kidney. A strategy for achieving this goal is proposed in the Discussion section.

Figure 3 | Sequencing depth in single-cell RNA-Seq.

(a) Average number of transcripts quantified in selected individual cell types from Park et al.15 The genes selected had a mean transcript count greater than 1 and were categorized into housekeeping and nonhousekeeping genes. The list of housekeeping genes was taken from Lee et al.8 (b) The cumulative percentage of nonhousekeeping genes is plotted versus transcripts per million rank for the mouse whole-kidney transcriptome data presented in this article. The shaded region corresponds to the maximum number of transcripts (476) in single-cell data as identified in (a). A-IC, type A intercalated cells; B-IC, type B intercalated cells; cTAL, cortical thick ascending limb of the loop of Henle; DCT, distal convoluted tubule; IC, intercalated cell; LOH, loop of Henle; PT, proximal tubule.
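One way to compute a curve of the kind shown in Figure 3b is sketched below in Python. This is an interpretation, assuming that the plotted quantity at rank r is the percentage of the top r transcripts (ranked by TPM) that are not on the housekeeping list; the gene names and TPM values are hypothetical.

```python
def cumulative_nonhousekeeping_pct(tpm_by_gene, housekeeping):
    """Return [(rank, cumulative % nonhousekeeping)] with genes ranked by TPM."""
    ranked = sorted(tpm_by_gene, key=tpm_by_gene.get, reverse=True)
    curve = []
    non_hk = 0
    for rank, gene in enumerate(ranked, start=1):
        if gene not in housekeeping:
            non_hk += 1
        # Percentage of the top `rank` genes that are nonhousekeeping.
        curve.append((rank, 100.0 * non_hk / rank))
    return curve


# Hypothetical toy transcriptome: housekeeping genes dominate the top ranks.
toy_tpm = {"g1": 600.0, "g2": 500.0, "g3": 400.0,
           "g4": 300.0, "g5": 200.0, "g6": 100.0}
toy_housekeeping = {"g1", "g2", "g4"}

curve = cumulative_nonhousekeeping_pct(toy_tpm, toy_housekeeping)
for rank, pct in curve:
    print(f"rank {rank}: {pct:.1f}% nonhousekeeping")
```

In the toy data the percentage rises with rank because housekeeping genes are concentrated among the most abundant transcripts, which mirrors the argument that shallow sequencing depth captures disproportionately few cell-type-informative genes.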

DISCUSSION

In this article we asked the question, “What information about specific cell types can be gleaned from WK RNA-Seq?” To address this question, we carried out RNA-Seq analysis of mouse WK samples, yielding a database of 17,742 transcripts with TPM values above a threshold of 0.15 (see Figure 1 and Supplementary Dataset S2). A full report of TPM values for all 17,742 transcripts is given at a publicly accessible Web site (https://hpcwebapps.cit.nih.gov/ESBL/Database/MouseWK/index.html). To identify cell types represented in these data, we compiled a list (from literature) of selective markers for 43 cell types that are likely present in kidney tissue. This information is listed in Supplementary Dataset S3. (Note that we made no attempt to make the marker list totally comprehensive. Readers are encouraged to look up other transcripts of interest at the Web site of RNA-Seq data: https://hpcwebapps.cit.nih.gov/ESBL/Database/MouseWK/index.html). We detected markers for all 43 cell types, many of them presumably rare in the overall cell count for the kidney. Thus, even for rare cell types, bulk RNA-Seq data can be used to draw inferences about the abundance of a particular cell type or regulation of its marker. For example, an inflammatory process in the kidney is likely to be associated with increases in markers for macrophages (e.g., Adgre1 [F4/80] or Cd68) in WK RNA-Seq data. Similarly, an increase in mRNA for renin in the kidney may be seen if either the number of afferent arteriolar granular cells increases or the transcription of the renin gene is increased, both of which have been observed.28

Our analysis of the abundances of individual epithelial cell types confirms that proximal tubule cells account for a large fraction of the total kidney substance, most likely at least 50%. The S2 segment alone appears to account for approximately 25% of WK mRNA (Figure 2b). This finding raises the question of whether WK measurements suffice to assess changes in the proximal tubule. Clearly, changes in proximal tubule mRNA abundance for a particular gene should be detectable in WK samples, although the magnitude of changes will be attenuated by dilution by other cell types. The main problem with interpreting WK changes as tantamount to changes in the proximal tubule is that large changes that are specific to other segments would also be manifest in WK samples. Furthermore, changes in the proximal tubule could be masked by opposite changes in other cell types. Consequently, we do not recommend using WK or bulk-tissue RNA-Seq as the sole methodology to address hypotheses about the proximal tubule. One approach that may be better in this setting is single-tubule RNA-Seq,8 in which proximal tubules are first microdissected from the kidney and then subjected to small-sample RNA-Seq analysis. In this article, we present new single-tubule RNA-Seq data on the transcriptome of microdissected mouse S2 proximal straight tubules and present a comparison with the WK RNA-Seq data.

The compendium of cell-type selective protein markers provided in this article is a resource that may be useful to investigators. We caution that the list is not necessarily comprehensive. The list includes multiple markers that have been claimed for certain cell types, many of which were chosen because the protein is present on the cell surface, allowing cell sorting. The imprecise definition of the term “cell marker” may lead to uncertainty when interpreting different types of data, and thus cell surface markers could be suboptimal for interpretation of RNA-Seq data. Furthermore, many markers have been claimed to be cell-type specific in several cell types, contradicting the specificity claim. In general, we believe that there is a need for a kidney-community-oriented effort to define the best cell markers for various uses.

In this article we have shown that it is possible to create a “reconstructed” WK transcriptome from transcriptomes of individual renal tubule segments and obtain information about the relative abundances of each cell type in the kidney from morphometric data. Success with this exercise has helped validate the accuracy of quantitative RNA-Seq data from structures isolated from the kidney. This outcome also bodes well for establishment of the validity of scRNA-Seq measurements.10–17 However, we could not carry out WK reconstructions using the state-of-the-art scRNA-Seq data that are currently available because the number of transcripts measured in these studies (274–476) fell short of the full depth of cellular transcriptomes (at least 7000–8000).8 Thus, although the scRNA-Seq data that have been published represent a very large step forward, there remains an unreached objective, namely, to push the method so that scRNA-Seq identifies full transcriptomes for all of the major cell types. Until now, comprehensive scRNA-Seq studies have used a shotgun approach that involved digestion of the kidney and sequencing to obtain transcriptomes for all single cells obtained.11,15 A limitation of this approach is that, as confirmed in this study, proximal tubule cells are much more abundant than any other cell type in the kidney. Consequently, an unbiased sequencing of all cells results in most of the sequencing resources being devoted to proximal tubule cell transcriptomes. As a result, if investigators increase the amount of sequencing to obtain deeper transcriptomes with a shotgun approach, most of the additional effort will be wasted on proximal tubule cells. To avoid this inefficiency, in the quest to obtain deep transcriptomes in minority cell types, it may be necessary to use microdissection, biochemical procedures, or flow sorting to isolate or enrich those cell types. Already, scRNA-Seq studies have been reported in which this strategy was used for components of the glomerulus12 and the collecting duct.10

Beyond this reconstruction approach, there is potential value in being able to work in the opposite direction to “deconvolute” bulk-tissue data,29 for example, in the analysis of formalin-fixed paraffin-embedded kidney biopsy samples,30,31 to ascertain what cell types are present in the samples and how they are altered by disease processes. This process can succeed qualitatively by identifying cell-type-specific transcripts that differ in abundance in a patient sample versus some appropriate reference. However, a difference in a particular transcript could be due to either a change in the number of cells or a change in the expression of the marker in each cell. The use of multiple markers may help resolve this ambiguity. In the long term, machine learning techniques can be used to generate classifiers from bulk RNA-Seq data that can identify disease processes.32

SUMMARY

RNA-Seq is being used increasingly to assess gene expression in the kidney. To discover pathophysiologic mechanisms in animal models of kidney disease, RNA-Seq is often carried out in bulk kidney tissue, which consists of multiple cell types. This study analyzes RNA-Seq data from whole kidneys from normal mice and rats to identify the cell types represented in the data. Markers for 43 different cell types were clearly detectable, including all epithelial cell types plus multiple types of vascular cells, stromal cells, and bone-marrow-derived cells. However, proximal tubule cells appear to account for half or more of total renal mRNA. Despite limitations created by the presence of multiple cell types, bulk-kidney RNA-Seq can be interpretable, particularly when changes in cell-type-specific markers are observed.

METHODS

Animals

Two-month-old male C57BL/6 mice (Taconic Biosciences, Hudson, NY) were maintained in standard conditions with free access to food and water. All animal experiments were conducted in accordance with National Institutes of Health animal protocol H-0047R4.

Microdissection

Mice were killed via cervical dislocation. The right kidney was rapidly removed and, after removal of the capsule, was immediately transferred to TRIzol reagent for RNA extraction. The left kidney was placed in ice-cold dissection solution (135 mM NaCl, 1 mM Na2HPO4, 1.2 mM MgSO4, 5 mM KCl, 2 mM CaCl2, 5.5 mM glucose, 5 mM N-2-hydroxyethylpiperazine-N′-2-ethanesulfonic acid, 5 mM Na acetate, 6 mM alanine, 1 mM trisodium citrate, 4 mM glycine, 1 mM heptanoate, pH 7.4) for microdissection. Cortical collecting ducts, cortical thick ascending limbs, and proximal tubule S2 segments were manually dissected in ice-cold dissection solution without protease treatment under a Wild M8 dissection stereomicroscope (Wild Heerbrugg, Heerbrugg, Switzerland) equipped with on-stage cooling. These segments are clearly identifiable because of their presence in the cortical medullary rays. After 2 thorough washes in ice-cold phosphate-buffered saline solution, the microdissected tubules were transferred to TRIzol reagent for RNA extraction. One to 4 tubules were collected for each sample.

WK RNA-Seq and single-tubule RNA-Seq

These steps were conducted as previously reported.10 Briefly, total RNA from WK and microdissected proximal tubules was extracted using a Direct-zol RNA MicroPrep kit (Zymo Research, Irvine, CA), and cDNA was generated using a SMARTer V4 Ultra Low RNA kit (Takara Bio USA, Mountain View, CA) according to the manufacturer’s protocols. One nanogram of cDNA was fragmented and labeled with a bar code using a Nextera XT DNA Sample Preparation Kit (Illumina, San Diego, CA). Libraries were generated by polymerase chain reaction amplification, purified with AMPure XP magnetic beads (Beckman Coulter, Indianapolis, IN), and quantified using a Qubit 2.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA). Library size distribution was determined using an Agilent 2100 Bioanalyzer with a High Sensitivity DNA kit (Agilent Technologies, Wilmington, DE). Libraries were pooled and sequenced (paired-end 50 bp) on an Illumina HiSeq 3000 platform to an average depth of 60 million reads per sample.

Data processing and transcript abundance quantification

Data processing was performed as previously reported.10 Briefly, raw sequencing reads were assessed for quality with FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and aligned by STAR33 to the mouse Ensembl genome (Ensembl, GRCm38.p5) with Ensembl annotation (Mus_musculus.GRCm38.83.gtf). Uniquely aligned reads were processed for visualization on the University of California, Santa Cruz, Genome Browser. Transcript abundances were quantified using RSEM5 in units of transcripts per million (TPM). Unless otherwise specified, the calculations were performed on the National Institutes of Health Biowulf High-Performance Computing platform.
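For readers unfamiliar with the TPM unit, its general definition can be sketched as follows: read counts are divided by transcript length to give a per-base rate, and the rates are then rescaled to sum to one million. This is a minimal illustration of the definition only and does not reproduce RSEM's statistical model for estimating counts; the function name and the input values are hypothetical.

```python
def counts_to_tpm(counts, effective_lengths):
    """Convert read counts to TPM given effective transcript lengths in bases."""
    # Length-normalized rate for each gene; genes with non-positive length are skipped.
    rates = {
        g: counts[g] / effective_lengths[g]
        for g in counts
        if effective_lengths.get(g, 0) > 0
    }
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any gene")
    # Rescale so that all values sum to one million.
    return {g: rate / total * 1e6 for g, rate in rates.items()}


# Illustrative values only; not data from this study.
tpm = counts_to_tpm({"geneA": 100, "geneB": 300}, {"geneA": 1000, "geneB": 1000})
print(tpm)
```

Because TPM values always sum to one million within a sample, a large change in one abundant cell type's transcripts necessarily shifts the apparent abundance of all others, which is relevant when proximal tubule mRNA dominates the sample.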

WK and proximal tubule transcriptomes

Mean TPM values were calculated across all samples from 3 mice (WK, n = 3; S2 proximal tubule, n = 12). These filtered data are reported on specialized, publicly accessible, permanent Web pages to provide a community resource: https://hpcwebapps.cit.nih.gov/ESBL/Database/MusRNA-Seq/index.html.
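The averaging step can be sketched as a per-gene arithmetic mean over replicate samples. This is a minimal illustration that assumes a gene absent from a sample has a TPM of zero; that convention, the function name, and the example values are assumptions and are not taken from the study.

```python
def mean_tpm(samples):
    """Per-gene arithmetic mean of TPM across a list of {gene: TPM} samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    genes = sorted({g for sample in samples for g in sample})
    n = len(samples)
    # Genes missing from a sample are counted as 0 TPM (an assumption).
    return {g: sum(sample.get(g, 0.0) for sample in samples) / n for g in genes}


# Illustrative values only; not data from this study.
replicates = [
    {"geneA": 10.0, "geneB": 0.0},
    {"geneA": 20.0},
    {"geneA": 30.0, "geneB": 6.0},
]
print(mean_tpm(replicates))
```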

Data deposition

The FASTQ sequences and metadata reported in this article have been deposited in the National Center for Biotechnology Information’s Gene Expression Omnibus database (accession number GSE111837; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE111837).

Curation of list of cell-type selective genes

To identify a list of cell-type selective genes from renal tubule segments, we used data from microdissected rat renal tubules published by Lee et al.,8 as well as data from mouse microdissected tubules and single cells described by Chen et al.10 and Park et al.15 For other cell types, markers were determined using a combination of the following sources: general PubMed searches for publicly accessible research articles, commercial information sources for recommended marker antibodies, and general reference textbooks. Specific sources are given in Supplementary Dataset S3. The curated list was designed to be representative but not exhaustive.

Supplementary Material

Supplemental Fig. B

Figure S2. Mapping quality of the microdissected proximal tubule S2 RNA-Seq data. Distribution of reads shows that uniquely mapped reads exceed 85% of total reads in all 12 S2 proximal tubule samples. Total reads were: sample 1, 69808466; sample 2, 84962667; sample 3, 75565121; sample 4, 74862689; sample 5, 76598350; sample 6, 78381995; sample 7, 70858077; sample 8, 77120838; sample 9, 64935558; sample 10, 69894298; sample 11, 70091668; sample 12, 67011247.

Supplemental Fig. A

Figure S1. Mapping quality of the whole-kidney RNA-Seq data. Distribution of reads shows that uniquely mapped reads exceed 85% of total reads in all three whole-kidney samples. Total reads were: sample 1, 66142467; sample 2, 68482027; sample 3, 69079531.

Supplemental Dataset C

Dataset S3. Cell type selective markers: RNA-seq analysis of mouse whole-kidney.

Supplemental Dataset A

Dataset S1. Mapped reads for low expression in whole-kidney RNA-seq analysis.

Supplemental Dataset B

Dataset S2. Detectable transcripts in RNA-seq analysis of mouse whole-kidney.

Supplemental Dataset D

Dataset S4. Rat and mouse cell counts calculations and sources.

Supplemental Dataset E

Dataset S5. Measured:reconstructed rat RNA-seq ratio.

Original Submitted Manuscript

ACKNOWLEDGMENTS

The work was funded by the Division of Intramural Research, National Heart, Lung, and Blood Institute (projects ZIA-HL001285 and ZIA-HL006129 to MAK). Next-generation sequencing was performed in the National Heart, Lung, and Blood Institute DNA Sequencing Core Facility (Yuesheng Li, Director).

Footnotes

DISCLOSURE

All the authors declared no competing interests.

Supplementary material is linked to the online version of the paper at www.kidney-international.org.

REFERENCES

1. Mortazavi A, Williams BA, McCue K, et al. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5:621–628.
2. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10:57–63.
3. Wu H, Humphreys BD. The promise of single-cell RNA sequencing for kidney disease investigation. Kidney Int. 2017;92:1334–1342.
4. Brenner S, Johnson M, Bridgham J, et al. Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat Biotechnol. 2000;18:630–634.
5. Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011;12:323.
6. Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010;11:31–46.
7. Rozenblatt-Rosen O, Stubbington MJT, Regev A, et al. The Human Cell Atlas: from vision to reality. Nature. 2017;550:451–453.
8. Lee JW, Chou CL, Knepper MA. Deep sequencing in microdissected renal tubules identifies nephron segment-specific transcriptomes. J Am Soc Nephrol. 2015;26:2669–2677.
9. Lee JW, Alsady M, Chou CL, et al. Single-tubule RNA-Seq uncovers signaling mechanisms that defend against hyponatremia in SIADH. Kidney Int. 2018;93:128–146.
10. Chen L, Lee JW, Chou CL, et al. Transcriptomes of major renal collecting duct cell types in mouse identified by single-cell RNA-seq. Proc Natl Acad Sci U S A. 2017;114:E9989–E9998.
11. Han X, Wang R, Zhou Y, et al. Mapping the mouse cell atlas by Microwell-Seq. Cell. 2018;172:1091–1107.e17.
12. Karaiskos N, Rahmatollahi M, Boltengagen A, et al. A single-cell transcriptome atlas of the mouse glomerulus. J Am Soc Nephrol. 2018;29:2060–2068.
13. Lu Y, Ye Y, Bao W, et al. Genome-wide identification of genes essential for podocyte cytoskeletons based on single-cell RNA sequencing. Kidney Int. 2017;92:1119–1129.
14. Lu Y, Ye Y, Yang Q, et al. Single-cell RNA-sequence analysis of mouse glomerular mesangial cells uncovers mesangial cell essential genes. Kidney Int. 2017;92:504–513.
15. Park J, Shrestha R, Qiu C, et al. Single-cell transcriptomics of the mouse kidney reveals potential cellular targets of kidney disease. Science. 2018;360:758–763.
16. Wu H, Malone AF, Donnelly EL, et al. Single-cell transcriptomics of a human kidney allograft biopsy specimen defines a diverse inflammatory response. J Am Soc Nephrol. 2018;29:2069–2080.
17. Young MD, Mitchell TJ, Vieira Braga FA, et al. Single-cell transcriptomes from human kidneys reveal the cellular identity of renal tumors. Science. 2018;361:594–599.
18. Stoeckel ME, Freund-Mercier MJ. Autoradiographic demonstration of oxytocin-binding sites in the macula densa. Am J Physiol. 1989;257:F310–F314.
19. Murawski IJ, Maina RW, Gupta IR. The relationship between nephron number, kidney size and body weight in two inbred mouse strains. Organogenesis. 2010;6:189–194.
20. Garg LC, Knepper MA, Burg MB. Mineralocorticoid effects on Na-K-ATPase in individual nephron segments. Am J Physiol. 1981;240:F536–F544.
21. Vandewalle A, Wirthensohn G, Heidrich HG, et al. Distribution of hexokinase and phosphoenolpyruvate carboxykinase along the rabbit nephron. Am J Physiol. 1981;240:F492–F500.
22. Sperber I. Studies on the mammalian kidney. Zoologiska Bidrag fran Uppsala. 1944;22:249–432.
23. Nishizono R, Kikuchi M, Wang SQ, et al. FSGS as an adaptive response to growth-induced podocyte stress. J Am Soc Nephrol. 2017;28:2931–2945.
24. Bertram JF, Soosaipillai MC, Ricardo SD, et al. Total numbers of glomeruli and individual glomerular cell types in the normal rat kidney. Cell Tissue Res. 1992;270:37–45.
25. Puelles VG, van der Wolde JW, Schulze KE, et al. Validation of a three-dimensional method for counting and sizing podocytes in whole glomeruli. J Am Soc Nephrol. 2016;27:3093–3104.
26. Qiu C, Huang S, Park J, et al. Renal compartment-specific genetic variation analyses identify new pathways in chronic kidney disease. Nat Med. 2018;24:1721–1731.
27. Woodhall PB, Tisher CC, Simonton CA, et al. Relationship between para-aminohippurate secretion and cellular morphology in rabbit proximal tubules. J Clin Invest. 1978;61:1320–1329.
28. Taugner R, Hackenthal E, Nobiling R, et al. The distribution of renin in the different segments of the renal arterial tree: immunocytochemical investigation in the mouse kidney. Histochemistry. 1981;73:75–88.
29. Zhao Y, Simon R. Gene expression deconvolution in clinical samples. Genome Med. 2010;2:93.
30. Eikrem O, Beisland C, Hjelle K, et al. Transcriptome sequencing (RNAseq) enables utilization of formalin-fixed, paraffin-embedded biopsies with clear cell renal cell carcinoma for exploration of disease biology and biomarker development. PLoS One. 2016;11:e0149743.
31. Li P, Conley A, Zhang H, et al. Whole-transcriptome profiling of formalin-fixed, paraffin-embedded renal cell carcinoma by RNA-seq. BMC Genomics. 2014;15:1087.
32. Reeve J, Bohmig GA, Eskandary F, et al. Assessing rejection-related disease in kidney transplant biopsies based on archetypal analysis of molecular phenotypes. JCI Insight. 2017;2(12).
33. Dobin A, Davis CA, Schlesinger F, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21.
34. Knepper MA, Danielson RA, Saidel GM, et al. Quantitative analysis of renal medullary anatomy in rats and rabbits. Kidney Int. 1977;12:313–323.
