PLOS ONE. 2021 Dec 9;16(12):e0259373. doi: 10.1371/journal.pone.0259373

Variable expression of eighteen common housekeeping genes in human non-cancerous kidney biopsies

Philipp Strauss 1,*, Håvard Mikkelsen 1, Jessica Furriol 2
Editor: Stephen D Ginsberg 3
PMCID: PMC8659319  PMID: 34882702

Abstract

Housekeeping genes, or reference genes (RGs), are, by definition, loci with stable expression profiles that are widely used as internal controls to normalize mRNA levels. However, specific events, such as pathological changes or technical procedures, might alter their expression, so that they fail to fulfil critical normalization prerequisites. To identify RGs suitable as internal controls in human non-cancerous kidney tissue, we selected 18 RG candidates based on previous data and screened them in 30 expression datasets (>800 patients), including our own, publicly available ones and ones provided by independent groups. Datasets included specimens from patients with hypertensive and diabetic nephropathy, Fabry disease, focal segmental glomerulosclerosis, IgA nephropathy, membranous nephropathy, and minimal change disease. We examined both microdissected and whole section-based datasets. Expression variability of 4 candidate genes (YWHAZ, SLC4A1AP, RPS13 and ACTB) was further examined by qPCR in biopsies from patients with hypertensive nephropathy (n = 11) and healthy controls (n = 5). Only YWHAZ gene expression remained stable in all datasets, whereas SLC4A1AP was stable in all but one Fabry dataset. All other RGs were differentially expressed in at least 2 datasets, and in 4.5 datasets on average. No differences in YWHAZ, SLC4A1AP, RPS13 and ACTB gene expression between hypertensive and control biopsies were detected by qPCR. Although RGs suitable for all techniques and tissues are unlikely to exist, our data suggest that in non-cancerous kidney biopsies the expression of the YWHAZ and SLC4A1AP genes is stable and suitable for normalization purposes.

1. Introduction

Housekeeping genes, or reference genes (RGs), are a group of genes involved in basic cell functions, with a presumed stable expression profile that is independent of cell type and pathophysiological conditions [1]. These RGs are widely used to normalize qPCR data, which is necessary for robust and reproducible results [2–4]. Considering the role played by these technologies in modern research, well-documented normalization strategies are essential.

Since tissue heterogeneity as well as sample quality, isolation and reverse transcription can add variation to the final data, normalization is necessary to adjust for the introduced variability. To be suitable for normalization, RG expression should not vary between samples or correlate with other variables such as treatments, physiological states, age, or sex. Neither should variation occur due to biological changes associated with specific diseases [5].

However, a variety of studies indicate that the expression of several traditional RGs shows considerable variability [5–8]. As a consequence, conclusions drawn from experimental results can point in opposite directions depending on the RG selected for normalization [9]. Therefore, many guidelines suggest prospective testing of selected RGs under the specific conditions required for the planned experiments [2, 9].

While scientifically advantageous, the additional testing is often limited by tissue availability or budget restrictions. To a certain extent this testing can be circumvented, or at least reduced, by studies examining the variability of the RGs in similar tissues. RG testing in cancerous tissues is relatively frequent [10–12], whereas it is less prevalent in non-cancerous renal diseases.

Although common RGs have been validated in diabetic nephropathy [13], various forms of glomerulopathies [7] and allograft tissues [4, 14], the expression of fewer RGs has been verified in hypertensive nephropathy, one of the most common causes of end-stage renal disease (ESRD) in Europe [15]. In recent years, several new RGs have been proposed [3, 11]. While commendable, this effort has deepened the existing problem of insufficient validation, as the newer candidates are often validated to an even lesser extent than the older, often faulty [6, 7, 11], RGs. Considering the uncertainty surrounding RGs, results from non-cancerous kidney diseases urgently require validation.

In recent years the increasing popularity of sequencing technology has resulted in the generation of numerous datasets that can be mined for data on RG expression without performing costly additional experiments [4, 16, 17]. Therefore, we selected eighteen commonly used RGs, screened them in 30 expression datasets and selected 4 for validation by qPCR in a cohort of hypertensive nephropathy and normal kidney biopsies, with the aim of identifying RGs appropriate for the normalization of RNA data from human non-cancerous kidney samples. We believe that we have achieved that aim.

2. Materials and methods

2.1 Study design

The study was designed in accordance with MIQE guidelines [2]. RGs were selected based on the frequency of their use across all tissues and in previous investigations, with special emphasis on papers examining gene expression in non-cancerous renal tissue. A flowchart of the study design is depicted in Fig 1. Data have been made available on GitHub (https://github.com) in the repository 310590-transciptomic-data.

Fig 1. Workflow.

Reference genes (RGs) were selected from the literature based on frequency of use, and on whether they had previously been evaluated in kidney biopsies from non-cancerous renal tissue. A selection of RGs whose expression has only been investigated in cancer tissues has also been included. The additional datasets referenced in the last box are datasets 9–12.

The following RGs were selected for examination in this investigation: Glyceraldehyde 3-phosphate dehydrogenase (GAPDH, ENSG00000111640), Actin gamma 1 (ACTG1, ENSG00000184009), REL Proto-Oncogene, NF-KB Subunit (REL, ENSG00000162924), Actin beta (ACTB, ENSG00000075624), Solute carrier family 4 member 1 adaptor protein (SLC4A1AP, ENSG00000163798), Tyrosine 3-Monooxygenase/Tryptophan 5-Monooxygenase Activation Protein Zeta (YWHAZ, ENSG00000164924), Ribosomal Protein S13 (RPS13, ENSG00000110700), NOP10 Ribonucleoprotein (NOP10, ENSG00000182117), Phosphoglycerate Mutase 1 (PGAM1, ENSG00000171314), Peptidylprolyl Isomerase A (PPIA, ENSG00000196262), Glucuronidase Beta (GUSB, ENSG00000169919), TATA-Box Binding Protein (TBP, ENSG00000112592), Ribosomal Protein L13 (RPL13, ENSG00000167526), Heterogeneous Nuclear Ribonucleoprotein L (HNRNPL, ENSG00000104824), Poly (RC) Binding Protein 1 (PCBP1, ENSG00000169564), Retention In Endoplasmic Reticulum Sorting Receptor 1 (RER1, ENSG00000157916), Phospholipase A2 Group IVA (PLA2G4A, ENSG00000116711) and Beta-2-Microglobulin (B2M, ENSG00000166710).

Following selection of prospective RGs from the literature, their expression was evaluated in our own and publicly available datasets (see below).

Since a suitable RG should under no circumstances be differentially expressed in control and test samples as it is used as internal control in those groups, we utilized differential expression as a measure of stability. Four candidate genes, including those providing the best results in the 30 datasets comparison and some of the most used RGs were selected and further evaluated by qPCR.

The Regional Ethics Committee (REC) of Western Norway approved the study (REK vest 2013/553). Written informed consent was obtained from all patients whose biopsies were part of our own experiments.

2.2 Datasets

A total of 30 datasets were selected. They were acquired from our unpublished data (n = 14) and from publicly available datasets provided by the European Renal cDNA Bank (ERCB) [18–20] (n = 2) or by the Neptune Network [21] (n = 2). Additionally, we used other publicly available datasets (n = 12). A detailed overview of each dataset, including references and links, is provided in Table 1. As controls, datasets included biopsies from healthy donors (10 databases), stable allografts (3 databases) or biopsies with minimal and unspecific alterations (12 databases). In our own data, normal controls were selected from a group of biopsies graded by the renal pathologist on duty as “not containing any or only insignificant pathology”. We re-examined the biopsies’ histology and accessed the patients’ clinical records. Patients who later developed renal disease, kidney failure or severe autoimmune disease, or who showed severe proteinuria, were excluded. Biopsies in our own datasets were always taken for diagnostic purposes and were therefore aimed at the kidney cortex. Biopsies with less than 50% cortex were discarded. Approximately 70% of the biopsies had 10 or more glomeruli. All microdissection was performed on the same Zeiss PALM Laser Capture Microdissection (LCM) system (Carl Zeiss AG, Oberkochen, Germany) with consistent personnel and settings for each dataset. After microdissection, the samples were immediately stored at -80°C until RNA extraction, after which they were again immediately stored at -80°C.

Table 1. Dataset details.

| Dataset ID | Disease | Source | GEO accession number | Seq. method | Microdissected | Compartment | N | Control type |
|---|---|---|---|---|---|---|---|---|
| 1 | MCD | Internal | N.A. | NGS | Yes | Glomeruli | 22 | Healthy control |
| 2 | MN | Internal | N.A. | NGS | Yes | Glomeruli | 20 | Healthy control |
| 3 | HT | Internal | N.A. | NGS | No | N.A. | 12 | Healthy control |
| 4 | DIA2 | Internal | N.A. | NGS | No | N.A. | 12 | Healthy control |
| 5 | Fabry | Internal | N.A. | NGS | Yes | Glomeruli | 16 | Healthy control |
| 6 | Fabry | Internal | N.A. | NGS | Yes | Arteries | 16 | Healthy control |
| 7 | Fabry | Internal | N.A. | NGS | Yes | Proximal tubule | 16 | Healthy control |
| 8 | Fabry | Internal | N.A. | NGS | Yes | Distal tubule | 16 | Healthy control |
| 9 | MN | ERCB | N.A. | MA | Yes | Glomeruli | 69 | Healthy control |
| 10 | MCD | ERCB | N.A. | MA | Yes | Glomeruli | 62 | Healthy control |
| 11 | MN | Neptune | N.A. | MA | Yes | Glomeruli | 55 | Healthy control |
| 12 | MCD | Neptune | N.A. | MA | Yes | Glomeruli | 54 | Healthy control |
| 13 | Fabry | Internal | N.A. | NGS | Yes | Glomeruli | 16 | Healthy control |
| 14 | Fabry | Internal | N.A. | NGS | Yes | Arteries | 16 | Healthy control |
| 15 | Fabry | Internal | N.A. | NGS | Yes | Proximal tubule | 16 | Healthy control |
| 16 | Fabry | Internal | N.A. | NGS | Yes | Distal tubule | 16 | Healthy control |
| 17 | MN | Internal | N.A. | NGS | Yes | Glomeruli | 26 | MCD |
| 18 | MN_PLA2R_neg | Internal | N.A. | NGS | Yes | Glomeruli | 12 | MN_PLA2R_pos |
| 19 | RPGN | GEO | GSE104954 | MA | Yes | Tubulointerstitial | 39 | Healthy control |
| 20 | MCD | GEO | GSE104954 | MA | Yes | Tubulointerstitial | 26 | Healthy control |
| 21 | FSGS | GEO | GSE104954 | MA | Yes | Tubulointerstitial | 25 | Healthy control |
| 22 | DIA | GEO | GSE104954 | MA | Yes | Tubulointerstitial | 25 | Healthy control |
| 23 | DIA | GEO | GSE104954 | MA | Yes | Tubulointerstitial | 30 | HT |
| 24 | HT | GEO | GSE104954 | MA | Yes | Tubulointerstitial | 52 | Lupus |
| 25 | DIA | GEO | GSE104954 | MA | Yes | Tubulointerstitial | 35 | IGAN |
| 26 | HT | GEO | GSE104948 | MA | Yes | Glomeruli | 42 | Healthy control |
| 27 | IgA | GEO | GSE104948 | MA | Yes | Glomeruli | 42 | Healthy control |
| 28 | TCMR | GEO | GSE120495 | NGS | No | N.A. | 10 | STA |
| 29 | ATI | GEO | GSE120495 | NGS | No | N.A. | 10 | STA |
| 30 | IFTA | GEO | GSE120495 | NGS | No | N.A. | 10 | STA |

MCD: Minimal change disease, MN: Membranous nephropathy, HT: Hypertension, DIA2: Diabetes type 2, FSGS: Focal segmental glomerulosclerosis, IGAN: IgA nephropathy, TCMR: T-cell mediated rejection, RPGN: Rapidly progressive glomerulonephritis, STA: Stable allograft, ATI: Acute tubular injury, IFTA: Interstitial fibrosis and tubular atrophy, GEO: Gene Expression Omnibus, NGS: Next generation sequencing, MA: Microarray, N.A.: Not applicable.

A total of 5 datasets (n = 54 samples) included whole kidney tissues. Moreover, since microdissection allows refining of input tissue and might reveal differences buried under noise in whole sections, 25 datasets (n = 764 samples) included microdissected tissues from glomeruli, arteries, proximal or distal tubules, and tubulointerstitial structures. In all datasets, comparisons were only made within the dataset; we did not compare groups from one dataset to groups from another dataset. In microdissected datasets, we only compared the same compartments from different patient groups, e.g., hypertensive glomeruli compared to glomeruli from healthy controls, all from the same dataset.

A total of 13 datasets were generated via microarray and 17 via next generation sequencing. In particular, 5 datasets included samples from patients with minimal change disease (MCD); 8 from patients with Fabry disease; 5 from patients with membranous nephropathy (MN); 3 from patients with hypertensive nephropathy (HN), and 4 from patients with diabetic nephropathy (DN). Full details on each patient cohort from external data are available through the original publication for each external dataset, see Table 1. In internal datasets, patients who, at the time of the initial biopsy, suffered from concurrent renal failure, cancers or other renal diseases apart from the primary diagnosis were excluded. All patients were Caucasian. Apart from the Fabry-derived datasets, all patients were over 18 years old. Across datasets, genders were approximately equally distributed, with more males present in the Fabry data.

2.3 Patient selection for qPCR

Kidney biopsies used for qPCR analysis (n = 16) were selected from the Norwegian Renal Biopsy Registry. Biopsies from patients with hypertensive nephropathy (HT) (n = 11) were compared to normal biopsies or samples with minimal and unspecific changes (n = 5). HT patients were matched to the non-diseased controls (NDC) for age (± 5 years) and sex. Each sample was diagnosed and scored by an experienced renal pathologist. Furthermore, all cases were reassessed prior to inclusion in the study.

Average age was 54 ± 5.5 years old for NDC and 56 ± 4.6 years old for HT patients. HT patients with renal tissue alterations attributable to a different disease were not included.

All biopsies were stored as formalin-fixed and paraffin-embedded (FFPE) tissues at room temperature.

2.4 RNA isolation and cDNA synthesis

Two to eight 10 μm thick sections were cut from FFPE blocks and used as input. The number of sections was determined by the surface area covered by tissue in each biopsy. RNA was then isolated as previously described [22], using miRNeasy FFPE kit (cat no. 217504; Qiagen, Venlo, The Netherlands) according to manufacturer’s instructions.

Following extraction, samples were stored at -80°C. RNA concentration was measured with a Qubit RNA BR Assay kit (cat no. Q10210; ThermoFisher) in a Qubit 4 Fluorometer (Q33238; ThermoFisher). The median concentration was 59.8 ng total RNA (range 22.6–242). A260/A280 and A260/A230 ratios were measured using a NanoDrop One Spectrophotometer (ThermoFisher), with medians of 1.905 (range 1.67–1.98) and 1.85 (range 1.01–2.11), respectively. cDNA synthesis was performed from 200 ng of RNA using SuperScript IV VILO master mix with ezDNase (No. 11766050; Thermo Fisher Scientific).

2.5 Quantitative real-time polymerase chain reaction

Quantitative real-time polymerase chain reaction (qPCR) was performed using TaqMan Fast Advanced master mix (No. 4444556; Thermo Fisher Scientific). Technical triplicates were run for each sample and probe.

The following probes, purchased from Thermo Fisher Scientific, were used: RPS13 (Catalog number: 4331182, Hs01011487_g1), YWHAZ (Catalog number: 4331182, Hs01122445_g1), SLC4A1AP (Catalog number: 4331182, Hs00250835_m1), ACTB (Catalog number: 4331182, Hs03023943_g1).

Experiments were performed according to manufacturer’s instructions. qPCR was performed on a 7500 fast real-time PCR system (Applied Biosystems, Carlsbad, CA, USA). The instrument was set to Uracil-N glycosylase incubation at 50°C for 2 minutes followed by Polymerase activation at 95°C for 2 minutes. PCR was then performed for 40 cycles with denaturation at 95°C for 1 second and annealing/extension at 60°C for 20 seconds. Amplification of each RG was tested in three technical replicates for each sample and negative controls without templates were included in every experiment.

2.6 Statistical analysis

Fold changes for the 30 unpublished and publicly available datasets were calculated for the complete data in the R environment, version 1.3.1056, and p-values adjusted with the Benjamini-Hochberg method.
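The Benjamini-Hochberg adjustment mentioned above can be illustrated with a minimal sketch. This is not the authors' R code; it is a plain Python re-implementation of the standard step-up procedure, and the raw p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # hypothetical raw p-values
adj = benjamini_hochberg(raw)
print(adj)
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw p-values increase.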

The number of datasets in which an RG was differentially expressed between control and test samples was tallied, and the RGs with the lowest numbers were picked as top candidates. The lowest number was zero, i.e., the RG was not differentially expressed in any dataset. Plots were generated using SPSS (v.25; IBM Corp., Armonk, NY, USA). Correlations were determined using the Pearson test, with age as a continuous variable and gender and sample group as categorical variables. Significance and p-values from the qPCRs were obtained using the Mann–Whitney U test on the ΔCt values from each sample. The cutoff for significance was set at p<0.05.
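The tallying step can be sketched as follows. The data structure, the function names and the three toy datasets are assumptions made for illustration only; the "Yes", "No", "NA" and "ND" labels follow the legend of Fig 2.

```python
from collections import Counter

# Hypothetical calls: dataset ID -> {gene: "Yes" | "No" | "NA" | "ND"}.
calls = {
    5: {"YWHAZ": "No", "GAPDH": "Yes", "SLC4A1AP": "No"},
    14: {"YWHAZ": "No", "GAPDH": "Yes", "SLC4A1AP": "Yes"},
    28: {"YWHAZ": "ND", "GAPDH": "No", "SLC4A1AP": "ND"},
}


def tally_differential(calls):
    """Count, per gene, the datasets in which it was differentially expressed."""
    counts = Counter()
    for per_gene in calls.values():
        for gene, call in per_gene.items():
            # Adding a boolean keeps genes with zero "Yes" calls in the tally.
            counts[gene] += call == "Yes"
    return dict(counts)


def rank_candidates(counts):
    """Order genes from fewest to most differentially expressed datasets."""
    return sorted(counts, key=lambda gene: (counts[gene], gene))


counts = tally_differential(calls)
ranking = rank_candidates(counts)
print(counts, ranking)
```

In this sketch "NA" and "ND" calls are simply not counted as differential expression, so a gene that is rarely detected can still rank well; detection rates need to be inspected separately.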

2.7 Library preparation and Bioinformatics for all datasets

Datasets acquired from ERCB (9 and 10) or the Neptune cohort (11 and 12) were processed as previously described [18–21].

Datasets from our own group concerning patients suffering from Fabry’s disease (n = 8; datasets 5–8 and 13–16) were obtained as follows: RNA sequencing libraries were prepared using standard Illumina Access protocol (RNA exome, Illumina, San Diego, CA, USA) on an Illumina platform in different batches due to the large number of samples, at the following genomic facilities: i) the Norwegian University of Science and Technology (NTNU) in Trondheim, Norway, in collaboration with PhD Vidar Beisvåg and his group, ii) Firalis SA, Huningue, France, in collaboration with Eric Schordan, and iii) the Functional Genomics Center Zurich (CHRO), University of Zurich, Switzerland. However, library normalization was performed exclusively at the Norwegian University of Science and Technology, and libraries were normalized to 2.2 pM for the NextSeq500 instrument and 2.3 pM for the HiSeq 4000 instrument.

Samples were subjected to paired-end 2×75 bp sequencing with around 60M paired-end reads. Base calling was done on the HiSeq instrument by RTA 1.17.21.3. FASTQ files were generated using bcl2fastq v2.20 (Illumina, Inc. San Diego, CA, USA). Transcript expression values were generated by quasi-alignment using Salmon (http://salmon.readthedocs.io/en/latest/index.html) and Ensembl (GRCh38) human transcriptomes. Aggregation of transcript to gene expression was performed using tximport (http://bioconductor.org/packages/release/bioc/html/tximport.html). An empirical expression filter was applied, which retained genes with more than 1 count per million (cpm) in more than 25% of samples per dataset. Comparative analysis was done using the voom/Limma R package.
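The empirical expression filter can be sketched as below. This is an illustration rather than the pipeline used in the study: it assumes cpm is computed from the raw library size of each sample (real pipelines may use normalized library sizes), and the gene names and counts are hypothetical.

```python
def cpm(counts_by_sample):
    """Convert raw counts (one dict gene -> count per sample) to counts per million."""
    result = []
    for sample in counts_by_sample:
        library_size = sum(sample.values())
        result.append({g: c * 1_000_000 / library_size for g, c in sample.items()})
    return result


def expressed_genes(counts_by_sample, min_cpm=1.0, min_fraction=0.25):
    """Keep genes with cpm > min_cpm in more than min_fraction of samples."""
    cpm_values = cpm(counts_by_sample)
    n_samples = len(cpm_values)
    kept = []
    for gene in counts_by_sample[0]:
        n_pass = sum(1 for s in cpm_values if s.get(gene, 0.0) > min_cpm)
        if n_pass / n_samples > min_fraction:
            kept.append(gene)
    return kept


# Hypothetical counts; "REST" pads each library to exactly one million reads.
samples = [
    {"A": 5, "B": 2, "C": 2, "REST": 999_991},
    {"A": 5, "B": 0, "C": 2, "REST": 999_993},
    {"A": 5, "B": 0, "C": 0, "REST": 999_995},
    {"A": 5, "B": 0, "C": 0, "REST": 999_995},
]
kept = expressed_genes(samples)
print(kept)
```

Gene "B" passes the cpm threshold in exactly 25% of the samples and is therefore dropped, because the criterion is "more than 25%".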

Differential gene expression in control and test samples was defined as Benjamini-Hochberg adjusted p-value ≤0.05, and an absolute fold change of ≥2. Based on unsupervised clustering and PCA correlation analysis, potential batch effects within the RNAseq data were mitigated using ComBat in combination with CPM-normalization [23]. Subsequently, using a standard DESeq2 workflow, differential gene expression was assessed to compare all groups from the same compartment [24].
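The differential expression call defined above can be written as a small predicate. The sketch assumes that "absolute fold change of ≥2" is meant symmetrically, i.e., a fold change of at least 2 or at most 0.5; the function name and the example values are hypothetical.

```python
import math


def is_differentially_expressed(fold_change, adj_p, fc_cutoff=2.0, alpha=0.05):
    """Apply both criteria: adjusted p-value <= alpha and |log2 FC| >= log2(cutoff)."""
    # Working on the log2 scale treats up- and down-regulation symmetrically.
    large_change = abs(math.log2(fold_change)) >= math.log2(fc_cutoff)
    return adj_p <= alpha and large_change


print(is_differentially_expressed(0.49, 0.01))   # halved and significant
print(is_differentially_expressed(1.05, 0.001))  # significant but tiny change
print(is_differentially_expressed(2.5, 0.20))    # large change, not significant
```

Requiring both criteria means that a statistically significant but small shift in an RG does not count as differential expression.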

Our own datasets concerning minimal change disease (n = 1, no. 1) and membranous nephropathy (n = 3, no. 2 and no. 17–18) were processed as follows: RNA library preparation was performed using the TruSeq RNA Access Library Preparation Kit (Illumina, Inc., San Diego, CA, USA). A NextSeq500 system (Illumina, Inc., San Diego, CA, USA) was used for RNA sequencing at the Genomics Core Facility, Norwegian University of Science and Technology (NTNU). Assembled reads were aligned to the Homo sapiens hg38 reference genome using Gencode (https://www.gencodegenes.org/) [25]. Differentially expressed genes (DEGs) with a count per million (CPM) of more than 3 in at least four samples, an absolute fold-change value of greater than 2 and an adjusted p-value <0.05 were included in the analysis. Statistical analysis was performed with the Limma/Voom package [26].

Sequencing libraries for the diabetic and hypertensive nephropathy datasets from our own group (datasets 3–4) were generated using the TruSeq RNA exome library kit (Illumina, San Diego, CA, USA) according to manufacturers’ instructions. Libraries were quantitated by qPCR using the KAPA library quantification kit–Illumina/ABI Prism (Kapa Biosystems, Wilmington, MA, USA) and validated using the Agilent high-sensitivity DNA kit on a bioanalyser. They were subsequently normalized to 2.6 pM and subjected to cluster and paired-end read sequencing, performed for 2× 75 cycles on two NextSeq500 HO flow cells (Illumina), according to manufacturer’s instructions. Base-calling was performed using the NextSeq500 instrument, and RTA 2.4.6. FASTQ files were generated using bcl2fastq2 conversion software (v.2.17; Illumina). Assembled reads were aligned to the Homo sapiens hg38 reference genome using Gencode (gencodes.org). Differentially expressed genes (DEGs) with >3 counts per million (CPM) in at least four samples, absolute fold-change (FC) value >2, and adjusted p-value <0.05 were included in the analysis.

Datasets 19–30 were obtained through the Gene Expression Omnibus (GEO). In particular, datasets 19–25 corresponding to GSE104954 [27] were analyzed using the GEO2R analysis tool [28, 29] provided by GEO. Datasets 26–27, corresponding to GSE104948, were used as normalized data. Similarly, for datasets 28–30, corresponding to GSE120495, we used normalized data provided by original authors [4]. Additional details are provided in Table 1.

3. Results

3.1 Reference gene expression variability

Comparison of the 30 different kidney-related gene expression datasets showed that, among commonly used RGs, SLC4A1AP and YWHAZ were the most consistently expressed in control and test samples (Fig 2). In particular, YWHAZ was not differentially expressed in any dataset, whereas SLC4A1AP was differentially expressed between controls and test specimens in one dataset (no. 14), which included microdissected arteries from patients with longstanding Fabry disease.

Fig 2. Results for all included reference genes from each dataset.

MCD: Minimal change disease, MN: Membranous nephropathy, HT: Hypertension, DIA2: Diabetes type 2, FSGS: Focal segmental glomerulosclerosis, IGAN: IgA nephropathy, TCMR: T-cell mediated rejection, RPGN: Rapidly progressive glomerulonephritis, STA: Stable allograft, ATI: Acute tubular injury, IFTA: Interstitial fibrosis and tubular atrophy. In the columns under the gene IDs, “Yes” refers to genes differentially expressed in control and test samples in the dataset. “No” refers to RGs equally expressed in control and test samples in the dataset. Not available (NA) refers to RGs not tested in specific datasets. Not detected (ND) refers to genes undetected in the specific dataset.

Excluding the two top contenders, the number of available datasets showing evidence of variable RG expression in control and test samples ranged between 2/26 (12%) for PPIA and 8/26 (31%) for HNRNPL (Figs 2 and 3A).

Fig 3. Variations in reference gene (RG) expression per gene and per database.

Panel 3A displays RGs along the x-axis and datasets along the y-axis. For each RG, the number of datasets in which the RG was either not detected, not available, differentially expressed or not differentially expressed is marked. Not available (NA) refers to RGs not tested in specific datasets. Not detected (ND) refers to genes undetected in the specific dataset. If a particular variable is not listed, its value was zero; e.g., the number of datasets where YWHAZ was differentially expressed is not listed, since YWHAZ was stable in all datasets. Panel 3B displays datasets along the x-axis and RGs on the y-axis. For each database, the number of RGs that were either not detected, not available, differentially expressed or not differentially expressed is marked. Characteristics of each database are described in detail in Table 1.

On the other hand, notably, YWHAZ and SLC4A1AP gene expression was undetectable in 3/30 (10%) and 5/21 (24.8%) available databases, respectively. Databases from non-microdissected libraries that used stable allograft tissues as controls appeared to be particularly affected, as neither YWHAZ nor SLC4A1AP was detected in any of the three datasets that fulfilled these criteria (datasets 28–30, see S1 Table). However, datasets 28–30 originated from the same experiment and are not independent of each other.

3.2 Reference gene expression variability in specific datasets

Expression of the RGs under investigation was analyzed in each dataset. In 10/30 datasets, none of the tested RGs showed any variation in expression between control and test samples.

However, in the remaining 20 datasets, the expression of 6–56% of the available RGs under investigation varied (Fig 3B).

Importantly, variation rates did not appear to be obviously associated with defined types of sample preparation, disease or controls. For instance, in databases addressing gene expression in microdissected samples from patients with Fabry disease (n = 8), variations in RG expression ranged between 0 (n = 2) and 56% (n = 1) (Figs 2 and 3B). Similarly, RG expression variations in membranous nephropathy databases ranged between 0 (n = 1) and 25% (n = 1). The commonly used RG GAPDH was differentially expressed only in 3/8 Fabry disease datasets. Full results, including fold changes and p-values for each RG in each dataset, are provided in S1 Table.

3.3 qPCR

To validate results from available databases, we examined the expression of YWHAZ and SLC4A1AP, the best candidate RGs, in FFPE-derived specimens from patients with hypertension (HT, n = 11) and non-diseased controls (NDC, n = 5). As control RGs, we used ACTB and RPS13 (Table 2). The median A260/A280 ratio of the RNA samples was 1.88 (range 1.67–1.98), consistent with good quality of the RNA output.

Table 2. Candidate RG for PCR validation.

| Gene name | Ensembl | Full name | Biological process | Probes |
|---|---|---|---|---|
| SLC4A1AP | ENSG00000163798 | Solute Carrier Family 4 Member 1 Adaptor Protein | RNA splicing | Hs00250835_m1 |
| ACTB | ENSG00000075624 | Actin Beta | Actin filament fragmentation | Hs03023943_g1 |
| RPS13 | ENSG00000110700 | Ribosomal Protein S13 | Translation | Hs01011487_g1 |
| YWHAZ | ENSG00000164924 | Tyrosine 3-Monooxygenase/Tryptophan 5-Monooxygenase Activation Protein Zeta | Signal transduction | Hs01122445_g1 |

Expression levels of the four RGs in combined test and control samples were comparable (Fig 4A). More importantly, the expression of each RG did not significantly differ between HT and control specimens (Fig 4B). Corresponding p-values are reported in Table 3A. Moreover, the expression of the four candidate RGs appeared to be highly correlated (Pearson correlation ≥0.899; p<10⁻⁶) (Table 3B).
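The group comparison can be illustrated with a minimal sketch of the two-sided Mann-Whitney U test using the asymptotic (normal) approximation. It is not the SPSS procedure used in the study: the sketch assumes average ranks for ties, applies no tie or continuity correction to the variance, and uses hypothetical input values.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, two-sided asymptotic p-value) for two independent samples."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u, p


# Hypothetical values for two small groups.
u_sep, p_sep = mann_whitney_u([1, 2, 3], [4, 5, 6])
u_same, p_same = mann_whitney_u([1, 2, 3], [1, 2, 3])
print(u_sep, p_sep, u_same, p_same)
```

With groups as small as those compared here (n = 11 and n = 5), exact p-values can differ from the asymptotic approximation, so the sketch only illustrates the principle.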

Fig 4. qPCR Ct values for selected reference genes.

A) qPCR cycles for all 16 samples. B) qPCR cycles for hypertensive (HT; n = 11) and non-diseased controls (NDC; n = 5). None of the displayed results was significant at the p<0.05 level. The y-axis displays Ct values directly. HT = hypertensive group, NDC = non-diseased controls. Data are represented as mean±SD.

Table 3. Comparison of the expression of each reference gene in HT and non-diseased control biopsies.

3a displays the p-values from the qPCR experiments. Data were analyzed by Mann-Whitney Asymp. Sig. (2-tailed). None of the comparisons yielded statistically significant results. 3b shows the fold change (FC) differences and Pearson’s correlation in the expression of the selected reference genes. Fold changes are represented as mean±SD log FC.

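Table 3a.

| Reference gene candidates | RPS13 | ACTB | SLC4A1AP | YWHAZ |
|---|---|---|---|---|
| p-value | 0.336 | 0.282 | 0.336 | 0.336 |

Table 3b. Each cell lists the log FC (mean±SD) followed by Pearson’s correlation.

| | RPS13 | ACTB | SLC4A1AP | YWHAZ |
|---|---|---|---|---|
| RPS13 | 1 | 3.973±1.708; 0.948** | 5.743±0.628; 0.899** | 2.827±0.731; 0.968** |
| ACTB | | 1 | 1.771±1.858; 0.933** | 1.146±1.207; 0.959** |
| SLC4A1AP | | | 1 | 2.916±0.927; 0.922** |
| YWHAZ | | | | 1 |

**p-value <0.00001.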

4. Discussion

In this study we investigated the expression of 18 commonly used RGs in 30 datasets including samples from patients with a wide range of renal diseases other than cancer, aiming at the identification of genes allowing appropriate RNA data normalization.

Our main finding is that using any single RG in the analysis of different databases implies the risk of introducing large experimental bias.

We found that YWHAZ represents a top RG, with no differences in expression between samples in all datasets where the expression data were available.

The importance of stable RGs can be demonstrated by comparing the results obtained with stable and unstable RGs in the same experiment. As a theoretical example, if we were interested in the expression of PON1 in Fabry’s disease, we could perform qPCR to assess the difference between patients with Fabry’s disease and healthy controls. In our data PON1 was not affected in Fabry’s disease (Fabry vs Normal FC: 0.97). However, if we were to choose GAPDH (Fabry vs Normal FC: 0.49) as RG, we would have to conclude that PON1 is overexpressed in Fabry’s disease, because GAPDH itself is significantly decreased in patients with Fabry disease. Normalization would therefore leave PON1 expression artificially high in the Fabry group relative to the normal controls. If, on the other hand, we use YWHAZ as the RG, the results change. YWHAZ (Fabry vs Normal FC: 1.05) is stable in Fabry’s disease, no bias is introduced, and the results show that PON1 is not differentially expressed.
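The arithmetic behind this example can be made explicit. The sketch assumes simple relative quantification, in which the apparent fold change of the target gene equals its own fold change divided by that of the RG; the function name is hypothetical, while the three fold changes are the ones quoted above.

```python
def normalized_fold_change(target_fc, reference_fc):
    """Apparent fold change of a target gene after normalization to a reference gene."""
    return target_fc / reference_fc


pon1_fc = 0.97   # PON1, Fabry vs Normal
gapdh_fc = 0.49  # GAPDH, Fabry vs Normal (unstable RG)
ywhaz_fc = 1.05  # YWHAZ, Fabry vs Normal (stable RG)

vs_gapdh = normalized_fold_change(pon1_fc, gapdh_fc)
vs_ywhaz = normalized_fold_change(pon1_fc, ywhaz_fc)
print(round(vs_gapdh, 2), round(vs_ywhaz, 2))  # 1.98 0.92
```

Normalized to GAPDH, PON1 appears almost two-fold increased, whereas normalized to YWHAZ its fold change stays close to 1.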

YWHAZ encodes a highly conserved protein mediating signal transduction by binding to phosphoserine-containing proteins. It was recently proposed as a “central hub protein for many signal transduction pathways” in a variety of cancers [30], and has been described as an unfavorable prognostic marker in renal cancer (https://www.proteinatlas.org/ENSG00000164924-YWHAZ/pathology) [31]. These data suggest that, while YWHAZ might be suitable as an RG in non-cancerous renal tissues, caution is warranted in applying it to renal cancer tissues, as previously proposed [32]. In non-cancerous renal tissue, suppression of YWHAZ gene expression has resulted in glomerular mesangial cell proliferation in early diabetic nephropathy in primary mouse mesangial cells [33].

SLC4A1AP, encoding a solute carrier protein, might represent an additional interesting RG candidate. However, the expression of this gene was undetectable in 5/21 available databases, thus questioning its potential relevance.

As noted previously, we are not the first to investigate RG variation in non-cancerous renal biopsies. Kidney-specific investigations were performed by Schmid et al. [7], who examined the stability of GAPDH, 18S rRNA and PPIA in 165 renal biopsies from a variety of diseases. Their results for GAPDH were unfavorable, while they recommended the use of 18S rRNA and PPIA. Biederman et al. [13] also examined kidney biopsies and found ACTB and YWHAZ to be the most suitable RGs, with less favorable results for GAPDH, beta2-microglobulin, acidic ribosomal protein 36B4, and cyclophilin A. While both studies examined a large pool of samples, they were limited by the nature of qPCR compared to RNA-seq, e.g., having to check each RG individually instead of having access to all sequenced transcripts, and by the lack of sequencing data from different renal diseases, which were not available at the time.

It is interesting to note that non-microdissected datasets appear to yield fewer differentially expressed genes than the microdissected datasets. However, the microdissected datasets also included a considerably larger number of patients per dataset, on average, than the non-microdissected datasets. In non-microdissected data, “noise” from larger compartments might mute differential expression of specific RGs in defined compartments. Therefore, the discrepancy between datasets in differentially expressed RGs might be due to the larger number of patients and to the nature and quality of the samples. The Fabry datasets in particular yielded many differentially expressed RGs: GAPDH, SLC4A1AP, PPIA and ACTG1 were only differentially expressed in the Fabry datasets. However, the Fabry datasets were also the only ones including microdissected arteries and differentiating proximal from distal tubules, whereas other datasets referred to either glomeruli or whole tubulointerstitium samples. Thus, the number of differentially expressed RGs might simply reflect true differences that are normally concealed in datasets based on less discriminating whole-section sequencing.

Methodologies used to study RG expression, such as microarray, RNA-seq or qPCR, might produce skewed results when compared to each other, due to biases intrinsic to each technology. An argument against this concern could be the largely concurrent results obtained for certain RGs, such as GAPDH [7, 11, 13]. Already in 2003, in an analysis of 165 microdissected renal biopsies obtained from a variety of diseases, Schmid et al. showed that GAPDH, though historically frequently used [7], displays remarkable variation in its expression level and is thus not suitable as an RG in renal tissues, as also shown in studies on renal cell carcinoma [34, 35].

A similar case of concurrent results between independent sequencing and qPCR data could be made for YWHAZ, which proved to be one of the most suitable RGs investigated in this study and yielded similar results in a separate investigation of microdissected diabetic glomeruli [13].

However, while some results obtained by sequencing and qPCR do concur, others do not. In their study leveraging the massive data contained in The Cancer Genome Atlas (TCGA), Jo et al. [11] discarded most of the historically used RGs, such as GAPDH or ACTB, and identified and confirmed by qPCR several new RGs. However, some of their proposed RGs (HNRNPL, PCBP1 and RER1) appear to be differentially expressed in several of our own datasets. A possible explanation could reside in the focus of their study on cancerous tissue [11]. This again shows that caution should be taken when applying RGs validated in one type of tissue, or even in one disease type, to a different tissue or disease. Jo et al. leveraged an enormous number of samples, but because these were not from non-cancerous renal tissue, their results do not necessarily apply to that tissue, even though they examined renal biopsies.

Another question regarding RGs and these two techniques is whether an RG suitable for qPCR is also suitable for expression profiling by microarray or next-generation sequencing.

An additional level of complexity might be related not only to “true” variability in the expression of defined genes, but also to insufficiently specific measurement methods. Veres-Szekely et al. [6] demonstrated that primer specificity is crucial when using ACTB as an RG. Unspecific primers might erroneously bind to the α-SMA gene, which is upregulated in fibroproliferative diseases. As kidney disease and failure are frequently associated with the presence of fibrotic tissue, this might represent an important issue.

Variation of RG expression was previously investigated in a variety of renal cell lines and in renal biopsies from malignant or non-cancerous tissues [7, 13, 34, 35]. However, our study takes advantage of the access to a large number of different datasets, both our own and from independent groups, including samples from over 10 common renal diseases and both microdissected and non-microdissected biopsies. Moreover, although not representing an exhaustive list of all RGs that have been, or are, in use, our selection covers a broad range of genes, including older, frequently used RGs, and newer, more recently proposed, candidates. In addition, we further supported our results by performing our own qPCR experiments solely focused on recording RG variability in HT biopsies.

Limitations of our study should also be acknowledged. First, although we comparatively analyzed 30 datasets, 8 were from patients with Fabry disease. This may have given undue weight to the expression of our selected RGs in Fabry disease compared to more common causes of renal failure, such as hypertension. Additionally, several datasets included relatively few patients. Also, we did not distinguish between results obtained from datasets with large and small populations. Lastly, our cohort for PCR validation was relatively small. However, the acquisition of kidney biopsies, especially from healthy individuals, is not as easy as the acquisition of cancerous tissue during, e.g., nephrectomy, especially as the procedure is not without risk to the patients’ health.

5. Conclusion

Our analysis suggests that RGs suitable for all techniques and tissues do not exist and that they must be carefully selected according to the characteristics of available specimens. Even microdissected tissues might require a separate RG for each compartment, as previously proposed [36]. In non-cancerous kidney biopsies, however, we propose that YWHAZ as a single stable gene, or the combination of YWHAZ and SLC4A1AP, might be of particular interest for normalization purposes, especially in qPCR experiments.
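As an illustration of how a two-gene normalizer such as YWHAZ plus SLC4A1AP could be applied in qPCR, the following is a minimal sketch of geometric-mean normalization in the spirit of [36]. It is not the procedure used in this study. It assumes equal amplification efficiency for all assays (default 2.0), in which case the geometric mean of the reference quantities corresponds to the arithmetic mean of their Ct values. The function name and all Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, ct_refs, efficiency=2.0):
    """Relative quantity of a target gene normalized to several reference genes.

    Assumption (not from the paper): every assay shares the same amplification
    efficiency, so averaging reference Ct values is equivalent to taking the
    geometric mean of the reference quantities.
    """
    if not ct_refs:
        raise ValueError("at least one reference-gene Ct value is required")
    delta_ct = ct_target - mean(ct_refs)  # target Ct minus mean reference Ct
    return efficiency ** (-delta_ct)


# Hypothetical Ct values for one sample: two reference genes and one target gene.
reference_cts = {"YWHAZ": 20.0, "SLC4A1AP": 22.0}
quantity = relative_expression(25.0, list(reference_cts.values()))
print(quantity)
```

Using two reference genes rather than one reduces the influence of a shift in either gene alone, which is the motivation for combining them.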

Supporting information

S1 Table. Full results with p-values and fold changes.

MCD: Minimal change disease, MN: Membranous nephropathy, HT: Hypertension, DIA2: Diabetes type 2, FSGS: Focal segmental glomerulosclerosis, IGAN: IgA nephropathy, TCMR: T-cell mediated rejection, RPGN: Rapidly progressive glomerulonephritis, STA: Stable allograft, ATI: Acute tubular injury, IFTA: Interstitial fibrosis and tubular atrophy. In the columns under the gene IDs, “Yes” refers to genes differentially expressed between control and test samples in the dataset. “No” refers to RGs equally expressed in control and test samples in the defined dataset. Not available (NA) refers to RGs not tested in specific datasets. Not detected (ND) refers to genes undetected in the specific dataset. Summaries and percentages are noted below each column.

(XLSX)

Acknowledgments

We are grateful to Celine C. Berthier for her valuable suggestions and assistance in data acquisition for this manuscript. We are also grateful to Giulio Spagnoli for providing language editing services.

Data Availability

Data is available at GEO (https://www.ncbi.nlm.nih.gov/gds), accession numbers: GSE104948, GSE108113, GSE104954. Additional data is available at https://github.com/pst087/310590-transciptomic-data.

Funding Statement

This project was funded by an open-project grant to Hans-Peter Marti from the Western Norwegian Health Region (Helse vest, project no. 912167; https://helse-vest.no/en). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Butte AJ, Dzau VJ, Glueck SB. Further defining housekeeping, or "maintenance," genes Focus on "A compendium of gene expression in normal human tissues". Physiological genomics. 2001;7(2):95–6. doi: 10.1152/physiolgenomics.2001.7.2.95
  • 2. Bustin SA, Benes V, Garson JA, Hellemans J, Huggett J, Kubista M, et al. The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clinical chemistry. 2009;55(4):611–22. doi: 10.1373/clinchem.2008.112797
  • 3. Caracausi M, Piovesan A, Antonaros F, Strippoli P, Vitale L, Pelleri MC. Systematic identification of human housekeeping genes possibly useful as references in gene expression studies. Molecular medicine reports. 2017;16(3):2397–410. doi: 10.3892/mmr.2017.6944
  • 4. Wang Z, Lyu Z, Pan L, Zeng G, Randhawa P. Defining housekeeping genes suitable for RNA-seq analysis of the human allograft kidney biopsy tissue. BMC medical genomics. 2019;12(1):86. doi: 10.1186/s12920-019-0538-z
  • 5. Jung M, Ramankulov A, Roigas J, Johannsen M, Ringsdorf M, Kristiansen G, et al. In search of suitable reference genes for gene expression studies of human renal cell carcinoma by real-time PCR. BMC molecular biology. 2007;8:47. doi: 10.1186/1471-2199-8-47
  • 6. Veres-Szekely A, Pap D, Sziksz E, Javorszky E, Rokonay R, Lippai R, et al. Selective measurement of alpha smooth muscle actin: why beta-actin can not be used as a housekeeping gene when tissue fibrosis occurs. BMC molecular biology. 2017;18(1):12. doi: 10.1186/s12867-017-0089-9
  • 7. Schmid H, Cohen CD, Henger A, Irrgang S, Schlondorff D, Kretzler M. Validation of endogenous controls for gene expression analysis in microdissected human renal biopsies. Kidney international. 2003;64(1):356–60. doi: 10.1046/j.1523-1755.2003.00074.x
  • 8. Kozera B, Rapacz M. Reference genes in real-time PCR. Journal of applied genetics. 2013;54(4):391–406. doi: 10.1007/s13353-013-0173-x
  • 9. Caradec J, Sirab N, Keumeugni C, Moutereau S, Chimingqi M, Matar C, et al. 'Desperate house genes': the dramatic example of hypoxia. Br J Cancer. 2010;102(6):1037–43. doi: 10.1038/sj.bjc.6605573
  • 10. Dupasquier S, Delmarcelle AS, Marbaix E, Cosyns JP, Courtoy PJ, Pierreux CE. Validation of housekeeping gene and impact on normalized gene expression in clear cell renal cell carcinoma: critical reassessment of YBX3/ZONAB/CSDA expression. BMC molecular biology. 2014;15:9. doi: 10.1186/1471-2199-15-9
  • 11. Jo J, Choi S, Oh J, Lee SG, Choi SY, Kim KK, et al. Conventionally used reference genes are not outstanding for normalization of gene expression in human cancer research. BMC Bioinformatics. 2019;20(Suppl 10):245. doi: 10.1186/s12859-019-2809-2
  • 12. Wierzbicki PM, Klacz J, Rybarczyk A, Slebioda T, Stanislawowski M, Wronska A, et al. Identification of a suitable qPCR reference gene in metastatic clear cell renal cell carcinoma. Tumour biology: the journal of the International Society for Oncodevelopmental Biology and Medicine. 2014;35(12):12473–87. doi: 10.1007/s13277-014-2566-9
  • 13. Biederman J, Yee J, Cortes P. Validation of internal control genes for gene expression analysis in diabetic glomerulosclerosis. Kidney international. 2004;66(6):2308–14. doi: 10.1111/j.1523-1755.2004.66016.x
  • 14. Serinsoz E, Bock O, Kirsch T, Haller H, Lehmann U, Kreipe H, et al. Compartment-specific quantitative gene expression analysis after laser microdissection from archival renal allograft biopsies. Clinical nephrology. 2005;63(3):193–201. doi: 10.5414/cnp63193
  • 15. ANNUAL REPORT 2019 The Norwegian Renal Registry.
  • 16. Xiang Y, Ye Y, Zhang Z, Han L. Maximizing the Utility of Cancer Transcriptomic Data. Trends in cancer. 2018;4(12):823–37. doi: 10.1016/j.trecan.2018.09.009
  • 17. Cieślik M, Chinnaiyan AM. Cancer transcriptome profiling at the juncture of clinical translation. Nature reviews Genetics. 2018;19(2):93–109. doi: 10.1038/nrg.2017.96
  • 18. Cohen CD, Frach K, Schlöndorff D, Kretzler M. Quantitative gene expression analysis in renal biopsies: a novel protocol for a high-throughput multicenter application. Kidney international. 2002;61(1):133–40. doi: 10.1046/j.1523-1755.2002.00113.x
  • 19. Lindenmeyer MT, Kretzler M, Boucherot A, Berra S, Yasuda Y, Henger A, et al. Interstitial vascular rarefaction and reduced VEGF-A expression in human diabetic nephropathy. Journal of the American Society of Nephrology: JASN. 2007;18(6):1765–76. doi: 10.1681/ASN.2006121304
  • 20. Schmid H, Boucherot A, Yasuda Y, Henger A, Brunner B, Eichinger F, et al. Modular activation of nuclear factor-kappaB transcriptional programs in human diabetic nephropathy. Diabetes. 2006;55(11):2993–3003. doi: 10.2337/db06-0477
  • 21. Gadegbeku CA, Gipson DS, Holzman LB, Ojo AO, Song PX, Barisoni L, et al. Design of the Nephrotic Syndrome Study Network (NEPTUNE) to evaluate primary glomerular nephropathy by a multidisciplinary approach. Kidney international. 2013;83(4):749–56. doi: 10.1038/ki.2012.428
  • 22. Eikrem O, Beisland C, Hjelle K, Flatberg A, Scherer A, Landolt L, et al. Transcriptome Sequencing (RNAseq) Enables Utilization of Formalin-Fixed, Paraffin-Embedded Biopsies with Clear Cell Renal Cell Carcinoma for Exploration of Disease Biology and Biomarker Development. PloS one. 2016;11(2):e0149743. doi: 10.1371/journal.pone.0149743
  • 23. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics (Oxford, England). 2007;8(1):118–27. doi: 10.1093/biostatistics/kxj037
  • 24. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. doi: 10.1186/s13059-014-0550-8
  • 25. Frankish A, Diekhans M, Ferreira AM, Johnson R, Jungreis I, Loveland J, et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 2019;47(D1):D766–D73. doi: 10.1093/nar/gky955
  • 26. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic acids research. 2015;43(7):e47. doi: 10.1093/nar/gkv007
  • 27. Grayson PC, Eddy S, Taroni JN, Lightfoot YL, Mariani L, Parikh H, et al. Metabolic pathways and immunometabolism in rare kidney diseases. Annals of the rheumatic diseases. 2018;77(8):1226–33. doi: 10.1136/annrheumdis-2017-212935
  • 28. Davis S, Meltzer PS. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics (Oxford, England). 2007;23(14):1846–7.
  • 29. Smyth GK. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Statistical applications in genetics and molecular biology. 2004;3:Article3. doi: 10.2202/1544-6115.1027
  • 30. Gan Y, Ye F, He XX. The role of YWHAZ in cancer: A maze of opportunities and challenges. Journal of Cancer. 2020;11(8):2252–64. doi: 10.7150/jca.41316
  • 31. Uhlén M, Fagerberg L, Hallström BM, Lindskog C, Oksvold P, Mardinoglu A, et al. Proteomics. Tissue-based map of the human proteome. Science (New York, NY). 2015;347(6220):1260419. doi: 10.1126/science.1260419
  • 32. Villaamil VM, Vazquez-Estevez S, Campos B, Mateos LL, Fírvida JL, Ramos M, et al. GAPDH, YWHAZ, and RRN18S as control reference genes for gene expression studies on renal cell carcinoma (RCC) formaldehyde-fixed paraffin-embedded (FFPE) tissue samples. Journal of Clinical Oncology. 2011;29(7_suppl):389.
  • 33. Zhang Z, Luo X, Ding S, Chen J, Chen T, Chen X, et al. MicroRNA-451 regulates p38 MAPK signaling by targeting of Ywhaz and suppresses the mesangial hypertrophy in early diabetic nephropathy. FEBS letters. 2012;586(1):20–6. doi: 10.1016/j.febslet.2011.07.042
  • 34. Vilà MR, Nicolás A, Morote J, de I, Meseguer A. Increased glyceraldehyde-3-phosphate dehydrogenase expression in renal cell carcinoma identified by RNA-based, arbitrarily primed polymerase chain reaction. Cancer. 2000;89(1):152–64.
  • 35. Révillion F, Pawlowski V, Hornez L, Peyrat JP. Glyceraldehyde-3-phosphate dehydrogenase gene expression in human breast cancer. European journal of cancer (Oxford, England: 1990). 2000;36(8):1038–42. doi: 10.1016/s0959-8049(00)00051-4
  • 36. Vandesompele J, De Preter K, Pattyn F, Poppe B, Van Roy N, De Paepe A, et al. Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome biology. 2002;3(7):Research0034. doi: 10.1186/gb-2002-3-7-research0034

Decision Letter 0

Stephen D Ginsberg

16 Jun 2021

PONE-D-21-13929

Variable expression of eighteen common housekeeping genes in human non-cancerous kidney biopsies

PLOS ONE

Dear Dr. Strauss,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration by 2 Reviewers and an Academic Editor, all of the critiques of both Reviewers, especially Reviewer #2, must be addressed in detail in a revision to determine publication status. If you are prepared to undertake the work required, I would be pleased to reconsider my decision, but revision of the original submission without directly addressing the critiques of the 2 Reviewers does not guarantee acceptance for publication in PLOS ONE. If the authors do not feel that the queries can be addressed, please consider submitting to another publication medium. A revised submission will be sent out for re-review. The authors are urged to have the manuscript given a hard copyedit for syntax and grammar.

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. 

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously? 

Reviewer #1: Yes

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In the study by Strauss et al., the authors identified suitable reference genes for non-cancerous renal tissue using available datasets, and the results were then validated using renal biopsies. They confirmed the previous results that there is no single global reference gene. They also identified YWHAZ and SLC4A1AP as suitable reference genes.

1- The quality of figures must be improved, I was not able to see the content of figure 2.

2- It seems the line numbers are misplaced on table 3a and it seems those numbers are part of the table.

Line 138. Add reference of the dataset

Line 133-134: “included microdissected tissues from glomeruli, arteries, proximal or distal tubules, and tubointerstitial structures”. How did the authors compare these datasets with each other, as the initial samples are different?

Line 221. “test samples were tallied and RGs with the lowest number picked as top candidates”. What is the lowest number that authors used ?

Line 301-302: “appeared to be peculiarly concerned” why is that? Elaborate more.

Figure 3a: it is not clear which are 10/30 datasets that expression of different tested RG did not show variation.

Figure 3b: are the missing bars in the figure due to a low expression level, or is the expression zero? You should modify the Y axis to present this part.

Reviewer #2: The study looks to verify housekeeper genes for relative quantification of mRNA.

This is in itself an important question, as having a gene to normalize against that controls for tissue input itself, is not affected by the process being studied, and allows for measurement of fold expression of another mRNA species is very important. Many commonly used housekeeping genes have been poorly validated, leading to erroneous conclusions regarding expression of mRNA between control and experimental groups.

Validation of housekeeper genes in human studies is not easy, as well-defined true normal groups are not easy to access.

The study asks some worthwhile questions but lacks some clarity and should demonstrate the effect of different RGs with different genes of interest in various tissues.

RNA seq does not use RGs and, while chip analysis may do, it needs to be specified when and how in terms of this study. How did the ranking and variability of the reference genes perform across the various techniques?

Some more specific questions

One presumes biopsies are done for a good reason, i.e., the patient has an illness. So where do the non-disease control specimens come from? This needs very clear specification.

It would be helpful if this was more clearly articulated.

There is an issue with taking tissue from formalin fixed tissue for mRNA extraction rather than rapidly placing tissue into a good RNA preservation solution such as RNA or snap freezing. While it is understood there are difficulties in collecting human tissue in this way, it is still the best method. One realises a number of studies have done it from formalin fixed tissue; to what extent might the various RNA species degrade differently and affect the findings?

The area size of the tissue blocks is mentioned but not the region of kidney tissue. This may or may not affect housekeeper genes but would highly likely affect region of nephron structures and vasculature.

406 g is a relatively large amount of RNA from a tissue sample unless relatively large.

Was this total RNA or poly mRNA .

What was the relative purity of the RNA ?

For qPCR how many plates were run and how was plate to plate variability accounted for ?

The Benjamini-Hochberg method is usually used for relatively large data sets such as those used in RNA seq and other large data sets. In analysing the qPCR data for the n=16 patients, it is unclear why the various established methods of determining gene variance, and hence stability within and between groups, were not used. This needs to be explained and justified.

It is now generally accepted that some form of normalization of RNA seq data is required. This would be particularly true for many of the databases examined here. There is a lack of clearly indicated information on starting material. If all that is available is the broad diagnosis, then this should be indicated. There are varying assumptions that can be made regarding how normalization can be done, and violation of the assumptions can lead to erroneous conclusions.

As this study is all about defining genes with relatively constant expression for normalization, these assumptions should be spelled out, and it should be made clear how all samples fall within those assumptions and what might be expected, such as very high and low expressing genes (Evans et al., Briefings in Bioinformatics, 19(5), 2018, 776–792; Aanes et al., 2014, PLoS ONE 9(2): e89158).

In the description of the various RNA seq procedures there is discussion of analysis of differentially expressed gene’s and genes with greater than 2 fold expression excluded yet the paper is about genes that are hopefully minimally differentially expressed. This seems more a generic discussion of RNA seq rather than specific to this study.

As I can see, RNA seq is being used as a useful way to determine gene expression of a number of genes at once across a variety of samples and should in principle provide a valuable set of relative gene expression data to determine within and between group variation and thus suitability as potential housekeepers. Such an analysis seems lacking.

For the microarray data, how was normalization done, what was the relative gene stability, and what effect might that have had in assessing some variable mRNA species of interest?

I think some tables of expression data from the RNA seq and latter qPCR with a clear ranking would be helpful.

I think it would be helpful to determine what might be any experimental bias by clearly showing the effects of using some gene of interest that might change between conditions and how this can be affected by using a single or pair of particular housekeeper genes and how the results might vary between RNA seq and qPCR.

The authors should clearly show the effect of using various combinations of housekeeper genes.

Lines 412 to 414 in discussion

While it is true rt-PCR requires each gene to be measured individually why would it require investigators to be more selective in diseases studied. I would suggest that it is far more likely RT PCR would be used than next gene RNA-seq based on cost and availability.

Depending on the nature and extent of disease type of tissue and region of tissue it is hardly surprising there are differences in number of genes expressed. In the kidney it is well described that there is differential gene expression along the nephron and I would expect vasculature. Another variable would be how tissue was dissected and time taken for tissue fixation and differential stability of RNA species.

Whole tissue blocs would be expected to have less variability but again will be subject to sampling and how a piece of tissue is cut. It is also reasonable that the number of patients in each data set might be important. With that in mind a full racial and demographic reporting including drug and co existent morbidities needs to be reported as many conditions might affect renal gene expression.

The authors mention possible differences in methodologies of studying reference or housekeeper genes. RNA-seq by its nature usually does not use RG’s but is dependent on sequencing and counting to normalize samples. Again it would have been useful to see some analysis of variation in the expression of suggested housekeepers by RNA-seq using the various published approaches.

The discussion could be more tightly focused.

This is about RG’s

The abstract mentions looking at RGs in patients with hypertensive nephropathy, yet there is little clear data presentation on this subject or on what effects different reference genes might have on assessing expression of some other gene of interest.

It seems that some more comment might have been made about using different reference genes for different circumstances.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Please submit your revised manuscript by December, 2021. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Stephen D. Ginsberg, Ph.D.

Section Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please provide additional details regarding participant consent. In the ethics statement in the Methods and online submission information, please ensure that you have specified whether consent was informed.

3. In your Methods section, please provide additional information about the participant recruitment method and the demographic details of your participants. Please ensure you have provided sufficient details to replicate the analyses such as: 

a) the recruitment date range (month and year), 

b) a description of any inclusion/exclusion criteria that were applied to participant recruitment, 

c) a table of relevant demographic details, 

d) a statement as to whether your sample can be considered representative of a larger population, and 

e) a description of how participants were recruited.

4. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

PLoS One. 2021 Dec 9;16(12):e0259373. doi: 10.1371/journal.pone.0259373.r002

Author response to Decision Letter 0


6 Sep 2021

Bergen, September 2021

Editorial Board

PLOSONE

Dear editor and reviewers

Thank you for taking the time to review our work. We have taken your comments to heart and improved the manuscript accordingly. Below you will find a point-by-point review of the changes to the paper in response to each comment. Each point made by the reviewer is listed first, followed by a reply or explanation and reference to where in the text the changes made to the manuscript are to be found.

Reviewer #1:

In the study by Strauss et al., the authors identified suitable reference genes for non-cancerous renal tissue using available datasets and then validated the results using renal biopsies. They confirmed the previous results that there is no single global reference gene. They also identified YWHAZ and SLC4A1AP as suitable reference genes.

1- The quality of figures must be improved, I was not able to see the content of figure 2.

Thanks for the suggestion. We have improved the quality of Figure 2 within the manuscript, see figure 2.

2- It seems the line numbers are misplaced on table 3a and it seems those numbers are part of the table.

We have fixed the issue.

Line 138. Add reference of the dataset

The references and the GEO numbers of the datasets are listed in Table 1. We have amended the text to make this clearer; see line 137 and Table 1.

Line 133-134: "included microdissected tissues from glomeruli, arteries, proximal or distal tubules, and tubulointerstitial structures". How did the authors compare these datasets, given that the initial samples are different?

We thank the reviewer for pointing out this lack of clarity; we have made changes to better explain our methodology. In short, all comparisons were kept within each dataset and within each compartment. For example, glomeruli from hypertensive patients in dataset 3 were only ever compared to glomeruli from other groups in dataset 3, e.g., normotensive controls. We did not compare groups across datasets. See lines 155-159.

Line 221: "test samples were tallied and RGs with the lowest number picked as top candidates". What is the lowest number that the authors used?

The lowest number is zero, i.e., an RG that was not differentially expressed in any of the datasets examined and is thus a very stable candidate. We have updated the manuscript for greater clarity, see lines 227 – 228.
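The tally described in this reply can be sketched as follows. This is a minimal illustration and not the authors' actual pipeline: the dataset names and adjusted p-values are invented, and a cut-off of 0.05 on the adjusted p-value is assumed as the criterion for calling an RG differentially expressed.

```python
# Hypothetical adjusted p-values per dataset (invented for illustration only).
ADJ_P = {
    "dataset_1": {"YWHAZ": 0.62, "RPS13": 0.01, "ACTB": 0.30},
    "dataset_2": {"YWHAZ": 0.48, "RPS13": 0.03, "ACTB": 0.02},
    "dataset_3": {"YWHAZ": 0.91, "RPS13": 0.20, "ACTB": 0.75},
}


def tally_differential(adj_p_by_dataset, alpha=0.05):
    """Count, per gene, the datasets in which its adjusted p-value is below alpha."""
    counts = {}
    for gene_to_p in adj_p_by_dataset.values():
        for gene, p in gene_to_p.items():
            counts[gene] = counts.get(gene, 0) + (1 if p < alpha else 0)
    return counts


def rank_candidates(counts):
    """Sort genes so that those with the fewest differential datasets come first."""
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


counts = tally_differential(ADJ_P)
ranking = rank_candidates(counts)
for gene, n in ranking:
    print(f"{gene}: differentially expressed in {n} dataset(s)")
```

A gene with a count of zero, as in the reply above, sits at the top of the ranking; ties are broken alphabetically here purely to keep the output deterministic.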

Line 301-302: "appeared to be peculiarly concerned". Why is that? Elaborate more.

Thanks for pointing this out. We obtained three datasets that fulfilled the criteria mentioned in the text (non-microdissected and using stable allografts as controls), and none of the three detected YWHAZ or SLC4A1AP. However, all three datasets originated from the same experiment, so they are not independent of each other. See lines 308-310.

Figure 3a: it is not clear which are 10/30 datasets that expression of different tested RG did not show variation.

Figure 3b: are the missing bars in the figure due to low expression levels, or is the expression zero? You should modify the Y axis to present this part.

Thanks for your very useful comment. After discussing it, we think that figures 3a and b were not clear enough and we have changed them. We hope the new figures are more representative and easier to understand, see Figure 3 and lines 322 – 332.

Reviewer #2:

The study looks to verify housekeeper genes for relative quantification of mRNA.

This is in itself an important question, as it is very important to have a gene to normalize against that controls for tissue input itself, is not affected by the process being studied, and allows for measurement of fold expression of another mRNA species. Many commonly used housekeeping genes have been poorly validated, leading to erroneous conclusions regarding expression of mRNA between control and experimental groups.

Validation of housekeeper genes in human studies is not easy, as well-defined true normal groups are not easy to access. The study asks some worthwhile questions but lacks some clarity and should demonstrate the effect of different RGs with different genes of interest in various tissues.

We thank the reviewer for their comments. On the issue of using various tissues, this study examines 5 types of kidney-specific tissue: whole-section, arteries, glomeruli, distal tubule, and proximal tubule. While additional types of tissue might have been interesting, our area of interest is nephrology and we do not have access to tissue banks from other organs. See lines 153 - 156 and Table 1.

On the issue of using "different RG with different genes of interest": We agree that an analysis of the effect of using various RGs on theoretical genes of interest would be interesting; however, it is outside the aim of this paper. Our aim was to validate proposed RGs using available data by examining their intragroup stability, not to conduct a separate analysis of the effects of various normalization strategies on qPCR results.

RNA seq does not use RGs, and while chip analysis may do so, it needs to be specified when and how in terms of this study.

We thank the reviewer for this observation. As the reviewer points out RNAseq does not utilize reference genes, but a section in our introduction gave that impression. We have amended that section for greater clarity, see line 55 – 56.

How did the ranking and variability of the reference genes perform across the various techniques?

This is an interesting question; we have added some information to the manuscript to better study this angle. We examined the gene expression differences with 3 techniques: RNAseq, microarray (MA) and, for a selection of RGs, qPCR. In qPCR all 4 candidates were stable, even though the RPS13 gene had been unstable in several RNAseq datasets, so there is some variation across techniques, though cohort size is an important factor. To make it simpler to examine variation across techniques, we have modified supplementary Table 1. We have included the technique used in each dataset, and the amount of variation for each dataset has also been added. Figure 2 has also been modified for easier understanding. Overall, the average number of RGs differentially expressed across all RNAseq datasets was 3.25, while the same number was 1.6 across MA datasets. However, MA datasets were also more often microdissected. See Table S1, Table 1, Figure 2 and lines 427 – 445.

Some more specific questions

One presumes biopsies are done for a good reason, i.e., the patient has an illness. So where do the non-disease control specimens come from? This needs very clear specification. It would be helpful if this was more clearly articulated.

An exceedingly important question. We have modified the paper to provide additional information. We cannot answer for external datasets beyond what is available in the respective publications and their GEO pages, though they often use pre-transplantation biopsies. In our own data, normal controls were selected from a group of biopsies graded by the renal pathologist on duty as "not containing any or only insignificant pathology". We then re-examined the biopsy, verified the non-pathological histology and accessed the patient's clinical record for further information. Patients who later developed renal disease or severe autoimmune disease, or who showed severe proteinuria, were excluded. The exact clinical reasoning for taking the biopsy in the first place, and the underlying pathology, if any was found, varied substantially from patient to patient. See lines 140 – 150.

There is an issue with taking tissue from formalin fixed tissue for mRNA extraction rather than rapidly placing tissue into a good RNA preservation solution such as RNA or snap freezing. While it is understood there are difficulties in collecting human tissue in this way, it is still the best method. One realises a number of studies have done it from formalin fixed tissue; to what extent might the various RNA species degrade differently and affect the findings?

Results for the same gene seem to vary very little between fresh-frozen and FFPE storage. Eikrem et al. compared FFPE to fresh-frozen samples from the same biopsy, and the average expression and the log2 fold changes of these transcripts correlated with R2 = 0.97 and R2 = 0.96, respectively (PLoS One. 2016 Feb 22;11(2):e0149743. doi: 10.1371/journal.pone.0149743. eCollection 2016. "Transcriptome Sequencing (RNAseq) Enables Utilization of Formalin-Fixed, Paraffin-Embedded Biopsies with Clear Cell Renal Cell Carcinoma for Exploration of Disease Biology and Biomarker Development").

Similar results have been found in several other studies; see for example Esteve-Codina A, Arpi O, Martinez-García M, Pineda E, Mallo M, Gut M, et al. (2017) A Comparison of RNA-Seq Results from Paired Formalin-Fixed Paraffin-Embedded and Fresh-Frozen Glioblastoma Tissue Samples. PLoS ONE 12(1): e0170632.

Or Bossel Ben-Moshe, N., Gilad, S., Perry, G. et al. mRNA-seq whole transcriptome profiling of fresh frozen versus archived fixed tissues. BMC Genomics 19, 419 (2018). https://doi.org/10.1186/s12864-018-4761-3

It should be mentioned, though, that not all species of RNA, such as soluble RNA, are as easily found in FFPE as in FF tissue. However, RNAs that are present in FFPE tissue seem to behave similarly to those in fresh-frozen tissue.

The area size of the tissue blocks is mentioned but not the region of kidney tissue. This may or may not affect housekeeper genes but would highly likely affect the region of nephron structures and vasculature.

Another good question. We have amended the manuscript. The answer to this comment overlaps a bit with another comment on page 11 ("Depending on the nature and extent …"). We cannot provide additional details on the histological consistency of biopsies in external datasets beyond their peer-reviewed status and the information provided in the original publication. In our own data, biopsies were always taken for diagnostic purposes and therefore aimed at the kidney cortex. Biopsies with less than 50% cortex were discarded. Approximately 70% of the biopsies had 10 or more glomeruli. All biopsies contained arterioles to some degree, while larger vessels were very rare.

Microdissected samples of course only contain their respective compartments.

See lines 144-147.

406 g is a relatively large amount of RNA from a tissue sample unless relatively large.

The yield was given in nanograms, not grams. The section has been amended for greater clarity, see lines 199 – 200.

Was this total RNA or poly mRNA.

Total RNA, see lines 201 – 202.

What was the relative purity of the RNA?

Thank you for the comment. The 260/280 ratio had a median of 1.905 (range 1.67-1.98) and the 260/230 ratio of 1.85 (1.01-2.11). We have added the data to the manuscript. See lines 200 – 204.

For qPCR, how many plates were run and how was plate-to-plate variability accounted for?

We ran three or four technical replicates for each sample and each probe. We have added this to the manuscript. We agree that accounting for plate-to-plate variability is very interesting. Unfortunately, we did not have enough material to replicate the analysis and, as we were mainly interested in the differences between hypertensive nephropathy and controls, we decided to perform technical replicates only. See lines 209 – 210.

The Benjamini-Hochberg method is usually used for relatively large data sets such as those used in RNA seq and other large data sets. In analysing the qPCR data for the n=16 patients, it is unclear why the various established methods of determining gene variance, and hence stability within and between groups, were not used. This needs to be explained and justified.

Thank you for the suggestion. We have improved the explanation in the manuscript. We only used the Benjamini-Hochberg method in the 30 RNA datasets that we used to assess the differential expression of the genes of interest. For qPCR, we analyzed the data using the Mann-Whitney U test. See lines 231 – 232.
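For readers unfamiliar with the two procedures named in this reply, the following is a minimal pure-Python sketch. It is an illustration only: the p-values and expression values are invented, and the functions are not the authors' code. The Mann-Whitney part computes only the U statistic; obtaining a p-value from it would normally be left to a statistics library (for example `scipy.stats.mannwhitneyu`).

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y; ties contribute 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Invented raw p-values, e.g. one per reference gene within a single dataset.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw_p)
print([round(p, 4) for p in adjusted])

# Invented normalized qPCR values for two small groups.
controls = [1.0, 1.2, 0.9, 1.1, 1.05]
cases = [1.15, 0.95, 1.3, 1.0, 1.25, 0.85]
print(mann_whitney_u(cases, controls))
```

The split mirrors the reply: multiple-testing correction is applied where many genes are tested per dataset, while the small qPCR groups are compared with a rank-based test that makes no normality assumption.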

It is now generally accepted that some form of normalization of RNA seq data is required. This would be particularly true for many of the databases examined here. There is a lack of clearly indicated information on starting material. If all that is available is the broad diagnosis, then this should be indicated. There are varying assumptions that can be made regarding how normalization can be done, and violation of the assumptions can lead to erroneous conclusions. As this study is all about defining genes with relatively constant expression for normalization, these assumptions should be spelled out, and it should be made clear how all samples fall within those assumptions and what might be expected, such as very high and low expressing genes. (Evans et al., Briefings in Bioinformatics, 19(5), 2018, 776-792; Aanes et al., 2014, PLoS ONE 9(2): e89158.)

Great Comment.

Regarding starting material, this is explained in greater detail in a different comment on page 12 ("Whole tissue blocks…") and a comment on page 6 ("The area size of the tissue blocks…"). We have answered the concerns regarding starting material below those comments.

Regarding normalization, we have used CPM normalization and DESeq2 normalization for our internal datasets. For the external datasets, we have used the original authors' own normalization, with the information available in the original publication and GEO submission. When no normalized data were available, we used the same normalization strategies as for our own data. No data were compared across datasets; all comparisons were kept within each dataset and within each compartment. For example, glomeruli from hypertensive patients in dataset 3 were only ever compared to glomeruli from other groups in dataset 3, e.g., normotensive controls. So, while normalization may vary between datasets, this does not impact the results. The biological scaling normalization (BSN) referred to in the paper the reviewer linked was tested in embryonic zebrafish, where the amount of RNA differed significantly with embryonic development ("The first 2.5 h are characterized by substantial increase of polyA+ RNA, while there is massive decay of RNA due to miRNA-430 activation at 3.5 hpf and onwards"). However, in our case we are comparing biopsies from healthy human adults, so substantial differences due to ongoing development are not present. Additionally, the authors themselves point out that "this illustrates the key difference between the normalization methods compared; BSN seek to maintain biological differences, while RPM and TMM lead to samples with similar distribution of the gene expression levels." As such, RGs that were differentially expressed using the standard normalization methods would most likely also be differentially expressed using BSN.

See lines 155-159.
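As a reminder of what the CPM normalization mentioned above does, here is a minimal sketch under stated assumptions: the gene names are taken from the manuscript, but the read counts are invented and the function is hypothetical, not the authors' code. DESeq2 uses a different scheme (median-of-ratios size factors), which is not shown.

```python
def cpm(counts):
    """Scale raw read counts for one sample to counts per million (CPM)."""
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("sample has no reads")
    return {gene: n / library_size * 1_000_000 for gene, n in counts.items()}


# Invented raw counts for two samples with different sequencing depth.
sample_a = {"YWHAZ": 250, "ACTB": 750}
sample_b = {"YWHAZ": 500, "ACTB": 1500}

cpm_a = cpm(sample_a)
cpm_b = cpm(sample_b)
print(cpm_a)
print(cpm_b)
```

Although sample_b has twice the reads of sample_a, both yield the same CPM values, which is the point of the scaling: differences in sequencing depth are removed before groups within a dataset are compared.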

In the description of the various RNA seq procedures there is discussion of analysis of differentially expressed genes and genes with greater than 2 fold expression excluded, yet the paper is about genes that are hopefully minimally differentially expressed. This seems more a generic discussion of RNA seq rather than specific to this study.

We agree that other, lower thresholds might have been chosen, which might have yielded a greater number of RGs with differential expression. However, we believe that a threshold of 2-fold expression is suitable, as some minimal variation has to be tolerated even in RGs, because no gene is entirely free of variation and some standard error is expected from the methodology.

As I can see, RNA seq is being used as a useful way to determine gene expression of a number of genes at once across a variety of samples and should in principle provide a valuable set of relative gene expression data to determine within and between group variation and thus suitability as potential housekeepers. Such an analysis seems lacking.

We thank the reviewer for this comment and have made changes to the paper to address it. Please also note that the answer to this comment overlaps to some extent with the answer to a previous comment on page 4 ("How did the ranking and variability of the reference genes…").

In short, Table S1 displays the direction of changes for each RG for each dataset and can thus provide information on inter-experiment variation, e.g. the direction of the fold change for a specific RG across experiments, groups and methodology. See Table S1.

For the microarray data, how was normalization done, what was the relative gene stability, and what might that have done in assessing some variable mRNA species of interest?

All microarray data were obtained externally, from different groups. As such the exact method for normalization differed, depending on the original submitter. However, this information is available for each dataset through the GEO submission, which is referenced in each case. Dataset 9 and 10, for instance, were normalized as follows, according to information provided through the original GEO submission (GEO GSE104948).

"…Arrays were RMA normalized using probe sets common to U133A and U133 Plus2.0 and batch corrected using COMBAT (ERCB)…"

We felt it unnecessary to reiterate the normalization process in cases where this information was already stated elsewhere, especially if it was part of an original submission.

We used adjusted p-values as proxies of gene stability, the same as for our own data. A gene with an adjusted p-value below 0.05 (for differential expression) was considered unstable and thus unsuitable as an RG. See lines 235-236 and 287 – 292.

I think some tables of expression data from the RNA seq and latter qPCR with a clear ranking would be helpful.

Supplementary Table S1 has fold changes for every reference gene in every dataset. Additionally, we display the adjusted p-values for every reference gene in every dataset. Similar data for the qPCR results are displayed in Table 3.

The raw read counts are also available: for the external datasets as part of the GEO submission in question, and for our own data on GitHub, repository 310590-transciptomic-data. See lines 519 - 521, Table 3 and Table S1.

I think it would be helpful to determine what might be any experimental bias by clearly showing the effects of using some gene of interest that might change between conditions and how this can be affected by using a single or pair of particular housekeeper genes and how the results might vary between RNA seq and qPCR. The authors should clearly show the effect of using various combinations of housekeeper genes.

Thank you for your suggestion. In the article, we do not try to suggest different sets of housekeeping genes, but rather verify established housekeeping genes based on our own and publicly available datasets. To mitigate condition bias, we performed qPCR on some of the housekeeping genes to confirm our findings from the different expression datasets.

We agree that an analysis of the effect of using various RGs on a number of theoretical genes of interest would be interesting. However, it is outside the scope of this paper. Our aim was only ever to validate proposed RGs using available data by examining their intragroup stability, not a separate analysis of the effects of various normalization strategies on qPCR results.

Lines 412 to 414 in discussion:

While it is true rt-PCR requires each gene to be measured individually, why would it require investigators to be more selective in diseases studied? I would suggest that it is far more likely RT PCR would be used than next gen RNA-seq based on cost and availability.

We agree that this point was poorly explained, and we thank the reviewer for pointing it out. We have amended the section. Our original point was this: for checking individual genes, or a low number of genes, qPCR is superior. The advantage of RNAseq here lies in sequencing every gene simultaneously, without focusing on any one gene in particular. Since data from additional renal diseases are often available online, researchers do not have to constrain themselves to a few genes or a few diseases. See lines 424-425.

Depending on the nature and extent of disease type of tissue and region of tissue it is hardly surprising there are differences in number of genes expressed. In the kidney it is well described that there is differential gene expression along the nephron and I would expect vasculature. Another variable would be how tissue was dissected and time taken for tissue fixation and differential stability of RNA species.

We thank the reviewer for this insightful comment.

The answer to this comment overlaps to some extent with a previous comment on page 6 ("The area size of the tissue…"). Regarding tissue region, we cannot provide additional details on the histological consistency of biopsies in external datasets beyond their peer-reviewed status and the information provided as part of the original publication. In our own data, biopsies were initially taken for diagnostic purposes and therefore aimed at the kidney cortex. Biopsies with less than 50% cortex were discarded. Approximately 70% of the biopsies had 10 or more glomeruli. All biopsies contained arterioles to some degree, while larger vessels were very rare. As such, all biopsies were vascular to some degree, with minor variations.

Microdissected samples of course only contain their respective compartments, and microdissection was performed by personnel with long experience in renal histology. In cases where we were unsure whether a structure was, e.g., a proximal or distal tubule, that section of the slide was excluded.

See lines 144-147.

Concerning sample processing: again, we cannot answer for procedures performed during processing of external data, beyond their peer-reviewed status and the details that the original authors have provided. For our own data, the tissue was handed to a technician immediately after the biopsy was obtained and was processed as formalin-fixed, paraffin-embedded tissue without delay.

All microdissections were performed on the same Zeiss PALM Laser Capture Microdissection (LCM) system (Carl Zeiss AG, Oberkochen, Germany) with consistent personnel and settings for each dataset. After microdissection, the samples were immediately stored at -80 °C until RNA extraction, after which they were again immediately stored at -80 °C.

See lines 147 – 150.

Whole tissue blocks would be expected to have less variability but again will be subject to sampling and how a piece of tissue is cut. It is also reasonable that the number of patients in each data set might be important. With that in mind a full racial and demographic reporting including drug and co-existent morbidities needs to be reported as many conditions might affect renal gene expression.

Also a good suggestion. We have greatly expanded the amount of information provided on the cohorts. See line 164 – 170.

The authors mention possible differences in methodologies of studying reference or housekeeper genes. RNA-seq by its nature usually does not use RGs but is dependent on sequencing and counting to normalize samples. Again it would have been useful to see some analysis of variation in the expression of suggested housekeepers by RNA-seq using the various published approaches.

We thank the reviewer for this comment and have made changes to the paper to address it. As the reviewer points out, this point has been brought up before, in a comment on page 4 ("How did the ranking and variability of the reference genes…"). We have answered in full there.

In short, information on the variation in the expression of suggested housekeepers is displayed in Table S1: the fold changes for every RG in every dataset (540 datapoints), as well as an analysis of variation for each dataset and for each RG as a whole. See Table S1.

The discussion could be more tightly focused. This is about RGs.

The abstract mentions looking at RGs in patients with hypertensive nephropathy, yet there is little clear data presentation on this subject or on what effects different reference genes might have on assessing expression of some other gene of interest.

We have examined non-cancerous kidney biopsies, which include biopsies from patients with hypertensive nephropathy. We have also looked at biopsies from patients with diabetic nephropathy, Fabry disease, focal segmental glomerulosclerosis, IgA nephropathy, membranous nephropathy, and minimal change disease, all of which is stated in the abstract. The biopsies for the qPCR experiment were also taken from patients with hypertensive nephropathy. See lines 30 – 38 and 181 – 190.

It seems that some more comment might have been made about using different reference genes for different circumstances.

We have added some parts to the discussion regarding the use of different reference genes for different circumstances. See lines 455 - 464.

To conclude this letter, we would like to thank all reviewers again for their valuable comments. We hope to have answered all issues satisfactorily.

Sincerely,

Philipp Strauss

University of Bergen

Department of Clinical Medicine

Haukeland University Hospital

Jonas Lies Vei 65,

Laboratory Building, 7th floor

5021 Bergen, Norway

Mobile phone: 0047 93686433

E-mail: Philipp.Strauss@uib.no

Attachment

Submitted filename: Response to Reviewers.doc

Decision Letter 1

Stephen D Ginsberg

27 Sep 2021

PONE-D-21-13929R1

Variable expression of eighteen common housekeeping genes in human non-cancerous kidney biopsies

PLOS ONE

Dear Dr. Strauss,

Thank you for resubmitting your work to PLOS ONE. Please make the corrections posed by Reviewer #2 so I can render a decision on this manuscript.

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. 

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously? 

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: I think the paper would benefit from an explicit example of what the effect would be on variation in relative gene expression using a less stable vs more stable reference gene.

This could even be a theoretical discussion just looking at the effect of variation in the denominator, i.e. the reference gene.

I think this would make the paper even more impactful and drive the points home about housekeeper gene stability.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

==============================

Please submit your revised manuscript by December, 2021. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Stephen D. Ginsberg, Ph.D.

Section Editor

PLOS ONE

PLoS One. 2021 Dec 9;16(12):e0259373. doi: 10.1371/journal.pone.0259373.r004

Author response to Decision Letter 1


5 Oct 2021

Bergen, October 2021

Stephen D. Ginsberg, Ph.D.

Section Editor

PLOS ONE

Dear editor and reviewers

Thank you for taking the time to review our work a second time. We have taken your comments to heart and improved the manuscript accordingly. Below you will find a point-by-point review of the changes to the paper in response to each comment.

Reviewer #2:

1. Reviewer #2: I think the paper would benefit from an explicit example of what the effect would be on variation in relative gene expression using a less stable vs more stable reference gene. This could even be a theoretical discussion just looking at the effect of variation in the denominator, i.e. the reference gene. I think this would make the paper even more impactful and drive the points home about housekeeper gene stability.

Response to 1. Reviewer #2: We thank the reviewer for taking the time to review the manuscript again. We have added a section on the suggested topic to the discussion. See lines 401 – 411.
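The reviewer's point about variation in the denominator can be illustrated with a minimal sketch. It is not taken from the manuscript: the Ct values are hypothetical, the function name is ours, and the calculation assumes the common 2^-ΔΔCt method with 100% amplification efficiency. The target gene is kept truly unchanged between groups, so any apparent fold change comes from the reference gene alone.

```python
def relative_expression(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target gene (test vs control) by the 2^-ddCt method.

    Assumes 100% PCR efficiency, i.e. one cycle equals a twofold difference.
    """
    d_ct_test = ct_target_test - ct_ref_test   # target normalized to reference, test group
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl   # target normalized to reference, control group
    return 2.0 ** -(d_ct_test - d_ct_ctrl)


# Hypothetical Ct values: the target gene has Ct 25 in both groups (no true change).

# Stable reference gene: same Ct in test and control samples.
stable = relative_expression(25.0, 20.0, 25.0, 20.0)

# Unstable reference gene: one cycle lower in test samples,
# i.e. the reference itself is twofold more abundant in disease.
unstable = relative_expression(25.0, 19.0, 25.0, 20.0)

print(f"stable reference:   fold change = {stable}")
print(f"unstable reference: fold change = {unstable}")
```

Under these assumptions the stable reference returns a fold change of 1.0, whereas a one-cycle shift in the reference gene makes the unchanged target appear halved (0.5). A reference gene that is itself differentially expressed therefore introduces an artefact of the same magnitude as its own change.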

To conclude this letter, we would like to thank all reviewers again for their valuable comments and for taking the time to review the manuscript a second time. We hope to have addressed all issues satisfactorily.

Sincerely,

Philipp Strauss

University of Bergen

Department of Clinical Medicine

Haukeland University Hospital

Jonas Lies Vei 65,

Laboratory Building, 7th floor

5021 Bergen, Norway

Mobile phone: 0047 93686433

E-mail: Philipp.Strauss@uib.no

Attachment

Submitted filename: Response to Reviewers.doc

Decision Letter 2

Stephen D Ginsberg

19 Oct 2021

Variable expression of eighteen common housekeeping genes in human non-cancerous kidney biopsies

PONE-D-21-13929R2

Dear Dr. Strauss,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Stephen D. Ginsberg, Ph.D.

Section Editor

PLOS ONE

Additional Editor Comments: Please address the minor errors pointed out by the Reviewer in the final submission.

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In this revised manuscript, Strauss et al. identified suitable RGs for non-cancerous renal tissue using available datasets and their own specimens. They addressed the importance of identifying relevant RGs based on the methodology and tissue of interest. They identified YWHAZ as a stable RG, in contrast to ACTB, which has historically been used as an RG. The authors have responded well to the comments, although a few minor issues remain:

Abstract line 27: are widely use, “d” is missing

Line 36: 3 genes are listed and RPS13 is missing

Line 167, other renal, a word is missing. Is it other renal diseases? Or condition?

Line 339, I think it should be 20 datasets as in line 336 is stated 10/30 datasets, so remaining should be 20.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Acceptance letter

Stephen D Ginsberg

1 Dec 2021

PONE-D-21-13929R2

Variable expression of eighteen common housekeeping genes in human non-cancerous kidney biopsies

Dear Dr. Strauss:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Stephen D. Ginsberg

Section Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Full results with p-values and fold changes.

    MCD: Minimal change disease, MN: Membranous nephropathy, HT: Hypertension, DIA2: Diabetes type 2, FSGS: Focal segmental glomerulosclerosis, IGAN: IgA nephropathy, TCMR: T-cell mediated rejection, RPGN: Rapidly progressive glomerulonephritis, STA: Stable allograft, ATI: Acute tubular injury, IFTA: Interstitial fibrosis and tubular atrophy. In the columns under the gene IDs, “Yes” refers to RGs differentially expressed between control and test samples in the dataset. “No” refers to RGs equally expressed in control and test samples in the dataset. Not available (NA) refers to RGs not tested in the specific dataset. Not detected (ND) refers to RGs undetected in the specific dataset. Summaries and percentages are noted below each column.

    (XLSX)

    Attachment

    Submitted filename: Response to Reviewers.doc

    Data Availability Statement

    Data is available at GEO (https://www.ncbi.nlm.nih.gov/gds), accession numbers: GSE104948, GSE108113, GSE104954. Additional data is available at https://github.com/pst087/310590-transciptomic-data.


    Articles from PLoS ONE are provided here courtesy of PLOS
