Scientific Reports. 2025 Aug 12;15:29592. doi: 10.1038/s41598-025-12992-7

Comparative analysis of library preparation approaches for FFPE gene expression profiling and related recommendations

Sara Pignatta 1, Marcella Tazzari 2, Valentina Indio 3, Maria Maddalena Tumedei 2, Francesco Limarzi 4, Francesca Tauceri 5, Michela Tebaldi 6, Dalila Fanelli 2, Jenny Bulgarelli 2
PMCID: PMC12343946  PMID: 40796588

Abstract

Next-Generation Sequencing (NGS) has transformed cancer research and clinical practice, with Whole Exome Sequencing (WES) driving advances in mutational profiling and personalized oncology. Yet, transcriptomic signatures remain essential for understanding disease mechanisms, including therapy resistance pathways. RNA sequencing (RNA-seq), however, faces unique challenges when dealing with low-input or degraded RNA, as often found in archival formalin-fixed paraffin-embedded (FFPE) tissues. Although previous studies have compared library preparation protocols, rapidly evolving technologies call for updated evaluations. Here, we present a direct comparison of two FFPE-compatible stranded RNA-seq library preparation kits: TaKaRa™ SMARTer® Stranded Total RNA-Seq Kit v2 (Kit A) and Illumina Stranded Total RNA Prep Ligation with Ribo-Zero Plus (Kit B). Both kits generated high-quality RNA-seq data, yet important differences emerged. Notably, Kit A achieved comparable gene expression quantification to Kit B while requiring 20-fold less RNA input, a crucial advantage for limited samples, albeit with increased sequencing depth. We critically discuss these results in relation to RNA availability, technical performance, cost-effectiveness, processing time, and automation potential, offering practical guidance for selecting optimal RNA-seq strategies in clinical and translational research settings.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-12992-7.

Keywords: FFPE tissues, RNA-seq analysis, Gene expression, Library preparation protocols

Subject terms: Biological techniques, Cancer, Molecular biology

Introduction

High-throughput RNA sequencing (RNA-seq) is a widely used method for whole-transcriptome gene expression analysis, particularly in cancer cohort studies. The rapid development of diverse library preparation techniques and downstream RNA-seq analyses has enhanced its effectiveness and versatility in biological research1, necessitating ongoing evaluations to keep up with constantly emerging technologies2. However, the selection of an appropriate RNA-seq library preparation protocol is influenced by two major factors: sample type and the availability of biological material. Among available specimens, formalin-fixed paraffin-embedded (FFPE) tissue remains one of the most accessible resources in both research and clinical settings due to its widespread use for preserving tissue morphology3. Nevertheless, FFPE-derived RNA is often fragmented, chemically modified, and degraded, making it suboptimal for gene expression profiling4. This degradation can compromise sequencing quality, impacting the reliability of gene expression analysis. Therefore, optimized strategies are essential to maximize the utility of FFPE samples and extract high-quality transcriptomic data from low-integrity RNA5.

A further challenge lies in the limited availability of FFPE tissue, which depends on tumor size, anatomical location, and clinical requirements for histopathological diagnosis. From a research perspective, FFPE sections are valuable, not only for transcriptomic studies, but also for immunohistochemistry, in situ hybridization, and DNA isolation. However, the choice of RNA-seq library preparation protocol can significantly influence gene detection and quantification, ultimately affecting differential expression analysis. A direct comparison of different library preparation strategies is therefore essential to determine the most suitable approach for FFPE-based transcriptomic studies.

In the present study, we describe a Pathologist-assisted microdissection workflow to extract nucleic acids from FFPE samples and comparatively evaluate the results and performance of two recent, commercially available stranded RNA-seq library preparation protocols: the TaKaRa™ SMARTer® Stranded Total RNA-Seq Kit v2 (Kit A) and the Illumina Stranded Total RNA Prep Ligation with Ribo-Zero Plus (Kit B). These kits were selected due to their distinct workflows and input requirements, particularly the TaKaRa™ kit, which utilizes a smaller amount of starting RNA material. This feature is advantageous not only for small biopsies but also in cases where upfront Pathologist-assisted macrodissection is needed to precisely circumscribe the Region Of Interest (ROI), which often further reduces the amount of RNA that can be extracted. Our study aims to evaluate whether the TaKaRa™ kit can achieve performance comparable to the well-established Illumina kit, despite the lower quantity of RNA input. By presenting real-world data from this comparison, we offer researchers practical insights to guide the selection of the most suitable RNA-seq method based on specific clinical and biological protocols, as well as available resources, and provide recommendations for efficient library preparation and sequencing in settings with limited RNA input.

Results

Optimization of FFPE tissue macrodissection for nucleic acid extraction

RNA-seq is often performed on various FFPE tissue sources, making optimized macrodissection crucial to ensure high-quality data for downstream analyses. As an example, when analyzing the tumor microenvironment in melanoma, FFPE samples from lymph node metastases are frequently used6,7. In these cases, precise macrodissection, rather than bulk tissue scraping, is essential to exclude lymph node parenchyma and prevent the inadvertent inclusion of lymph node-associated tertiary lymphoid structures (TLS), rather than de novo tumor-driven TLS. Figure 1 illustrates our in-house Pathologist-assisted pipeline for nucleic acid extraction. Our selection strategy prioritizes high tumor content regions for DNA extraction and infiltrated tumor microenvironment regions for transcriptomic analysis. Indeed, some FFPE samples require two distinct blocks from the same surgical specimen, one for DNA and one for RNA extraction (Fig. 1a). In contrast, other cases allow for both RNA and DNA extraction from the same FFPE section (Fig. 1b).

Fig. 1.


Pathologist-guided tissue selection for optimized RNA extraction from FFPE Samples. (a) H&E-stained slides from two different FFPE blocks, annotated separately for DNA extraction (yellow) and RNA extraction (blue). (b) H&E-stained slide containing regions of interest (ROIs) for both DNA (yellow) and RNA (blue) extraction. A 20x magnification inset illustrates the Pathologist’s selection criteria, ensuring RNA enrichment exclusively from tumor regions while avoiding adjacent normal tissue. A black freehand ROI marks a melanin-rich area, usually excluded from the macrodissection to prevent interference with nucleic acid yield and quality.

RNA quality and RNA sample assessment

RNA isolated identically from 6 FFPE tissue samples from a cohort of melanoma patients treated with Nivolumab was used to compare the performance of the TaKaRa™ SMARTer® Stranded Total RNA-Seq Kit v2 (Kit A) and Illumina Stranded Total RNA Prep Ligation with Ribo-Zero Plus (Kit B). The average RNA concentration obtained from a single 5 μm thick FFPE section was 127 ng/µl, with a range of 25 ng/µl to 374 ng/µl. The RNA quality parameters are described in Supplementary Table S1. In detail, the DV200 values ranged from 37 to 70%. No sample had a DV200 below 30%, the value below which a sample is considered too degraded according to DV200 evaluation criteria. This suggests that, although the RNA is fragmented, the samples are still usable for the RNA-seq protocol.

Sequencing and read quality metrics

In order to evaluate the performance of RNA-seq methods for low quantity FFPE samples, we prepared TaKaRa and Illumina libraries for each RNA sample. Both workflows produced high quality libraries, whose metrics are detailed in Table 1.

Table 1.

Quality control (QC) metrics. Values are reported as mean (min–max). Abbreviations: UTR, untranslated region; PF, passing-filter; rRNA, ribosomal RNA.

| Step | QC measure | Kit A | Kit B |
| --- | --- | --- | --- |
| Library preparation | Average fragment size | 292 bp (278–314 bp) | 295 bp (279–305 bp) |
| | Concentration (nM) | 52.9 nM (32.3–66.3 nM) | 75.1 nM (9.0–216.6 nM) |
| Sequencing | Total paired end reads | 79.94 (71.01–88.71) | 58.51 (51.07–65.98) |
| Alignment | Reads uniquely mapped (%) | 58.44 (47.77–65.28) | 90.17 (84.46–93.04) |
| | Reads mapped on multiple loci (%) | 27.29 (22.44–34.39) | 4.39 (3.31–6.3) |
| | Unmapped reads (%) | 14.28 (12.29–17.84) | 5.44 (3.41–9.23) |
| | Mismatch rate per base (%) | 0.72 (0.68–0.78) | 0.33 (0.29–0.39) |
| | % Aligned reads rate | 57.1 (46.2–63.8) | 90.2 (84.6–93.2) |
| | % coding | 8.73 (6.3–9.8) | 8.98 (7.7–9.9) |
| | % UTR | 9.78 (7.9–11.5) | 12.52 (10.8–14) |
| | % intronic | 35.18 (27–40.9) | 61.65 (58.5–66.9) |
| | % intergenic | 11.2 (10.4–13.2) | 10.7 (9.4–13.8) |
| | % rRNA | 17.45 (14.7–24.1) | 0.1 (0–0.2) |
| | % PF not aligned | 17.65 (15–21.6) | 6.05 (4.8–8.1) |
| | Percentage of duplicates | 28.48 (13.2–64) | 10.73 (7.3–23.0) |
| Gene expression quantification | n. of genes with more than 3 cmp | 23,167 (20,772–24,913) | 20,714 (17,429–23,026) |
| | n. of genes with more than 30 cmp | 13,841 (12,567–14,861) | 13,146 (9,954–15,087) |

The genetic feature assignment rates and the alignment scores are shown in Fig. 2a and b, respectively. The Phred quality scores in Fig. 2c are high for both kits, indicating high confidence in base-call accuracy.

Fig. 2.


Quality metrics of reads sequenced by Kit A and Kit B RNA-seq library preparation obtained with the MultiQC analysis. (a) Number of bases that map to coding, UTR, Intronic, Intergenic, Ribosomal regions of the human genome. (b) Number of reads mapped to one locus, multiple loci and unmapped. (c) Phred quality score plot across the read lengths for Kit A (green) and Kit B (yellow) samples.

At the library preparation stage, no difference was found in average fragment size; however, Kit B achieved a higher mean library concentration than Kit A, suggesting better efficiency in library yield. Looking at the sequencing yield, we observed a higher total number of paired-end reads for Kit A than for Kit B. Conversely, Kit B showed better alignment performance, with a higher percentage of uniquely mapped reads and a lower percentage of reads mapped to multiple loci than Kit A. In addition, Kit A libraries exhibited a substantially higher proportion of ribosomal RNA (rRNA) content (17.45% vs. 0.1%) and a higher duplication rate (28.48% vs. 10.73%), suggesting less effective rRNA depletion and an increased presence of non-informative or redundant reads. Libraries prepared with Kit B, in turn, showed a markedly greater proportion of reads mapping to intronic regions compared to Kit A (61.65% vs. 35.18%). Nevertheless, when focusing on coding regions, the quality metrics revealed comparable results between the two kits: the number of genes covered by at least 3 or 30 reads was similar (23,167 vs. 20,714 and 13,841 vs. 13,146, respectively), as was the percentage of reads mapping to coding regions (8.73% vs. 8.98%). Overall, these results indicate that the two library preparation kits delivered similar performances, with a balanced trade-off between their respective strengths and weaknesses.
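The trade-off between sequencing yield and mapping rate can be made concrete with a back-of-the-envelope calculation. The sketch below multiplies the mean total paired-end reads by the mean percentage of uniquely mapped reads from Table 1. The function name is illustrative, the read totals are assumed to be in the units used in Table 1, and the product of two per-kit means only approximates the true per-sample average.

```python
# Rough estimate of uniquely mapped reads per kit, using the Table 1 means.
# Assumption: the product of two means approximates the per-sample average.

def uniquely_mapped(total_reads: float, unique_pct: float) -> float:
    """Return the number of uniquely mapped reads (same unit as total_reads)."""
    return total_reads * unique_pct / 100.0

TABLE1_MEANS = {
    "Kit A": {"total_reads": 79.94, "unique_pct": 58.44},
    "Kit B": {"total_reads": 58.51, "unique_pct": 90.17},
}

estimates = {
    kit: uniquely_mapped(m["total_reads"], m["unique_pct"])
    for kit, m in TABLE1_MEANS.items()
}

for kit, value in estimates.items():
    print(f"{kit}: ~{value:.1f} uniquely mapped reads (Table 1 units)")
```

Under these assumptions, Kit A ends up with a somewhat lower number of uniquely mapped reads than Kit B despite its larger sequencing yield.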

Differential gene expression analysis

To explore the similarity of our samples, we performed an unsupervised Principal Component Analysis (PCA). The samples did not cluster according to the library preparation method (Fig. 3a). Instead, a distinct separation between specimens was clearly evident, suggesting that both kits produced the same gene expression profile for each specimen (Fig. 3b).

Fig. 3.


Three-dimensional projections of principal component analysis (PCA). (a) Red and blue dots correspond respectively to kit A and kit B samples; (b) Each color corresponds to one specimen.

To better understand the concordance between the performance of these methods, we conducted a differential gene expression analysis by comparing three randomly selected samples against three others. We repeated the same analysis for the samples prepared using Kit A and Kit B, identifying significantly differentially expressed genes (DEGs) and estimating the degree of overlap. In the Kit A dataset, using a p-value threshold of < 0.05, we found 41 up-regulated and 111 down-regulated genes. Of these, 94 down-regulated and 33 up-regulated genes were also identified in the Kit B dataset, applying a less stringent p-value cut-off of < 0.2. This resulted in an estimated gene overlap of 83.6% (Fig. 4a, b).
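The overlap figure quoted above follows directly from the reported gene counts. The short sketch below reproduces it; the function and argument names are illustrative and not part of the original analysis pipeline.

```python
# Overlap between DEG lists, computed from the counts reported in the text.

def overlap_pct(shared_up: int, shared_down: int,
                total_up: int, total_down: int) -> float:
    """Percentage of reference DEGs that are also found in the other dataset."""
    return 100.0 * (shared_up + shared_down) / (total_up + total_down)

# Kit A as reference (p < 0.05): 41 up-regulated and 111 down-regulated genes;
# 33 up-regulated and 94 down-regulated genes were also found in Kit B (p < 0.2).
kit_a_reference = overlap_pct(shared_up=33, shared_down=94,
                              total_up=41, total_down=111)

print(f"Kit A DEGs recovered in the Kit B dataset: {kit_a_reference:.1f}%")
```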

Fig. 4.


Diagram of differentially expressed genes. (a) Venn diagram showing the number of overlapping up- and down-regulated genes between the kit A and kit B. (b) Histogram of total number of DEGs in each group.

To further investigate the consistency between the two methods, we selected DEGs in the Kit B dataset by applying a p-value threshold of < 0.05, identifying 33 up-regulated and 37 down-regulated genes. When applying a less stringent p-value cut-off of < 0.2 to the Kit A dataset, 30 up-regulated and 36 down-regulated genes overlapped, resulting in a degree of concordance of 91.7% (Fig. 4a, b). The high degree of overlap suggested that the expression pattern provided by the two kits was highly reproducible.

However, since the overlap estimation between DEGs could be imprecise for genes with low expression levels, we decided to examine the expression of housekeeping genes as an additional quality criterion, given their characteristic of maintaining consistently high expression across different conditions. A regression analysis of the expression levels (measured as logCPM) of 10 selected housekeeping genes in all the sample pairs (Supplementary Table S2) between Kit A and Kit B showed a highly significant correlation (R² = 0.9747, p-value < 0.001) (Fig. 5a). In addition, a larger list of highly variable genes (the top 500 genes with the largest IQR) was used to perform hierarchical clustering. The resulting dendrogram showed that samples cluster by specimen regardless of the library preparation method, demonstrating the inherent similarity among paired samples (Fig. 5b).
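As an illustration of the regression step, the sketch below computes the coefficient of determination for a simple least-squares fit between paired logCPM values. The numbers are invented placeholders, not data from the study, and the helper name is hypothetical; the authors' own analysis was performed in R.

```python
# R^2 of a simple linear fit between paired expression values.
# The logCPM values below are invented placeholders, not study data.

from statistics import mean

def r_squared(x: list[float], y: list[float]) -> float:
    """R^2 of the least-squares line y = a + b*x (the squared Pearson r)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return (sxy ** 2) / (sxx * syy)

kit_a_logcpm = [8.1, 9.4, 10.2, 7.6, 11.0]  # hypothetical housekeeping genes
kit_b_logcpm = [8.3, 9.1, 10.5, 7.9, 10.8]  # same genes, other kit

print(f"R^2 = {r_squared(kit_a_logcpm, kit_b_logcpm):.4f}")
```

A value close to 1 indicates that expression measured with one kit is almost a linear function of expression measured with the other.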

Fig. 5.


Comparison of gene expression between methods. (a) Correlation for the data obtained with Kit A and Kit B within housekeeping genes. (b) Heatmap of hierarchical clustering of the 500 genes with the highest interquartile range (IQR) of expression level between all samples.

We next asked whether these DEGs, which may be due to technical issues unrelated to the experimental design, would influence the interpretation of bioinformatic studies. To verify whether the analyses with Kit A and Kit B led to sets of differentially expressed genes related to the same pathways, we performed an enrichment analysis using the KEGG (Kyoto Encyclopedia of Genes and Genomes) database as reference. To estimate the overlap, we considered the top 20 up- and down-regulated pathways. We found that 16/20 and 14/20 pathways were commonly enriched or depleted, respectively (Fig. 6a, b). Although the biological significance of these pathways is irrelevant because the 3 vs. 3 sample comparison was random, these data suggest that the high degree of concordance observed in the single-gene analysis is also reflected at the pathway level.

Fig. 6.


Comparison of enriched (blue) and depleted (orange) KEGG pathways emerged in the analysis with Kit A (a) and Kit B (b).

Finally, to complete the assessment of the two kits, we compared cost and processing time. As shown in Table 2, the total procedure duration is similar; however, Kit A requires significantly less starting material, a prerequisite for higher efficiency, but is also more expensive per sample than Kit B.

Table 2.

Comparison of library preparation kits and cost analysis. *Low amount of extra reagent. **Cost per sample includes the costs of plastic consumables and quantification kit.

| Kit | Total procedure time | Liquid handling automation | Input range | Cost per sample** |
| --- | --- | --- | --- | --- |
| Kit A | 7 h | Yes* | 10 pg – 10 ng | ≈ 150 € |
| Kit B | 7 h 45 min | Yes | 10 ng – 100 ng | ≈ 65 € |
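To see how the figures in Table 2 scale with cohort size, a minimal sketch follows. It uses the approximate per-sample costs from Table 2 and the RNA inputs actually used in this study (5 ng for Kit A, 100 ng for Kit B, as stated in Methods); the dictionary layout and function name are illustrative, and sequencing costs are deliberately left out.

```python
# Library preparation cost and RNA input comparison, from Table 2 and Methods.
# Costs are the approximate per-sample figures in EUR; sequencing is excluded.

KITS = {
    "Kit A": {"cost_eur": 150, "input_ng": 5},    # 5 ng used in this study
    "Kit B": {"cost_eur": 65, "input_ng": 100},   # 100 ng used in this study
}

def project_cost(kit: str, n_samples: int) -> int:
    """Approximate library preparation cost for n_samples."""
    return KITS[kit]["cost_eur"] * n_samples

input_fold = KITS["Kit B"]["input_ng"] / KITS["Kit A"]["input_ng"]

for kit in KITS:
    print(f"{kit}: ~{project_cost(kit, 6)} EUR for 6 samples")
print(f"Kit B input / Kit A input = {input_fold:.0f}-fold")
```

The 20-fold input difference matches the figure given in the Abstract; any additional sequencing depth needed for Kit A would add to its per-sample cost.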

Discussion

RNA-seq is one of the most powerful technologies for transcriptome profiling. In this study, we present the results of a comparative analysis of state-of-the-art commercial RNA-seq library preparation protocols for FFPE tissues. We focused on FFPE specimens because surgical tissues are routinely fixed and stored in FFPE blocks for diagnostic purposes and long-term preservation at room temperature. This represents an invaluable resource for retrospective biomedical research3,8. However, a major challenge in RNA-seq library preparation from FFPE samples is not only the typically low RNA quality9 but also the limited availability of input RNA. This limitation is primarily driven by factors such as small tumor size, challenging anatomical locations, and the depletion of material during routine clinical diagnostic procedures. Additionally, the upfront selection of limited ROIs by Pathologists, although essential for improving the precision of area selection, including micro- and macrodissection, can further reduce the amount of tissue available for molecular analyses. This issue is especially critical in studies focused on therapy resistance, particularly within the immunotherapy field, where it is essential to minimize contamination from non-tumor-associated immune cells to ensure accurate and meaningful data interpretation6,7.

We focused on stranded RNA-seq, which has been previously shown to provide a more accurate and reliable estimation of transcript expression compared to non-stranded RNA-seq10. Our comparative analysis offers critical insights into the performance of the two protocols. While both kits generate high-quality libraries, they differ in efficiency and sequencing performance. Notably, Illumina Stranded Total RNA Prep Ligation with Ribo-Zero Plus demonstrates superior performance in applications requiring high-confidence mapping, making it particularly well suited for such studies.

Conversely, when focusing on input material requirements, our results indicate that the TaKaRa™ SMARTer® Stranded Total RNA-Seq Kit v2 delivers efficient transcriptome profiling even from low RNA input amounts, making it well suited to studies constrained by limited RNA availability. This observation aligns with previous reports highlighting the capability of SMARTer®-based technologies to perform reliably with minimal RNA input, including partially degraded samples11. Importantly, both protocols effectively preserve strand-of-origin information, supporting accurate gene expression quantification and transcriptional directionality, as also confirmed in prior separate evaluations of both kits by other groups12–14.

Notably, within this comparison, besides a robust overlap of DEGs, the TaKaRa™ SMARTer® Stranded Total RNA-Seq Kit v2 consistently detected a greater number of DEGs compared to the Illumina Stranded Total RNA Prep Ligation with Ribo-Zero Plus. Both kits include similar library preparation steps but use different technologies to remove rRNA. The observed difference may reflect the distinct rRNA depletion methodologies employed by the two kits, a critical factor affecting transcriptomic depth and complexity15,16. Indeed, the rRNA removal strategies of these two kits diverge significantly. The TaKaRa™ SMARTer® Stranded Total RNA-Seq Kit v2 performs rRNA depletion at the cDNA level, using specific probes to selectively eliminate ribosomal sequences, a method that has been shown to retain higher transcript diversity, particularly for fragmented RNA11. In contrast, the Illumina Stranded Total RNA Prep Ligation with Ribo-Zero Plus kit uses a pre-cDNA hybridization-based approach, where rRNA is captured by complementary oligonucleotides bound to magnetic beads and then precipitated. While this technique is generally effective under optimal conditions and, as shown in our data, results in a lower proportion of residual rRNA compared to the SMARTer® method (0.1% vs. 17.45%), this does not necessarily translate into improved transcriptome complexity. Post-cDNA rRNA depletion methods, such as those used in SMARTer® workflows, have been shown to offer advantages in terms of both transcript recovery and representation of low-abundance genes10,11. This likely accounts for the broader range of DEGs detected in our analysis and suggests that the SMARTer® approach may offer enhanced sensitivity, especially when working with challenging RNA sources.

A different approach to RNA sequencing involves exome capture-based enrichment, as implemented in the Illumina TruSeq RNA Access kit. This method has demonstrated superior performance for low-quality and limited-yield RNA from FFPE samples compared to ribosomal depletion strategies17,18. However, it is primarily suited for protein-coding gene expression analysis rather than comprehensive transcriptome profiling. As concluded by Song and colleagues18, if the study’s goal is to quantify non-coding RNAs or investigate mRNA precursor biology, this method is not the optimal choice. These prior studies add value to the present comparison, which provides novel and practical insights for laboratories handling precious and limited clinical samples. An additional, though non-peer-reviewed, source is a technical note19 from TaKaRa, which compares the performance of the SMARTer Stranded RNA-Seq Kit and Illumina’s TruSeq® RNA Sample Preparation Kit v2. While it concludes that the SMARTer Stranded RNA-Seq Kit generally outperforms across increasingly challenging RNA inputs, the comparison did not involve the TaKaRa™ SMARTer® Stranded Total RNA-Seq Kit v2, designed for FFPE samples. Consequently, although different sample types and qualities were tested, RNA from FFPE clinical samples was not included in the evaluation.

This study had several limitations, including a small sample size, a focus on melanoma tissues, the use of FFPE blocks stored for only one to two years, and the lack of assessment of samples with longer storage durations. However, based on these findings, we are now applying the TaKaRa™ SMARTer® Stranded Total RNA-Seq Kit v2 to a large cohort of immunotherapy-treated melanoma samples, which will help validate and further expand these data.

In addition to technical performance, users should carefully consider both hands-on time and per-sample cost when selecting an RNA-seq protocol. While the manual library preparation time is comparable between the two kits analyzed, notable differences arise when evaluating automation compatibility. Although both kits are compatible with liquid handling robotic systems, the TaKaRa™ SMARTer® Stranded Total RNA-Seq Kit v2 offers limited extra reagent volumes. Since robotic workflows typically require additional reagent overages to ensure precise pipetting, this limitation leads to higher reagent consumption and increased costs per sample, an important factor to consider when scaling up for high-throughput studies. Moreover, although the TaKaRa™ SMARTer® Stranded Total RNA-Seq Kit v2 requires significantly less starting material, our results indicate that achieving gene expression quantification comparable to Kit B demands a higher sequencing depth. While this necessity may result in elevated sequencing costs, Kit A remains a valuable option when RNA availability is a limiting factor, thus offering flexibility in experimental design.

In conclusion, our comparative analysis demonstrates that both commercially available RNA-seq kits for FFPE-derived samples enable robust and reproducible differential gene expression analysis. However, important differences in required RNA input, rRNA depletion strategies, and associated experimental costs should be carefully weighed. We believe these insights provide essential guidance for researchers and clinicians aiming to select the most appropriate RNA-seq library preparation protocol based on specific study goals, available biological material, and budget constraints.

Methods

Sample collection and selection of regions of interest (ROI) for manual macrodissection

Written informed consent was obtained from all participants and specimens were collected from the pathology departments of reference. This study complied with all relevant ethical regulations. To ensure optimal nucleic acid extraction, a Pathologist (F.L.)-guided annotation was performed on the slides, avoiding bulk scraping of the entire tissue section. As illustrated in Fig. 1, regions selected for RNA extraction (blue) were chosen to include areas of immune infiltration within the tumor microenvironment while excluding infiltrates unrelated to the tumor itself. This approach also enabled the exclusion of melanin-rich regions (Fig. 1b, black freehand ROI), which could interfere with nucleic acid yield and quality. Once annotated, the slide was aligned with two corresponding unstained sections (5 μm thickness), and the selected ROI was manually excised using a scalpel and transferred to a microcentrifuge tube for processing.

RNA isolation and assessment of quality

Total RNA was extracted starting from 6 FFPE tissue sections of 5 μm thickness using the GenElute™ Total RNA Purification Kit (Sigma Aldrich, St. Louis, MO, USA) according to the manufacturer’s protocol. The RNA concentration was determined with the Qubit RNA HS Assay Kit (Thermo Fisher Scientific, Cat#Q32855); the quality parameters RNA integrity number (RIN) and DV200 (percentage of RNA fragments greater than 200 nt) were then analyzed on an Agilent 2100 Bioanalyzer using the Eukaryote Total RNA Pico 6000 assay. The same 6 FFPE human samples were used for both RNA-Seq library preparation kits.
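The DV200 definition above (percentage of RNA fragments greater than 200 nt) can be illustrated with a simplified calculation on a binned fragment-size trace. This is a sketch only: the trace values are invented, the function name is hypothetical, and the instrument software derives DV200 from the electropherogram in its own way.

```python
# DV200 from a binned fragment-size trace: share of signal above 200 nt.
# The trace is an invented placeholder, not instrument output.

def dv200(trace: list[tuple[int, float]], cutoff_nt: int = 200) -> float:
    """Percentage of total signal from fragments longer than cutoff_nt."""
    total = sum(signal for _, signal in trace)
    if total <= 0:
        raise ValueError("trace has no signal")
    above = sum(signal for size, signal in trace if size > cutoff_nt)
    return 100.0 * above / total

# (fragment size in nt, signal in arbitrary units): hypothetical values
example_trace = [(50, 10.0), (100, 20.0), (200, 20.0), (400, 30.0), (1000, 20.0)]

value = dv200(example_trace)
print(f"DV200 = {value:.0f}%")
# 30% is the cut-off mentioned in the Results for samples that are too degraded.
print("usable" if value >= 30 else "too degraded")
```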

RNA library preparation

RNA-seq libraries were prepared using the TaKaRa™ SMARTer® Stranded Total RNA-Seq Kit v2 (Kit A) and Illumina Stranded Total RNA Prep Ligation with Ribo-Zero Plus (Kit B) following the manufacturers’ instructions. The quality of the final libraries was checked by determining the trace and fragment length on a 2100 Bioanalyzer with the DNA 1000 Kit, and the libraries were quantified using the Qubit dsDNA BR Assay Kit. Libraries were sequenced in paired-end mode (2 × 150 cycles) on a NextSeq 550 platform (Illumina). In detail, we synthesized cDNA from 5 ng and 100 ng of DNA-free mammalian total RNA for Kit A and Kit B, respectively. We performed the low-input workflow for the TaKaRa™ SMARTer® Stranded kit, setting 5 cycles for PCR1 and 16 cycles for PCR2. For the Illumina Stranded Total RNA Prep workflow, we adjusted the number of PCR cycles to 16 for the library amplification step.
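Library molarity, as reported in nM in Table 1, is commonly derived from a mass concentration and a mean fragment size. The sketch below uses the widely used approximation of 660 g/mol per base pair of double-stranded DNA; the source does not state how its nM values were obtained, so the formula choice, the function name, and the example concentration are assumptions for illustration.

```python
# Convert a dsDNA library concentration (ng/ul) to molarity (nM).
# Assumption: average mass of 660 g/mol per base pair of dsDNA.

def library_molarity_nm(conc_ng_per_ul: float, mean_size_bp: float) -> float:
    """Molarity in nM for a dsDNA library of the given mean fragment size."""
    if conc_ng_per_ul < 0 or mean_size_bp <= 0:
        raise ValueError("concentration must be >= 0 and size must be > 0")
    return conc_ng_per_ul * 1e6 / (660.0 * mean_size_bp)

# Hypothetical 10 ng/ul library at 292 bp (the Kit A mean size in Table 1).
molarity = library_molarity_nm(10.0, 292)
print(f"~{molarity:.1f} nM")
```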

Bioinformatic analysis

To estimate the reproducibility between the TaKaRa™ SMARTer® Stranded Total RNA-Seq Kit v2 (Kit A) and Illumina Stranded Total RNA Prep Ligation with Ribo-Zero Plus (Kit B), the raw data were processed using a customized RNA-seq pipeline previously described20. Briefly, the short reads were processed with the AdapterRemoval tool to remove sequencing adapters and to trim the reads for sequence quality (minimum Phred quality of 10 and minimum trimmed-sequence length of 30). The sequences were mapped to the human reference genome hg38 with the STAR algorithm, and the same tool pipeline was adopted to remove duplicated reads. Gene expression was quantified with HTSeq-count and normalized as counts per million (CPM) using the R package edgeR. The R package limma was used to perform the differential gene expression analysis by dividing the 6 samples arbitrarily into two groups containing the same samples for both library preparation methods. To generate the quality parameters, the reads were aligned to the hg38 reference genome with STAR and Kallisto. The metrics were then processed using Picard CollectRnaSeqMetrics and MultiQC. The principal component analysis (PCA) was implemented with the R function prcomp (package stats) and plotted with the rgl package.
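For readers unfamiliar with CPM normalization, a minimal sketch follows. It implements plain counts per million on a toy count table and counts genes above an expression threshold. The gene names and counts are invented, the helper names are hypothetical, and the log transform with a pseudocount of 1 is an assumption for illustration. The study itself used edgeR in R, which can additionally apply normalization factors, so this is not a reimplementation of that pipeline.

```python
# Plain counts-per-million (CPM) normalization on a toy count table.
# Gene names and counts are invented placeholders.

import math

def cpm(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw counts so that each library sums to one million."""
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("library has no reads")
    return {gene: c * 1e6 / library_size for gene, c in counts.items()}

def genes_above(cpm_values: dict[str, float], threshold: float) -> int:
    """Number of genes whose CPM exceeds the threshold."""
    return sum(1 for v in cpm_values.values() if v > threshold)

raw_counts = {"GENE_A": 500, "GENE_B": 1500, "GENE_C": 0, "GENE_D": 998000}

values = cpm(raw_counts)
# Assumed pseudocount of 1 before the log2 transform (illustrative choice).
log_values = {gene: math.log2(v + 1.0) for gene, v in values.items()}

print(values)
print(f"genes with CPM > 30: {genes_above(values, 30)}")
```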

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1 (65.3KB, pdf)

Author contributions

Conceptualization: S.P., M.T., J.B. Investigation: S.P., J.B. Resources: M.T., J.B., F.L., F.T. Computational analysis: V.I., Mi.Te. Data Curation: V.I., F.L., M.M.T., Mi. Te. Formal Analysis: V.I. Writing: S.P., M.T., V.I., J.B., D.F. Final Revision: All Authors.

Funding

M.T. acknowledges support from the “Associazione Italiana per la Ricerca sul Cancro” (grant no. MFAG 2021—ID. 26339 to M.T.). All of the other authors with IRCCS IRST affiliation acknowledge support from the Italian Ministry of Health and the contribution of “Ricerca Corrente” within the research line “Precision Medicine, Gender, Ethnicity, and Geroscience: Genetic-Molecular Mechanisms in the Development, Characterization, and Treatment of Tumors.”

Data availability

The RNA-seq data generated in this study have been deposited in the European Genome-phenome Archive (EGA) and are publicly available under accession number EGAS50000001066 at: https://ega-archive.org.

Declarations

Ethics approval and consent to participate

The protocol was approved by the institutional Medical Ethical Review Board (Protocol IRSTB134, cod. L3P2729, Approval date from the Romagna Ethics Committee (CEROM) 15/04/2022) and the study was conducted in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and later versions.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Marcella Tazzari, Email: marcella.tazzari@irst.emr.it.

Valentina Indio, Email: valentina.indio2@unibo.it.

References

  • 1. Conesa, A. et al. A survey of best practices for RNA-seq data analysis. Genome Biol. 17, 1–19 (2016).
  • 2. Adiconis, X. et al. Comprehensive comparative analysis of RNA sequencing methods for degraded or low input samples. Nat. Methods 10, 623 (2013).
  • 3. Pennock, N. D. et al. RNA-seq from archival FFPE breast cancer samples: molecular pathway fidelity and novel discovery. BMC Med. Genomics 12, 1–18 (2019).
  • 4. Levin, Y. et al. Optimization for sequencing and analysis of degraded FFPE-RNA samples. J. Vis. Exp. 10.3791/61060 (2020).
  • 5. Cazzato, G. et al. Formalin-fixed and paraffin-embedded samples for next generation sequencing: problems and solutions. Genes (Basel) 12, 1472 (2021).
  • 6. Cabrita, R. et al. Tertiary lymphoid structures improve immunotherapy and survival in melanoma. Nature 577, 561–565 (2020).
  • 7. Helmink, B. A. et al. B cells and tertiary lymphoid structures promote immunotherapy response. Nature 577, 549–555 (2020).
  • 8. Jacobsen, S. B., Tfelt-Hansen, J., Smerup, M. H., Andersen, J. D. & Morling, N. Comparison of whole transcriptome sequencing of fresh, frozen, and formalin-fixed, paraffin-embedded cardiac tissue. PLoS One 18, e0283159 (2023).
  • 9. Levin, Y. et al. Optimization for sequencing and analysis of degraded FFPE-RNA samples. J. Vis. Exp. 1–10 (2020).
  • 10. Zhao, S. et al. Comparison of stranded and non-stranded RNA-seq transcriptome profiling and investigation of gene overlap. BMC Genom. 16, 1–14 (2015).
  • 11. Lin, X. et al. A comparative analysis of RNA sequencing methods with ribosome RNA depletion for degraded and low-input total RNA from formalin-fixed and paraffin-embedded samples. BMC Genom. 20, 831 (2019).
  • 12. Ura, H., Togi, S. & Niida, Y. A comparison of mRNA sequencing (RNA-Seq) library preparation methods for transcriptome analysis. BMC Genom. 23, 1–10 (2022).
  • 13. Naphade, S., Bhatnagar, R., Hanson-Smith, V., Choi, I. & Zhang, A. Systematic comparative analysis of strand-specific RNA-seq library preparation methods for low input samples. Sci. Rep. 12, 1–10 (2022).
  • 14. Palomares, M. A. et al. Systematic analysis of TruSeq, SMARTer and SMARTer Ultra-Low RNA-seq kits for standard, low and ultra-low quantity samples. Sci. Rep. 9, 1–12 (2019).
  • 15. Newton, Y. et al. Large scale, robust, and accurate whole transcriptome profiling from clinical formalin-fixed paraffin-embedded samples. Sci. Rep. 10, 1–11 (2020).
  • 16. Hrdlickova, R., Toloue, M. & Tian, B. RNA-Seq methods for transcriptome analysis. Wiley Interdiscip. Rev. RNA 8 (2017).
  • 17. Wang, D. et al. Comparison of two Illumina whole transcriptome RNA sequencing library preparation methods using human cancer FFPE specimens. Technol. Cancer Res. Treat. 21, 15330338221076304 (2022).
  • 18. Song, K. et al. RNA-seq RNAaccess identified as the preferred method for gene expression analysis of low quality FFPE samples. PLoS One 18 (2023).
  • 19. Comparing the performance of RNA-seq kits for inputs of varying sample type and quality. https://www.takarabio.com/learning-centers/next-generation-sequencing/rna-seq/technotes/stranded-rna-seq-competitor-kit-comparison
  • 20. Indio, V. et al. Gene expression landscape of SDH-deficient gastrointestinal stromal tumors. J. Clin. Med. 10, 1–18 (2021).


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
