Summary
Efficient skeletal muscle regeneration necessitates fine-tuned coordination among multiple cell types through an intricate network of intercellular communication. We present a protocol for generation of a time-resolved cellular interactome during tissue remodeling. We describe steps for isolating distinct cell populations from skeletal muscle of adult mice after acute damage and extracting RNA from purified cells prior to the generation of RNA sequencing data. We then detail procedures for generating and deciphering a time- and lineage-resolved model of intercellular crosstalk.
For complete details on the use and execution of this protocol, please refer to Groppa et al. (2023).1
Subject areas: Bioinformatics, Sequence analysis, Cell Biology, Cell isolation, Flow Cytometry/Mass Cytometry, Sequencing, RNAseq, Molecular Biology, Stem Cells, Computer sciences
Graphical abstract
Highlights
• Multiple staining and gating strategies to purify cell subsets from skeletal muscle
• An optimized protocol to extract sufficient and reliable RNA to generate RNAseq data
• RNAseq dataset of five cell populations from a time course during muscle regeneration
• An informatic analysis based on a high-resolution time course of RNAseq data
Publisher’s note: Undertaking any experimental protocol requires adherence to local institutional guidelines for laboratory safety and ethics.
Before you begin
Understanding the behavior of distinct cell types and their crosstalk is a challenging task in many in vivo models of disease. Recent advancements in single-cell “omics”, including RNA sequencing (RNA-seq), have offered ground-breaking insights at a resolution unattainable prior to the last decade. As opposed to conventional bulk RNA-seq, single-cell RNA-seq enables the potential to interrogate cellular diversity and profile cell-specific expression in complex biological systems. However, the cost of utilizing these innovative platforms remains a barrier to many research groups. The technology is also limited by the amount of information that can be captured per cell (i.e., low sequencing depth and coverage). To understand the signaling network underlying murine skeletal muscle regeneration, we have recently established a holistic approach to profile cells of distinct lineages, including fibro/adipogenic progenitors (FAPs), endothelial cells (ECs), muscle progenitors (MPs), inflammatory cells (ICs), and pericytes (PERs). This is performed through bulk RNA-seq of each population at steady state and multiple time points after acute Tibialis anterior (TA) muscle damage (1, 2, 3, 4, 5, 6, 7, 10, and 14 days), which is induced by intramuscular injection of notexin – a myotoxin. To ensure pure isolation of stromal cells, we have incorporated two reporter mouse lines B6.129S4-Pdgfratm11(EGFP)Sor/J (herein referred to as PDGFRa-eGFP) and Tg(Cspg4-DsRed.T1)1Akik/J (herein referred to as NG2-DsRed) to label FAPs and PERs, respectively. Additionally, these reporter lines were crossed to C-C chemokine receptor 2 knockout (CCR2-null) mice to analyze changes in muscle regeneration in the absence of immune cell infiltration from circulation.2 Following the generation of sequencing data, we assembled a multistep bioinformatics analysis workflow to dissect intercellular communication.
Institutional permissions
Animal colony maintenance and experimental procedures were conducted in accordance with the Animal Care Committee’s approval and regulations at the University of British Columbia.
Modeling of regeneration with notexin-induced muscle damage
Timing: 2 h
1. Induce acute muscle damage with notexin, a snake venom-derived myotoxin commonly used to study skeletal muscle regeneration in mice.
a. Prepare notexin in sterile saline at 7.5 μg/mL.
Note: Reconstituted notexin can be stored at −20°C for up to 3 months.
b. Load 20 μL of notexin into 31-gauge insulin syringes. Keep on ice until administration.
c. Anesthetize mice with 2.5%–3% isoflurane in 1 L/min oxygen gas flow.
d. Confirm that the mice are fully anesthetized by pinching the toe, then shave the hindlimb and administer 20 μL of reconstituted notexin (0.15 μg) intramuscularly into the Tibialis anterior (TA) muscle.
i. Insert the needle just above the distal tendon of the TA muscle, parallel to the tibia, until the needle tip reaches the proximal end of the TA at the knee.
ii. Inject notexin while slowly withdrawing the needle toward the distal end of the TA.
iii. Apply pressure at the injection site with sterile gauze for 10 s to prevent backflow of notexin.
CRITICAL: Notexin should be administered uniformly from proximal to distal end of the TA. Practice of intramuscular injection can be done with dyes like trypan blue or Evans blue to ensure uniform release of notexin.
Note: In order to reduce the number of animals required for obtaining enough cells for RNA sequencing, both TA muscles of the same mouse can be injected with notexin. During tissue digestion, TA muscles collected from mice injected at the same time point can be pooled.
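The dose arithmetic above can be checked with a few lines of R. This is a minimal sketch: the concentration and injection volume come from the steps above, while the helper `notexin_volume_uL` and its 10% overage for syringe dead volume are hypothetical and not part of the protocol.

```r
# Dose check: mass of notexin delivered per injection.
stock_ug_per_mL <- 7.5   # working concentration (step 1a)
inject_uL       <- 20    # volume loaded per syringe (step 1b)

dose_ug <- stock_ug_per_mL * inject_uL / 1000  # convert uL to mL
dose_ug  # 0.15 ug, matching the dose stated in step 1d

# Hypothetical helper: total working solution to prepare for n injections.
# The 10% overage is an assumption, not a value from the protocol.
notexin_volume_uL <- function(n_injections, per_injection_uL = 20, overage = 0.10) {
  n_injections * per_injection_uL * (1 + overage)
}

notexin_volume_uL(8)  # volume (uL) for 8 injections, e.g., both TAs of 4 mice
```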
Preparation of reagents and buffers
Timing: 2 h
Prior to starting the protocol, ensure that the following reagents are prepared and stored in sufficient volumes.
2. Enzymatic solutions for tissue digestion.
a. Prepare the collagenase type II solution in phosphate-buffered saline (PBS; pH 7.4).
Note: Store PBS at room temperature for up to 6 months.
i. Dissolve collagenase type II from Clostridium histolyticum in PBS to yield a specific activity value of 2.5 U/mL.
ii. Pass the solution through a 0.2 μm membrane filter and store at −20°C for up to 6 months.
b. Prepare the collagenase D/dispase II solution in PBS.
i. Dissolve collagenase D from Clostridium histolyticum and dispase II in PBS to yield specific activity values of 1.5 U/mL and 2.4 U/mL, respectively.
ii. Pass the solution through a 0.2 μm membrane filter and store at −20°C for up to 3 months.
CRITICAL: Collagenase powders will float and form a layer at the top of the PBS. Let the powders dissolve overnight at 4°C without shaking. Any residual precipitate at the bottom of the PBS can be filtered out before passing the solution through the membrane filter.
Note: Avoid repeated freezing and thawing of either collagenase solution, as collagenase activity decreases after reconstitution. Add CaCl2 to a final concentration of 2.5 mM to activate the collagenases prior to tissue digestion. Store the CaCl2 stock (250 mM) at −20°C for up to 1 year.
3. Buffers for cell purification.
a. Prepare fluorescence-activated cell sorting (FACS) buffer in PBS.
i. In 488 mL of PBS, add 2 mL of 0.5 M EDTA and 10 mL of fetal bovine serum (FBS).
ii. Pass the solution through a 0.2 μm membrane filter and store at 4°C for up to 2 weeks.
b. Prepare the collection medium.
i. In 10 mL of Dulbecco’s modified Eagle medium (DMEM), add 2 mL of FBS, 0.1 mL of 200 mM L-glutamine, and 0.1 mL of penicillin-streptomycin solution. Collection medium should be made and used within 24 h.
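The final concentrations implied by these recipes can be verified with a simple dilution calculation. The sketch below uses only the volumes and stock concentrations given above; the helper name `final_conc` is hypothetical.

```r
# Final concentration after dilution: C_final = C_stock * V_added / V_total
final_conc <- function(stock, v_added, v_total) {
  stock * v_added / v_total
}

# FACS buffer: 488 mL PBS + 2 mL of 0.5 M EDTA + 10 mL FBS
facs_total_mL <- 488 + 2 + 10
edta_mM <- final_conc(500, 2, facs_total_mL)   # 0.5 M stock expressed as 500 mM
fbs_pct <- final_conc(100, 10, facs_total_mL)  # FBS as % (v/v)

# Collagenase activation: 10 uL of 250 mM CaCl2 per 1 mL of enzyme solution
cacl2_mM <- final_conc(250, 10, 1000 + 10)

c(EDTA_mM = edta_mM, FBS_percent = fbs_pct, CaCl2_mM = round(cacl2_mM, 2))
```

Under these inputs the FACS buffer contains 2 mM EDTA and 2% FBS, and the activated enzyme solution contains about 2.48 mM CaCl2, close to the nominal 2.5 mM.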
Preparation for the bioinformatics analysis: Downloading software and packages
Timing: 15 min
4. Download R and install BiocManager. In this section, we install the CRAN and Bioconductor packages that are mentioned throughout the protocol. We strongly encourage users to work with the latest version of the software packages (see troubleshooting 1). The RStudio and RStudio Server IDEs can help organize and run the pipeline, although some steps still require Bash. To install R on your computer, follow the installation instructions described here: https://cran.r-project.org/. Our protocol was built using R version 4.2.
5. Install the packages required for the analysis. Download and install the packages using BiocManager. The full list of required packages is presented as follows.
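# Install BiocManager if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
# CRAN and Bioconductor packages used throughout the protocol
required_packages <- c("RUVSeq", "edgeR", "igraph", "clusterProfiler", "org.Mm.eg.db")
for (pkg in required_packages) {
  if (!requireNamespace(pkg, quietly = TRUE))
    BiocManager::install(pkg)
}
# Install the cellCB package from GitHub
# (BiocManager relies on the "remotes" package for GitHub repositories)
if (!requireNamespace("remotes", quietly = TRUE))
  install.packages("remotes")
BiocManager::install("cavei/cellCB")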
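After installation, it can be useful to confirm that every package is available before starting the analysis. The following is a minimal sketch; the helper `missing_packages` is hypothetical and simply reports which of the listed packages cannot be found.

```r
# Hypothetical helper: return the packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

required_packages <- c("RUVSeq", "edgeR", "igraph", "clusterProfiler",
                       "org.Mm.eg.db", "cellCB")
not_found <- missing_packages(required_packages)

if (length(not_found) > 0) {
  message("Not installed: ", paste(not_found, collapse = ", "))
} else {
  message("All required packages are installed.")
}

# The protocol was built with R 4.2; report whether this session is older.
if (getRversion() < "4.2.0") {
  message("R ", getRversion(), " is older than the version used in the protocol.")
}
```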
Key resources table
REAGENT or RESOURCE | SOURCE | IDENTIFIER |
---|---|---|
Chemicals, peptides, and recombinant proteins | ||
CaCl2 | Sigma-Aldrich | C4901 |
0.5 M EDTA | Invitrogen | 15575–038 |
FBS | Sigma-Aldrich | F1051 |
200 mM L-glutamine | Sigma-Aldrich | G7513 |
Penicillin-Streptomycin | Gibco | 15140–122 |
DMEM | Gibco | 11965–092 |
Collagenase D from Clostridium histolyticum | Roche | 11088882001 |
Dispase II | Roche | 04942078001 |
Collagenase type II from Clostridium histolyticum | Millipore Sigma | C6885 |
Notexin | Latoxan Laboratory | L8104-100UG |
NaCl | Fisher Scientific | S271-10 |
KCl | Fisher Scientific | P217-500 |
Na2HPO4 | Fisher Scientific | S374-3 |
KH2PO4 | Fisher Scientific | P285-500 |
RNAzol | Sigma-Aldrich | R4533 |
Superase | Invitrogen | AM2694 |
LPA | Sigma-Aldrich | 56575 |
Nuclease-free water | Thermo Fisher | 10977–015 |
Isopropanol | Sigma | 34863 |
Diethyl pyrocarbonate (DEPC) | Sigma | D5758 |
95% Ethanol | Commercial Alcohols | P025EA95 |
Ammonium-chloride-potassium (ACK) lysing buffer | Gibco | A10492-01 |
Propidium iodide | Invitrogen | P3566 |
Hoechst 33342 | Thermo Scientific | 62249 |
Antibodies | ||
APC anti-CD31 (1:400) | eBioscience | 17-0311-82 |
Biotinylated anti-VCAM1 (1:2000) | BD | 553331 |
Streptavidin-PE (1:500) | Invitrogen | SA1004-4 |
Alexa700 anti-CD45 (1:500) | BD | 560510 |
Alexa750 anti-ITGA7 (1:500) | ABLab (UBC) | Clone R2F2 |
FITC anti-CD31 (1:500) | eBioscience | 11-0311-85 |
APC anti-CD45 (1:500) | ABLab (UBC) | N/A (Clone I3/2) |
FITC anti-CD45 (1:500) | eBioscience | 11-0451-85 |
PECy7 anti-SCA1 (1:4000) | eBioscience | 25-5981-82 |
APC anti-CD146 (1:500) | BioLegend | 134712 |
PE anti-ITGA7 (1:2000) | ABLab (UBC) | N/A (Clone R2F2) |
eFluor660 anti-VCAM1 (1:1000) | Thermo Fisher | 50-1061-82 (Clone 429) |
Experimental models: Organisms/strainsa | ||
B6.129S4-Pdgfratm11(EGFP)Sor/J | The Jackson Laboratory | 007669 |
Tg(Cspg4-DsRed.T1)1Akik/J | The Jackson Laboratory | 008241 |
B6.129S4-Ccr2tm1Ifc/J | The Jackson Laboratory | 004999 |
Software and algorithms | ||
R | R core team, 20223 | www.r-project.org |
RStudio | R Studio team, 2023 | https://posit.co/download/rstudio-desktop/ |
BiocManager | Morgan et al., 20234 | https://cran.r-project.org/web/packages/BiocManager/ |
RUVSeq | Risso et al.5 | https://bioconductor.org/packages/release/bioc/html/RUVSeq.html |
edgeR | Robinson et al., 20106 | https://bioconductor.org/packages/release/bioc/html/edgeR.html |
igraph | Csárdi et al., 20237 | https://cran.r-project.org/web/packages/igraph/index.html |
clusterProfiler | Wu et al., 20218 | https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html |
org.Mm.eg.db | Carlson 20199 | https://bioconductor.org/packages/release/data/annotation/html/org.Mm.eg.db.html |
cellCB | Martini10 | https://github.com/cavei/cellCB |
Original codes | This paper and Groppa et al.1 | https://doi.org/10.5281/zenodo.8127429 |
Deposited data | ||
RNAseq data | Groppa et al.1 | GEO: GSE210748 |
Other | ||
0.2 μm membrane filter | Pall Corporation | 4187 |
31-gauge insulin syringes | BD | 320440 |
15 mL centrifuge tube | Falcon | 352096 |
50 mL centrifuge tube | Falcon | 352070 |
40 μm cell strainer | Falcon | 352340 |
70 μm cell strainer | Falcon | 352350 |
5 mL polypropylene tubes | Falcon | 352235 |
5 mL polystyrene tubes | Falcon | 352063 |
60 mm petri dish | Falcon | 353002 |
RNase-free microtubes | Axygen | MCT-150-L-C |
Low adhesion pipet tips (1000 μL) | Labcon | 1177-965-008-9 |
Low adhesion pipet tips (200 μL) | Labcon | 1179-965-008-9 |
Low adhesion pipet tips (10 μL) | Labcon | 1171-965-008-9 |
a All animal experiments must adhere to approved protocols, regulations, and ethics set out by local governing bodies. To minimize confounding effects due to age and sex, controlling for these biological variables is highly recommended. In our study,1 we restricted experimental mice to adults (2–4 months old).
Materials and equipment
250 mM CaCl2
Reagent | Final concentration | Amount |
---|---|---|
CaCl2 | 250 mM | 2.7745 g |
Nuclease-free water | N/A | 100 mL |
Total | N/A | 100 mL |
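The mass in the table above follows from molarity × volume × molecular weight. The sketch below assumes anhydrous CaCl2 (110.98 g/mol), which is consistent with the tabulated amount; if a hydrated salt is used, the molecular weight must be adjusted. The helper name `mass_g` is hypothetical.

```r
# Mass of solute (g) = molarity (mol/L) * volume (L) * molecular weight (g/mol)
mass_g <- function(molarity_M, volume_L, mw_g_per_mol) {
  molarity_M * volume_L * mw_g_per_mol
}

cacl2_mw <- 110.98  # assumption: anhydrous CaCl2, g/mol
cacl2_g  <- mass_g(0.250, 0.100, cacl2_mw)
cacl2_g  # 2.7745 g for 100 mL of a 250 mM stock
```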
1× PBS (pH 7.4)
Reagent | Final concentration | Amount |
---|---|---|
NaCl | 0.137 M | 8 g |
KCl | 2.7 mM | 0.2 g |
Na2HPO4 | 8.1 mM | 1.15 g |
KH2PO4 | 1.5 mM | 0.2 g |
Distilled water | N/A | 1 L |
Total | N/A | 1 L |
DEPC PBS
Reagent | Final concentration | Amount |
---|---|---|
DEPC | N/A | 1 mL |
1× PBS | N/A | 999 mL |
Total | N/A | 1 L |
70% Ethanol
Reagent | Final concentration | Amount |
---|---|---|
95% Ethanol | N/A | 73.68 mL |
Nuclease-free water | N/A | 26.32 mL |
Total | N/A | 100 mL |
RNA storage buffer
Reagent | Final concentration | Amount |
---|---|---|
Superase | N/A | 1 μL |
Nuclease-free water | N/A | 19 μL |
Total | N/A | 20 μL |
Step-by-step method details
Tissue digestion of skeletal muscle
Timing: 2 h
In this step, skeletal muscle, specifically the TA, is enzymatically dissociated to yield a cell suspension for FACS.
1. Collect TA and prepare for enzymatic digestion.
a. Euthanize mice and collect their TAs in cold PBS in a petri dish placed on ice. Remove the fascia on the surface of the TA.
Note: The TA tendon is the most lateral and anterior of the three distinctly visible tendons above the ankle. A small gauge syringe needle can be used to separate the TA tendon from the others and slide scissors underneath for an incision. Gently lifting the TA after having removed the fascia and only cutting the TA tendon allows for the muscle to be easily lifted from the leg.
b. Gently cut the muscle into 2 mm pieces with surgical forceps in the petri dish placed on ice.
2. Digest the tissue with the collagenase type II mix.
a. Activate the collagenase type II solution by adding 250 mM CaCl2 (see materials and equipment). For every 1 mL of enzymatic solution, add 10 μL of aqueous CaCl2 and resuspend.
b. Add 200 μL of the mix to each TA and transfer the sample to a 15 mL centrifuge tube. For each time point, we pooled several TAs (see the CRITICAL note at the end of this section) and therefore prepared 200 μL of solution per pooled TA.
c. Incubate the tissue lysate at 37°C for 30 min while gently rotating using a tube rotator at 20 RPM.
d. To quench the enzymatic reaction, top up the sample to 15 mL with pre-chilled PBS and vortex.
e. Centrifuge the sample at 360 g for 5 min at 4°C and decant the supernatant.
f. Repeat steps d and e once.
3. Digest the tissue with the collagenase D/dispase II mix.
a. Activate the collagenase D/dispase II solution by adding 250 mM CaCl2. For every 1 mL of enzyme solution, add 10 μL of aqueous CaCl2 and resuspend.
b. Add 1 mL of the mix to each TA and mix well, as in step 2b.
c. Incubate the tissue lysate at 37°C for 1 h while gently rotating using a tube rotator at 20 RPM. Gently vortex the samples every 15 min.
d. To quench the enzymatic reaction, top up the sample to 15 mL with pre-chilled PBS and mix by inverting the tube 2–3 times.
4. Filter and wash the tissue lysate.
a. Pass the lysate through a 70 μm cell strainer into a 50 mL centrifuge tube.
b. Repeat step a with a 40 μm cell strainer and top up the filtrate to 40 mL with cold PBS.
5. Lyse red blood cells (RBC) in the tissue lysate with ammonium-chloride-potassium (ACK) lysis buffer.
a. Centrifuge the sample at 360 g for 5 min at 4°C.
b. Decant the supernatant and lyse RBCs with 1 mL of ACK at 4°C for 1 min, followed by adding 40 mL of FACS buffer to quench the reaction.
6. Centrifuge the sample at 360 g for 5 min at 4°C, remove the supernatant, and resuspend in the antibody mix as explained in the next section.
CRITICAL: Depending on the time point of injury, TAs were pooled from multiple animals to purify enough cells for sequencing (50,000 to 500,000 cells per sample). In our study,1 we collected 16 TAs to purify cells at steady state and at 14 days after injury, so the final volume of enzymatic buffer was 3.2 mL. For all other time points, 6–10 TAs were sufficient, so we used 1.2–2 mL of enzymatic buffer. It is important to collect and cut the tissue within 5 min of euthanasia to obtain good cell viability during cell purification.
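The pooled buffer volumes quoted in this note scale linearly with the number of TAs at 200 μL per TA. A minimal sketch of that calculation follows; the function name `pooled_volume_mL` is hypothetical.

```r
# Enzymatic buffer volume (mL) for a pool of TAs at 200 uL per TA.
pooled_volume_mL <- function(n_ta, per_ta_uL = 200) {
  n_ta * per_ta_uL / 1000
}

pooled_volume_mL(16)        # steady state and day 14: 3.2 mL
pooled_volume_mL(c(6, 10))  # other time points: 1.2 to 2 mL
```

The same function applies to the antibody mixture in the next section, which is also prepared at 200 μL per TA.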
Purification of distinct cell types from skeletal muscle
Timing: 4 h
In this step, the cell suspensions obtained from skeletal muscle digestion are stained with one of two antibody panels, one for the PDGFRa-eGFP and the other for the NG2-DsRed reporter mouse line. Stained samples are then passed through a FACS machine, where populations of interest are identified and purified.
7. Sample staining with antibody cocktails.
a. Incubate the cell suspension at 4°C for 25–30 min in antibody mixture protected from light.
i. For each TA, add 200 μL of antibody mixture. As with the enzymatic buffer in the previous section, calculate the volume of antibody mixture as 200 μL × the number of TAs collected for the time point of interest.
ii. For samples derived from PDGFRa-eGFP mice, first stain the cells with biotinylated anti-VCAM1, APC anti-CD31, Alexa700 anti-CD45, and Alexa750 anti-ITGA7, followed by another round of staining with streptavidin-PE.
iii. For samples derived from NG2-DsRed mice, stain the cells with FITC anti-CD31, APC anti-CD45, and PECy7 anti-SCA1.
b. After the incubation, wash samples twice with a volume of FACS buffer equal to that of the antibody mix, spinning at 350 g for 5 min at 4°C and decanting the supernatant after each wash. Repeat the incubation from step a if staining with a secondary antibody mixture is required.
c. Reconstitute cells with 500 μL of FACS buffer containing Hoechst 33342 (4 μM).
8. Cell sorting with a FACS machine.
a. Filter the cell suspension through a 35 μm cell strainer into a 5-mL round-bottom FACS tube.
b. Insert the sample in a FACS machine and start acquiring events.
c. Gate cells based on their forward and side scatter and identify live cells as Hoechst-mid events. We suggest performing a single sort in purity mode. You can check the purity of the sorted samples by running a small fraction of the sorted cells through the FACS machine.
Note: There are typically three distributions of cells based on Hoechst signal: Hoechst-negative events are debris, Hoechst-mid events are live singlets, and Hoechst-high events are multiplets and dead cells (see troubleshooting 2). Additional singlet gating on forward scatter height (FSC-H) versus area (FSC-A) can help eliminate multiplets: single events fall along the diagonal between these two parameters, and anything deviating from the diagonal can be presumed to be a multiplet and left outside the singlet gate.
d. Set the gating strategy as follows:
i. For samples derived from PDGFRa-eGFP mice, sort inflammatory cells (CD45-Alexa700+/CD31-APC-), endothelial cells (CD45-Alexa700-/CD31-APC+), FAPs (CD45-Alexa700-/CD31-APC-/EGFP+), and myogenic progenitors (CD45-Alexa700-/CD31-APC-/EGFP-/ITGA7-Alexa750+/VCAM1-PE+).
ii. For samples derived from NG2-DsRed mice, sort inflammatory cells (CD45-APC+/CD31-FITC-), endothelial cells (CD45-APC-/CD31-FITC+/SCA1-PECy7+), FAPs (CD45-APC-/CD31-FITC-/SCA1-PECy7+), and pericytes (CD45-APC-/CD31-FITC-/SCA1-PECy7-/DsRed+).
e. Collect cells in a polypropylene or polystyrene 5-mL round-bottom FACS tube containing 1 mL collection media (see troubleshooting 3).
Note: The cell gating strategies for the PDGFRa-eGFP and NG2-DsRed systems are reported in Figures 1A and 1B, respectively. In the absence of the PDGFRa and NG2 reporters, which are used to identify FAPs and pericytes, respectively, other staining panels can be applied using wild-type mice. FAPs can be purified by applying the staining strategy we previously described,11 where FAPs are gated as CD45-/CD31-/CD34+/SCA-1+. Another strategy consists of pooling CD31 and CD45 staining in the same channel (Lin FITC) and, together with the SCA1 marker, identifying inflammatory cells as Lin-FITC+/SCA1-PECy7- cells, endothelial cells as Lin-FITC+/SCA1-PECy7+ cells, and FAPs as Lin-FITC-/SCA1-PECy7+ cells (Figure 2A). In this panel, the anti-ITGA7 and anti-VCAM1 antibodies are conjugated with different fluorophores (PE and eFluor660, respectively) to improve the separation of muscle progenitors (Lin-FITC-/SCA1-PECy7-/ITGA7-PE+/VCAM1-eFluor660+) from the other cells compared to Figure 1A. Pericytes can be isolated by targeting CD146, as demonstrated by Crisan and colleagues,12 together with CD45, CD31, and SCA1. After negative selection using lineage markers (CD45-FITC and CD31-FITC), pericytes are isolated as SCA1-PECy7-/CD146-APC+ cells (Figure 2B).
Note: To apply a stringent selection of viable cells, you can reconstitute cells with 500 μL of FACS buffer containing Hoechst 33342 (4 μM) together with propidium iodide (PI; 1 mg/mL) and gate Hoechst-mid/PI-negative cells for your analysis (Figure 2A).
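The gating hierarchy for the PDGFRa-eGFP panel can be read as a sequence of boolean rules. The sketch below is only an illustration of that logic on hypothetical, already thresholded events (TRUE means marker-positive). Actual gates are drawn on the sorter, and the function name and population labels are assumptions introduced here.

```r
# Illustrative only: assign a population to each event from marker positivity,
# following the PDGFRa-eGFP gating rules listed in step 8d.i.
classify_pdgfra_egfp <- function(cd45, cd31, egfp, itga7, vcam1) {
  pop <- rep(NA_character_, length(cd45))
  pop[cd45 & !cd31] <- "IC"                            # inflammatory cells
  pop[!cd45 & cd31] <- "EC"                            # endothelial cells
  pop[!cd45 & !cd31 & egfp] <- "FAP"                   # fibro/adipogenic progenitors
  pop[!cd45 & !cd31 & !egfp & itga7 & vcam1] <- "MP"   # myogenic progenitors
  pop
}

# Hypothetical events, one per expected population plus one ungated event.
events <- data.frame(
  cd45  = c(TRUE,  FALSE, FALSE, FALSE, FALSE),
  cd31  = c(FALSE, TRUE,  FALSE, FALSE, FALSE),
  egfp  = c(FALSE, FALSE, TRUE,  FALSE, FALSE),
  itga7 = c(FALSE, FALSE, FALSE, TRUE,  FALSE),
  vcam1 = c(FALSE, FALSE, FALSE, TRUE,  FALSE)
)

events$population <- with(events, classify_pdgfra_egfp(cd45, cd31, egfp, itga7, vcam1))
events$population  # "IC" "EC" "FAP" "MP" NA
```

Events that match none of the rules (the last row) stay unassigned, mirroring events that fall outside all sort gates.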
Figure 1.
Cell isolation from skeletal muscle using reporter system
(A) Gating strategy to purify inflammatory cells (CD45+/CD31-), endothelial cells (CD45-/CD31+), FAPs (CD45-/CD31-/eGFP+) and skeletal muscle progenitors (CD45-/CD31-/eGFP-/ITGA7+/VCAM1+) using PDGFRa-eGFP reporter mouse system.
(B) Gating strategy to purify inflammatory cells (CD45+/CD31-), endothelial cells (CD45-/CD31+/SCA1+), FAPs (CD45-/CD31-/SCA1+) and pericytes (CD45-/CD31-/SCA1-/DsRed+) using NG2-DsRed reporter mouse system.
Figure 2.
Cell isolation from skeletal muscle without using any reporter system
(A) Second staining and gating strategy where CD31 and CD45 are pooled in the same channel (Lin FITC) and inflammatory cells are identified as Lin+/SCA1- cells, endothelial cells as Lin+/SCA1+ cells, FAPs as Lin-/SCA1+ cells and skeletal muscle progenitors as Lin-/SCA1-/ITGA7+/VCAM1+ cells.
(B) Alternative staining strategy to purify pericytes as CD45-/CD31-/SCA1-/CD146+ cells.
RNA extraction from cell types derived from skeletal muscle
Timing: 2 days
In this step, RNA is extracted from sorted cells and from total tissue (specifically the TA) prior to library preparation.
Note: Use sterile low adhesion pipet tips during RNA extraction.
9. Preprocess cell suspension or tissue for RNA isolation.
a. For cell suspension (< 4 × 10^6 cells):
i. Top up the collection tube containing cells with DEPC PBS (see materials and equipment) to 4 mL.
ii. Pellet the cells by centrifugation at 800 g for 10 min at 4°C.
iii. Aspirate the supernatant and resuspend the pellet in 1 mL of DEPC PBS.
iv. Transfer the cell suspension to an RNase-free microtube.
v. Repeat step ii and aspirate the supernatant.
vi. Add 400 μL of RNAzol and resuspend the cell pellet.
b. For total tissue (per TA):
i. Immediately after harvesting, clean the tissue in cold PBS, dry it quickly with a paper towel, and directly submerge it in 500 μL of RNAzol in a 5 mL polystyrene tube.
ii. On ice, lyse the tissue with a homogenizer until it is completely disrupted.
10. Purify RNA with RNAzol.
a. Add 200 μL (cells) or 250 μL (tissue) of nuclease-free water and resuspend the mixture.
b. Store the lysate at room temperature for 15 min.
c. Spin down the lysate at 12,000 g for 15 min at 4°C.
d. Transfer the supernatant into a new RNase-free microtube without disturbing the pellet.
CRITICAL: Sample should be cloudy blue upon the addition of nuclease-free water. If it is clear, add more nuclease-free water and resuspend.
Note: Water induces the precipitation of non-RNA contents whereas RNA remains soluble in the supernatant.
11. Precipitate RNA with isopropanol.
a. Add 1 μL of LPA to the sample and resuspend.
b. Add an equal volume of pre-chilled isopropanol to the sample (∼500 μL).
c. Resuspend the mixture well and incubate at −20°C overnight.
d. Spin down the RNA at 20,000 g for 30 min at 4°C.
12. Wash the precipitated RNA with 70% ethanol.
a. Remove the supernatant and add 1 mL of freshly made, pre-chilled 70% ethanol (see materials and equipment).
b. Gently detach the pellet from the tube and very carefully wash it by pipetting the solution slowly.
c. Spin down the pellet at 20,000 g for 5 min at 4°C.
d. Repeat steps a to c.
13. Reconstitute RNA in nuclease-free water.
a. Remove all supernatant from the sample tubes.
b. Air dry the pellet for at least 5 min at room temperature.
c. Resuspend the RNA pellet with 20 μL (cells) or 50 μL (tissue) of freshly made RNA storage buffer (see materials and equipment).
d. Store the reconstituted RNA at −80°C until use.
CRITICAL: Do not disturb the pellet when removing the supernatant. Start with a P200 pipet and switch over to a P10 pipet when most of the supernatant has been removed.
Note: Keep track of the location of RNA pellet in the tube with a marker as it will become transparent after drying.
Library preparation and RNA sequencing
In this step, we briefly outline key points and recommendations for library preparation and RNA sequencing as these services are generally provided by core facilities.
14. It is critical to assess RNA integrity prior to library generation. In our manuscript,1 we used the Agilent 2100 Bioanalyzer to estimate the RNA Integrity Number (RIN) and discarded samples with a score below 7.
15. Library generation should be performed using prep kits that are compatible with the sequencer. In our manuscript, we used the TruSeq Stranded mRNA Library Prep kit (Illumina) or the NEBNext Ultra II Directional RNA Library Prep kit (New England Biolabs).
16. The recommended sequencing depth is at least 15 million reads per sample. In our manuscript,1 sequencing was performed on the NextSeq 500 sequencer (Illumina) with paired-end reads (43 bp × 43 bp) using the High Output Reagent kit (75 cycles; Illumina).
17. Data processing upstream of gene expression analysis is a multi-step workflow involving demultiplexing, trimming, filtering, and alignment of reads to a reference genome. In our manuscript,1 processing of raw reads was performed in accordance with Illumina’s recommendations using the RNA-seq Alignment suite.
a. Base call files were demultiplexed with bcl2fastq2. Adapter sequences were trimmed and low-quality reads (< 35 base pairs) were discarded.
b. Demultiplexed read sequences were then aligned to the mm10 genome reference using the STAR aligner.
c. The number of reads aligned to each annotated gene was tallied with RnaReadCounter to generate read-count matrices for all samples, which were used as inputs for downstream analyses.
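The two quality thresholds mentioned above (RIN of at least 7 and at least 15 million reads per sample) can be applied to a sample sheet before downstream analysis. The sketch below uses a hypothetical table; the sample names, RIN values, and read counts are invented for illustration.

```r
# Hypothetical QC table: one row per sequenced sample.
qc <- data.frame(
  sample  = c("sample_1", "sample_2", "sample_3"),
  rin     = c(8.5, 6.2, 7.0),   # RNA Integrity Number from the Bioanalyzer
  reads_M = c(22, 30, 12)       # sequencing depth in millions of reads
)

min_rin     <- 7    # samples scoring below 7 were discarded
min_reads_M <- 15   # recommended minimum depth

qc$pass_rin   <- qc$rin >= min_rin
qc$pass_depth <- qc$reads_M >= min_reads_M
qc$keep       <- qc$pass_rin & qc$pass_depth

qc[, c("sample", "pass_rin", "pass_depth", "keep")]
```

In this toy table only the first sample passes both checks: the second fails on RIN and the third on depth.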
Bioinformatics pipeline
The bioinformatic approach is summarized in Figure 3. To understand the multicellular interplay that occurs during complex processes like regeneration, the user will explore the gene expression of each cell type, followed by an integrated analysis across all cell types and the total tissue. The goal is to identify three types of genes: i) “pop-actived” genes, which are expressed by a population at a specific time point during tissue regeneration; ii) “subset-specific” genes, which are associated with the expansion of a specific cell type because they are upregulated by one population and change their pattern in the total muscle over time; and iii) “constitutively-active” genes, which are associated with a cell type because they are upregulated by that cell type, yet do not change their pattern in the total muscle over time.
Figure 3.
Bioinformatics pipeline
Figure reprinted from our previous publication Groppa et al., Cell Reports 2023. Schematic overview of the process used to identify genes modulated (DEG) during the response to damage.
(A) Cell type-specific analysis to identify DEGs for each cell type during muscle regeneration.
(B) Total muscle analysis to identify DEGs in the total muscle during muscle regeneration.
(C) Comparison across cell types to identify genes that do not change over time yet are enriched in one specific cell type.
(D) The combination of the outcomes from A, B and C leads to three categories of genes: pop-actived genes that are dynamically expressed over time by a specific cell type, subset-specific genes that are associated with the expansion of a specific cell subset, and constitutively-active genes that are enriched in one specific cell type without changing expression during the regeneration process.
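As a rough illustration of how the outcomes of panels A, B, and C might be combined, the sketch below flags each gene with three logical columns. This is one possible reading of the figure legend, not the implementation used in the published pipeline: the input vectors, the gene names, and the function `categorize_genes` are all hypothetical.

```r
# One possible reading of Figure 3D, expressed as boolean combinations.
#   deg_cell        : gene changes over time within the cell type (panel A)
#   deg_total       : gene changes over time in total muscle (panel B)
#   enriched_stable : gene is enriched in the cell type without changing
#                     over time in that cell type (panel C)
categorize_genes <- function(deg_cell, deg_total, enriched_stable) {
  data.frame(
    pop_actived           = deg_cell,
    subset_specific       = enriched_stable & deg_total,
    constitutively_active = enriched_stable & !deg_total
  )
}

# Hypothetical toy input for four genes in one cell type.
toy <- data.frame(
  gene            = c("GeneA", "GeneB", "GeneC", "GeneD"),
  deg_cell        = c(TRUE,  FALSE, FALSE, FALSE),
  deg_total       = c(TRUE,  TRUE,  FALSE, FALSE),
  enriched_stable = c(FALSE, TRUE,  TRUE,  FALSE)
)

flags <- categorize_genes(toy$deg_cell, toy$deg_total, toy$enriched_stable)
cbind(gene = toy$gene, flags)
```

Returning flags rather than a single label avoids imposing a precedence between categories that the text does not specify.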
Download and prepare the data matrix for the analysis
Timing: 10–20 min
18. For this analysis, we will use the data published in the work of Groppa et al. 2023.1 Before starting, create the data matrices with raw counts. Visit Gene Expression Omnibus (GEO) at NCBI, search for the GSE210748 series, and download the supplementary files from this link: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE210748. The files contain matrices with raw counts in xlsx format. After removing empty columns, convert each matrix to a tab-delimited text file (File -> Save as -> text file).
Note: To simplify the data processing, we prepared the text files in an archive on GitHub (at github.com/Martini-CompBio-DMMT/muscle-regeneration-2023, the user can find a tar archive called raw-dataset.tar.gz). Download the archive and extract it into a directory called “expression-matrices”. Examples of Bash and R code that create the directory and download the archive are shown below; run either one.
# Bash code
mkdir -p expression-matrices
cd expression-matrices
wget -c "https://github.com/Martini-CompBio-DMMT/muscle-regeneration-2023/raw/main/raw-dataset.tar.gz"
tar xf raw-dataset.tar.gz
cd ..
# R code
dir.create("expression-matrices", showWarnings = FALSE)
download.file("https://github.com/Martini-CompBio-DMMT/muscle-regeneration-2023/raw/main/raw-dataset.tar.gz",
              "expression-matrices/raw-dataset.tar.gz",
              mode = "wb")  # binary mode keeps the archive intact on Windows
untar("expression-matrices/raw-dataset.tar.gz",
      exdir = "expression-matrices")
19. This generates count matrix files in the “expression-matrices” directory. Extract the cell types and conditions from the file names. For simplicity, name the samples with a short format (for example, endothelial cells from wild type = “ec_wt”).
# Extract sample names and rename
count_files <- dir("expression-matrices/",
                   pattern = "GSE210748_Raw_counts_.+_WT\\.txt$")
samples <- gsub("\\.txt$", "", count_files)
samples <- gsub("GSE210748_Raw_counts_", "", samples)
# Short names are assigned by position: check that the order of count_files matches
names(samples) <- c("ec_wt", "fap_wt", "inf_wt",
                    "mp_wt", "per_wt", "tot_wt")
20. Read the count matrix files into a list, checking that the gene names are identical across all data files. This step is crucial to avoid merging matrices with different gene order or content.
# Read count matrices and check gene order
count_cells <- lapply(paste0("expression-matrices/", count_files),
                      read.table, header = TRUE, sep = "\t", as.is = TRUE,
                      row.names = 1, check.names = FALSE)
rn_genes <- row.names(count_cells[[1]])
checks <- sapply(count_cells, function(x) {
  identical(rn_genes, row.names(x))
})
if (!all(checks))
  stop("Row names differ")
Note: If no error is raised, the row names match across the expression matrices. In this protocol, we focus only on the analysis of samples collected from wild type (WT) mice. Nevertheless, all functions are designed to be applicable to all gene-count matrices, including samples derived from the CCR2 KO mouse line used in the work of Groppa et al. 2023.1
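To see why this check matters, the toy example below runs the same comparison on small hypothetical matrices, including one with shuffled rows, and shows one way to realign it to the reference order. The helper `same_row_names` and all data are invented for illustration.

```r
# Hypothetical helper: TRUE when all matrices share identical, ordered row names.
same_row_names <- function(mats) {
  ref <- row.names(mats[[1]])
  all(vapply(mats, function(x) identical(ref, row.names(x)), logical(1)))
}

genes <- c("GeneA", "GeneB", "GeneC")
mat_a <- data.frame(s1 = c(10, 20, 30), row.names = genes)
mat_b <- data.frame(s2 = c(11, 21, 31), row.names = genes)

# Same genes, different order: the check should fail.
mat_shuffled <- mat_b[c(2, 1, 3), , drop = FALSE]

same_row_names(list(mat_a, mat_b))         # TRUE
same_row_names(list(mat_a, mat_shuffled))  # FALSE

# When the gene sets are identical, reordering by the reference row names
# restores the alignment before merging.
mat_fixed <- mat_shuffled[row.names(mat_a), , drop = FALSE]
same_row_names(list(mat_a, mat_fixed))     # TRUE
```

Reordering only helps when every matrix contains exactly the same genes; if the gene sets differ, the matrices need to be reconciled before merging.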
-
21.
To improve visualization, reformat sample names with an annotation that contains the features of each sample. For example, the sample named “ec_wt_0d-1” corresponds to the first biological replicate of endothelial cells derived from the WT condition at steady state.
# Reformat samples’ names in count matrices
names(count_cells) <- names(samples)
samples <- samples[c("ec_wt", "fap_wt", "inf_wt", "mp_wt", "per_wt", "tot_wt")]
count_cells <- count_cells[names(samples)]
for (celltype in names(count_cells)) {
colnames(count_cells[[celltype]]) <- paste(celltype, colnames(count_cells[[celltype]]), sep="_")
}
Note: As described before, the dataset analyzed in the work of Groppa et al. (2023)1 includes the sequencing data of total muscles collected at the different time points of regeneration after injury. Including this dataset makes it possible to integrate the gene expression profiles of individual cell types and total tissue, as described later.
-
22.
Extract the total muscle data from the list and name it “tot”.
tot <- count_cells$tot_wt
-
23.
Collapse the count matrices of each cell type, except the total muscle, into a single matrix. The information related to cell type (cell), time point during regeneration (time), and replicate number (rep) is stored in the column names of each sample. Extract this information and build an annotation data frame. Alternatively, manually create a spreadsheet file (see Table 1).
# Create sample annotation data frame for each cell type
type_selection <- c("ec_wt", "fap_wt", "inf_wt", "mp_wt", "per_wt")
countData <- do.call(cbind, count_cells[type_selection])
colnames(countData) <- sapply(strsplit(colnames(countData), ".", fixed = T), "[[", 2)
colData.1 <- t(sapply(strsplit(colnames(countData), "-"), function(x) {
if (length(x) == 1){
c(x, "1")
} else {x}
}))
colData.2 <- t(sapply(strsplit(colData.1[,1], "_"), function(x) {
if (length(x) == 1){
c(x, "1")
} else {x}
}))
colData <- data.frame(row.names=colnames(countData),
cell=colData.2[,1],
condition=colData.2[,2],
time=as.numeric(gsub("d", "", colData.2[,3])),
days=colData.2[,3],
rep=colData.1[,2])
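As an alternative way to look at the parsing above, the same annotation can be sketched with a single regular expression. This is an illustration only: the helper parse_sample_names is hypothetical, and it assumes every sample name follows the form cell_condition_days-replicate (for example “ec_wt_0d-1”); unlike the code above, it stops on names that lack a replicate suffix instead of assigning replicate “1”.

```r
# Parse names of the form <cell>_<condition>_<days>-<replicate>
parse_sample_names <- function(sample_names) {
  pattern <- "^([^_]+)_([^_]+)_([0-9]+d)-([0-9]+)$"
  ok <- grepl(pattern, sample_names)
  if (!all(ok))
    stop("Unexpected sample name(s): ", paste(sample_names[!ok], collapse = ", "))
  days <- sub(pattern, "\\3", sample_names)
  data.frame(row.names = sample_names,
             cell      = sub(pattern, "\\1", sample_names),
             condition = sub(pattern, "\\2", sample_names),
             time      = as.numeric(sub("d$", "", days)),
             days      = days,
             rep       = sub(pattern, "\\4", sample_names),
             stringsAsFactors = FALSE)
}

annotation <- parse_sample_names(c("ec_wt_0d-1", "ec_wt_1d-2", "per_wt_14d-3"))
print(annotation)
```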
-
24.
Apply the same procedure to the total muscle data.
# Create sample annotation data frame for total tissue
tcolData.1 <- t(sapply(strsplit(colnames(tot), "-"), function(x) {
if (length(x) == 1){
c(x, "1")
} else {x}
}))
tcolData.2 <- t(sapply(strsplit(tcolData.1[,1], "_"), function(x) {
if (length(x) == 1){
c(x, "1")
} else {x}
}))
tcolData <- data.frame(row.names=colnames(tot),
cell=tcolData.2[,1],
condition="wt",
time=as.numeric(gsub("d", "", tcolData.2[,3])),
days=tcolData.2[,3],
rep=tcolData.1[,2])
Note: The sample names are kept as row names, and the corresponding metadata are stored in the columns: cell (cell type), condition, time (numeric version of the day), days (time point after injury), and rep (replicate number).
-
25.
After creating the raw data matrices (one with all cell-types and one with the total muscle) and metadata tables, save them as text files to avoid rerunning the aforementioned steps and restart at this point of the protocol if needed.
# Save count matrices and relative annotation to files
write.table(countData, file="expression-matrices/countData.txt", sep="\t", quote=F)
write.table(colData, file="expression-matrices/colData.txt", sep="\t", quote=F)
write.table(tot, file="expression-matrices/total-countD.txt", sep="\t", quote=F)
write.table(tcolData, file="expression-matrices/t-colData.txt", sep="\t", quote=F)
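When restarting from the saved files, a quick round-trip check can confirm that the count matrix and its annotation still agree. The sketch below uses toy objects and temporary files (all values and file names are hypothetical); the relevant detail is check.names = FALSE, which keeps the “-” in the sample names when the table is read back.

```r
# Toy objects standing in for countData and colData (hypothetical values)
toy_counts <- data.frame("ec_wt_0d-1" = c(10, 0), "ec_wt_0d-2" = c(12, 3),
                         row.names = c("g1", "g2"), check.names = FALSE)
toy_annot <- data.frame(cell = "ec", condition = "wt", time = 0, days = "0d",
                        rep = c("1", "2"), row.names = colnames(toy_counts))

out_dir <- tempdir()
counts_file <- file.path(out_dir, "countData-demo.txt")
annot_file  <- file.path(out_dir, "colData-demo.txt")
write.table(toy_counts, file = counts_file, sep = "\t", quote = FALSE)
write.table(toy_annot,  file = annot_file,  sep = "\t", quote = FALSE)

# check.names = FALSE keeps the "-" in the sample names when reading back
counts_back <- read.table(counts_file, header = TRUE, sep = "\t",
                          row.names = 1, check.names = FALSE)
annot_back  <- read.table(annot_file, header = TRUE, sep = "\t",
                          row.names = 1, check.names = FALSE)

# Every count column must have a matching annotation row, in the same order
stopifnot(identical(colnames(counts_back), rownames(annot_back)))
```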
Table 1.
Example of the annotation data frame
Sample (row name) | Cell | Condition | Time | Days | Rep |
---|---|---|---|---|---|
ec_wt_0d-1 | ec | wt | 0 | 0d | 1 |
ec_wt_0d-2 | ec | wt | 0 | 0d | 2 |
ec_wt_0d-3 | ec | wt | 0 | 0d | 3 |
ec_wt_0d-4 | ec | wt | 0 | 0d | 4 |
ec_wt_1d-1 | ec | wt | 1 | 1d | 1 |
ec_wt_1d-2 | ec | wt | 1 | 1d | 2 |
Cell type specific analysis to identify pop-active_d genes
Timing: 25 min
The goal of the cell-specific analysis is to define pop-active_d genes, i.e., genes that are active in a given population at day d, by 1) identifying differentially expressed genes (DEGs) across time points during regeneration within the same cell type; and 2) organizing them in clusters to infer the time point (d) at which they are “active”.
Note: To perform Task 1, you will use the R packages edgeR and RUVSeq. The RUV function implemented in the RUVSeq package removes hidden batch effects from the original data and is highly recommended for large datasets, where the probability of unpredictable batch effects is higher. For convenience, we created an R package with all the handler/wrapper functions that streamline the analysis. Specifically, we provide a function that loads the data matrix file, extracts the cell type and condition, performs pre-filtering, applies the RUVSeq analysis, computes time-dependent variable genes (DEGs), and clusters the time-dependent genes (to identify pop-active_d genes).
Use the aforementioned function called “runWithParametersRpkm” from the R package cellCB (see GitHub github.com/cavei/cellCB) that operates with the following data and arguments:
“dataFile”, the file containing the count matrices of all the cell types collapsed together; “annotationFile”, the file with the related sample annotations; “cell”, the cell type analyzed; “condition”, the condition analyzed (in this example the only condition is WT); “runWithK”, the number k of factors of unwanted variation that are estimated from the data (see RUVSeq parameters for more details); “runWithFormula”, the edgeR formula to compute DEGs (see RUVSeq and edgeR parameters for more details); “removeSamples”, an optional list of samples to be removed; “varName”, the variable to be used to compute DEGs; “pADJ.thr”, the adjusted p-value threshold for DEG analysis; “logfcthr”, the log2 fold change threshold for DEG analysis; “rcFilter”, the read count gene expression filter; “atLeastIn”, the number of samples with at least “rcFilter” counts required to keep a gene; “rpkmMean”, keep genes with mean RPKM/FPKM gene expression above “rpkmMean”; “clusterK”, the number of expected clusters computed from DEGs; “gene_length”, the file with estimated gene lengths for RPKM/FPKM computation (in this example it is included in the archive of the expression matrices).
All these parameters should be defined according to the specific dataset in use. With the data analyzed in the work of Groppa et al., 2023,1 we set up a gene filter to select the genes with at least 100 counts in at least 2 samples and mean RPKM >= 3 (recommended; these are the default values in the function). For RUVSeq analysis, you can use 3,000 empirical invariant genes (i.e., the 3,000 genes with the highest p-values). The choice of the RUVSeq k (runWithK) and its related formula (runWithFormula) should be tuned by exploring the diagnostic plots produced after a preliminary run. With k, the user sets the number of dimensions of unwanted variation. The computed dimensions need to be included in the edgeR linear model for DEG computation through the formula specified in the “runWithFormula” argument. To extract DEGs, you can set pADJ.thr to 0.01 and logfcthr to 2: because a high number of DEGs is expected, a stringent cutoff is needed to detect reliable signals (see troubleshooting 4). For the dataset of Groppa et al. 2023,1 including both the individual cell types and the total muscle in the WT condition, set clusterK to 9 (see troubleshooting 5).
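The gene filter described above can be illustrated in base R. The sketch below is not the cellCB implementation; it only shows, on toy counts and gene lengths (all values hypothetical), what “at least 100 counts in at least 2 samples and mean RPKM >= 3” means in practice.

```r
# Toy counts (genes x samples) and gene lengths in base pairs (hypothetical values)
counts <- matrix(c(500, 450, 520, 480,    # g1: high counts in every sample
                   120,   0, 150,   0,    # g2: >= 100 counts in two samples
                     5,   2,   0,   1),   # g3: low counts everywhere
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("g1", "g2", "g3"), paste0("s", 1:4)))
gene_length <- c(g1 = 2000, g2 = 1000, g3 = 1500)

rc_filter <- 100   # minimum read count (rcFilter)
at_least  <- 2     # minimum number of samples reaching rc_filter (atLeastIn)
rpkm_mean <- 3     # minimum mean RPKM (rpkmMean)

# RPKM = reads / (gene length in kb * library size in millions)
lib_size <- colSums(counts)
rpkm <- t(t(counts) / (lib_size / 1e6)) / (gene_length / 1e3)

keep_counts <- rowSums(counts >= rc_filter) >= at_least
keep_rpkm   <- rowMeans(rpkm) >= rpkm_mean
keep <- keep_counts & keep_rpkm
print(keep)
```

In this toy example the libraries are tiny, so every gene passes the RPKM criterion and the read count criterion alone removes g3.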
CRITICAL: There is no gold standard for defining the k unwanted weight dimensions. Given that the choice of k (runWithK argument) influences downstream analysis, the optimal approach to evaluate k is to increase k incrementally from 1 to 10 and examine the data with diagnostic plots. Be aware that each k dimension introduces differences that could deviate from true biology; refer to the RUVSeq package vignette for more details. The choice of k unwanted weight dimensions is based on three diagnostics: the Relative Log Expression (RLE) plot, the PCA dispersion, and the p-value distribution, which are briefly explained here. The RLE plot investigates the distributions of read counts across samples compared to a reference sample (a virtual sample obtained by taking the median across all samples). If each distribution is sharply peaked near zero (i.e., with low variance), we conclude that the samples do not deviate from the reference. When samples deviate, it suggests that further normalization is needed (see the plotRLE help from the EDASeq R package for more details). The PCA plot examines the degree of similarity among replicates (samples representative of the same time point) by checking whether the samples aggregate into well-defined clusters. The histogram of p-values displays the distribution of the genes’ p-values after RUV correction, indicating whether the results of the differential gene expression analysis follow the expected distribution (in our case, to test whether RUVSeq worked or not). In the best scenario, the p-value distribution is expected to peak near zero and have a uniformly distributed right tail.5
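To make the RLE diagnostic concrete, the following base R sketch computes relative log expression values on a toy matrix (all values hypothetical) in which one sample is globally inflated. It illustrates the idea only; in the protocol the plots are produced by the analysis function, and plotRLE from EDASeq is the reference implementation.

```r
# Toy counts: 4 genes x 3 samples; sample s3 is globally inflated (hypothetical values)
counts <- matrix(c(100, 110, 400,
                   200, 190, 800,
                    50,  55, 210,
                    80,  75, 300),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("g", 1:4), c("s1", "s2", "s3")))

log_counts <- log2(counts + 1)

# Reference "virtual sample": per-gene median across samples
gene_median <- apply(log_counts, 1, median)

# Relative log expression: deviation of each sample from the reference
rle <- log_counts - gene_median

# Per-sample summary: a well-normalized sample has a median near 0 and a small IQR
rle_summary <- rbind(median = apply(rle, 2, median),
                     iqr    = apply(rle, 2, IQR))
print(round(rle_summary, 2))
```

Here s1 and s2 stay close to zero, whereas s3 sits well above it, which is the pattern that would call for further normalization.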
Note: To perform Task 2, you will use the R packages TCSeq and BoolNet. You feed TCSeq with the expression data of the DEGs identified in Task 1 to cluster them, and then extract the time points at which genes are active using the “binarizeTimeSeries” function of the BoolNet package. This method uses k-means (set to obtain 2 clusters, i.e., active and non-active) and computes a cutoff according to the centroid values to mark time points above the cutoff as active (boolean value = 1) and all others as inactive (boolean value = 0). The activation pattern of the cluster centroid is inherited by all the genes belonging to the same cluster. You can define a gene as active in the cell type at day d (pop-active_d) if its cluster is “active” at day d.
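The binarization idea can be sketched with the base R kmeans function. This is an illustration under stated assumptions, not BoolNet's code: the centroid values are hypothetical, and the cutoff is taken as the midpoint between the two k-means centers, which may differ from the exact rule used by “binarizeTimeSeries”.

```r
# Toy centroid of one gene cluster across the ten time points (hypothetical values)
centroid <- c(d0 = 0.10, d1 = 0.20, d2 = 1.90, d3 = 2.20, d4 = 2.00,
              d5 = 0.30, d6 = 0.25, d7 = 0.12, d10 = 0.15, d14 = 0.05)

set.seed(1)  # k-means starts from random centers
km <- kmeans(centroid, centers = 2, nstart = 10)

# Assumed cutoff: halfway between the two cluster centers
cutoff <- mean(km$centers[, 1])

# 1 = active, 0 = inactive
active <- as.integer(centroid > cutoff)
names(active) <- names(centroid)
print(active)
```

Every gene in this cluster would then be considered active at d2, d3 and d4.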
-
26.
Create an output directory to store results in RData format and load cellCB and RUVSeq R packages.
# Create a directory to store the analyses
resDir <- "rpkm-RDatas-padj0.01-lfc2/"
dir.create(resDir, showWarnings = F)
library(cellCB)
library(RUVSeq)
-
27.
Run the function to identify pop-active genes in endothelial cells, choosing the optimal k by evaluating the diagnostic plots described above.
-
a.
Run the function using endothelial cells (ec-wt), set k = 1 and set “runWithFormula” to “~days + W_1” to detect the genes that change over time, corrected by one unwanted dimension.
# RUVSeq k=1 in endothelial cells
ec_wt <- runWithParametersRpkm("expression-matrices/countData.txt",
"expression-matrices/colData.txt",
"ec", "wt", runWithK = 1,
runWithFormula = "~days + W_1",
pADJ.thr = 0.01, logfcthr = 2,
clusterK = 9,
gene_length = "expression-matrices/ensembl_91_gene_length.txt")
Note: As shown in Figure 4A, the RLE plot indicates reasonably good results. However, some sample distributions do not peak near zero, demonstrating more variance than other samples. In addition, the PCA plot displays sparsity amongst replicates (samples of the same time point) and the p-value distribution has a slight peak toward 1.
-
b.
Set k = 2 and update the edgeR formula to “~days + W_1 + W_2” to accommodate two unwanted dimensions.
# RUVSeq k=2 in endothelial cells
ec_wt <- runWithParametersRpkm("expression-matrices/countData.txt",
"expression-matrices/colData.txt",
"ec", "wt", runWithK = 2,
runWithFormula = "~days + W_1 + W_2",
pADJ.thr = 0.01, logfcthr = 2,
clusterK = 9,
gene_length = "expression-matrices/ensembl_91_gene_length.txt")
save(ec_wt, file=paste0(resDir,"ec","-","wt",".RData"))
Note: As shown in Figure 4B, two unwanted weight dimensions work better because there is a reduction in the variance (RLE), the replicates form more homogeneous clusters (PCA), and the p-value distribution is less skewed toward 1.
-
28.
Run the function to identify pop-active genes in the other cell types (Figure 3A).
Note: For the cell-specific time series samples in the study of Groppa et al. 2023,1 optimal runWithK values are already defined for each cell type. For pericytes, we excluded the sample “per_wt_3d-1” because it was identified as an outlier in the RLE and PCA plots both before and after RUV correction.
# RUVSeq runs with optimal k in remaining cell types
fap_wt <- runWithParametersRpkm("expression-matrices/countData.txt",
"expression-matrices/colData.txt",
"fap", "wt", runWithK = 1,
runWithFormula = "~days + W_1",
pADJ.thr = 0.01, logfcthr = 2,
clusterK = 9,
gene_length = "expression-matrices/ensembl_91_gene_length.txt")
save(fap_wt, file=paste0(resDir,"fap","-","wt",".RData"))
mp_wt <- runWithParametersRpkm("expression-matrices/countData.txt",
"expression-matrices/colData.txt",
"mp", "wt", runWithK = 1,
runWithFormula = "~days + W_1",
pADJ.thr = 0.01, logfcthr = 2,
clusterK = 9,
gene_length = "expression-matrices/ensembl_91_gene_length.txt")
save(mp_wt, file=paste0(resDir,"mp","-","wt",".RData"))
inf_wt <- runWithParametersRpkm("expression-matrices/countData.txt",
"expression-matrices/colData.txt",
"inf", "wt", runWithK = 1,
runWithFormula = "~days + W_1",
pADJ.thr = 0.01, logfcthr = 2,
clusterK = 9,
gene_length = "expression-matrices/ensembl_91_gene_length.txt")
save(inf_wt, file=paste0(resDir,"inf","-","wt",".RData"))
per_wt <- runWithParametersRpkm("expression-matrices/countData.txt",
"expression-matrices/colData.txt",
"per", "wt", runWithK = 2,
runWithFormula = "~days + W_1 + W_2",
removeSamples = "per_wt_3d-1",
pADJ.thr = 0.01, logfcthr = 2,
clusterK = 9,
gene_length = "expression-matrices/ensembl_91_gene_length.txt")
save(per_wt, file=paste0(resDir,"per","-","wt",".RData"))
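When exploring k from 1 to 10 as suggested in the CRITICAL note, the formula string has to grow with k. The helper below is a hypothetical convenience (it is not part of cellCB) that builds the string passed to “runWithFormula” for a given variable and k.

```r
# Hypothetical helper: build the model formula string for k unwanted-variation factors
ruv_formula <- function(variable, k) {
  stopifnot(k >= 1)
  paste0("~", variable, " + ", paste0("W_", seq_len(k), collapse = " + "))
}

formulas <- sapply(1:3, function(k) ruv_formula("days", k))
print(formulas)
```

For example, ruv_formula("days", 2) gives the string used for endothelial cells and pericytes, and ruv_formula("cell", 1) gives the one used in the comparison across cell types.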
Figure 4.
Analysis of k dimensions for RUVSeq
(A and B) RLE plot, PCA plot and p-value distribution for the EC WT dataset with k = 1, i.e., one unwanted variance dimension (A), and k = 2, i.e., two unwanted variance dimensions (B).
Whole muscle analysis
Timing: 5 min
In this second part of the analysis, you will identify pop-active_d genes in the total tissue (i.e., the genes whose expression changes over time in the muscle, Figure 3B). To achieve this, process the data for total muscle as was done for the individual cell types. The list of pop-active_d genes in the total tissue will be integrated with the genes that emerged from the intercellular analysis (see “comparison across cell types”), allowing the distinction of cell signals that are related to cell expansion or constitutive gene programs.
-
29.
Run the function to identify pop-active genes in the total muscle.
# Identify pop-active_d genes in the total tissue
tot_wt <- runWithParametersRpkm("expression-matrices/total-countD.txt",
"expression-matrices/t-colData.txt",
"tot", "wt", runWithK = 1,
runWithFormula = "~days + W_1",
pADJ.thr = 0.01, logfcthr = 2,
clusterK = 9,
gene_length = "expression-matrices/ensembl_91_gene_length.txt")
save(tot_wt, file=paste0(resDir,"tot","-","wt",".RData"))
Comparison across cell types
Timing: 5 min
-
30.
Compute differentially expressed genes among cell types.
Note: In this case, use the explanatory variable “cell” rather than “days” in the edgeR formula (runWithFormula argument).
# Identify constitutive-active genes
all_wt <- runWithParametersRpkm("expression-matrices/countData.txt",
"expression-matrices/colData.txt",
"ALL", "wt", runWithK=1,
runWithFormula="~cell + W_1",
removeSamples = "per_wt_3d-1",
varName = "cell", pADJ.thr = 0.01,
logfcthr = 2,
clusterK = 9,
gene_length = "expression-matrices/ensembl_91_gene_length.txt")
save(all_wt, file=paste0(resDir,"all","-","wt",".RData"))
-
31.
Proceed with the selection of genes that do not display any activation pattern in individual cell types over time (flat expression, i.e., not pop-active_d), but are enriched in a specific cell subset.
Note: We implemented an intercellular comparison that attributes a gene to a specific cell population when its expression is higher (log2 fold change >= 1.5) than all other cell types. For example, gene X is enriched in EC when it is differentially expressed in EC and its expression is higher than FAP, IC, MP and PER (log2 fold change >= 1.5) (Figure 3C).
library(cellCB)
result_dir <- "rpkm-RDatas-padj0.01-lfc2/"
rdata_storage <- "rdata_storage"
dir.create(rdata_storage, showWarnings = F)
# Assign genes to cell
if (!(file.exists("gene2cellAssociation.RData"))) {
cellExpressionTable <- getCellExpressionSignature(
RData = paste0(result_dir, "/all-wt.RData"))
gene2cellAssociationTables <- cellCB:::associateGeneToCells(cellExpressionTable)
gene2cellAssociation <- cellCB:::tables2maps(gene2cellAssociationTables)
file=paste0(rdata_storage, "/gene2cellAssociation-padj0.01-lfc2-",
as.character(Sys.Date()), ".RData")
link="gene2cellAssociation.RData"
save(gene2cellAssociation, file=file)
file.symlink(file, link)
} else {
load("gene2cellAssociation.RData")
}
names(gene2cellAssociation)
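The enrichment rule from the Note in this step (expression higher than in all other cell types by a log2 fold change of at least 1.5) can be sketched on a toy matrix. This is an illustration only: the values and gene names are hypothetical, the helper assign_cell is not a cellCB function, and the differential expression test that the pipeline also requires is omitted.

```r
# Toy mean log2 expression per cell type (genes x cell types; hypothetical values)
log2_expr <- matrix(c(8.0, 5.0, 5.5, 6.0, 4.0,   # geneX: clearly highest in ec
                      6.0, 5.5, 5.0, 6.2, 5.8,   # geneY: no cell type stands out
                      3.0, 7.5, 3.2, 2.8, 5.9),  # geneZ: highest in fap
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("geneX", "geneY", "geneZ"),
                                    c("ec", "fap", "inf", "mp", "per")))
lfc_threshold <- 1.5

# Assign a gene to the top cell type only if it exceeds every other cell type
# by at least the threshold; otherwise return NA
assign_cell <- function(expr_row, threshold) {
  top <- which.max(expr_row)
  margin <- expr_row[top] - max(expr_row[-top])
  if (margin >= threshold) names(expr_row)[top] else NA_character_
}

assignment <- apply(log2_expr, 1, assign_cell, threshold = lfc_threshold)
print(assignment)
```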
Assign genes to their mode-of-expression category
Timing: 6 min
Now you will expand the cell-specific analysis described in Figure 3D. As previously stated, you will divide genes into i) “pop-active_d” genes; ii) “subset-specific” genes; and iii) “constitutively-active” genes. The “pop-active_d” genes for each cell type were identified in the above section, “cell type specific analysis to identify pop-active_d genes”. By combining the analyses of the total tissue and of the individual cell types, you will identify “subset-specific” and “constitutively-active” genes for each cell type.
Note: We created an R pipeline that classifies genes in the following order for each cell population: “pop-active_d” > “subset-specific” > “constitutively-active” (Figure 5). Considering one cell type, for example the endothelial cells, the following scenario is possible: gene A is classified as “pop-active_d”, and is therefore subjected to ligand-receptor analysis (green light). Genes B, C, D, and E display flat expression, i.e., their expression remains unchanged within the cell type across time points. For this reason, genes B, C, D, and E need to be further investigated (yellow light). Gene B follows a temporal pattern of expression in the total tissue and its expression is higher in the endothelial cells than in all other cell types (lfc >= 1.5). As such, gene B is considered “subset-specific” and is subjected to downstream analysis. Gene C is neither temporally modulated in the total tissue nor upregulated in endothelial cells versus other cell types. Thus, gene C is excluded from the ligand-receptor analysis. Gene D is not temporally modulated, but its expression is higher in endothelial cells compared to the other cells. Gene D is labeled as “constitutively-active” and is considered for the following analysis. Lastly, gene E displays a temporal pattern of expression in the total tissue, but it is not enriched in any cell type; it is therefore stopped (red light) because it cannot be associated with any cell type. The function “extractGenesBehavious” implements these decisions.
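The decision rules in this Note can be written as a small function. The sketch below is a hypothetical restatement for illustration and does not reproduce the internals of “extractGenesBehavious”; the three logical flags per gene are assumptions, and the flags for gene A beyond being pop-active are arbitrary because they do not affect its class.

```r
# Hypothetical restatement of the classification order:
# pop-active > subset-specific > constitutively-active > excluded
classify_gene <- function(pop_active, total_active, enriched) {
  if (pop_active) return("pop-active")
  if (total_active && enriched) return("subset-specific")
  if (!total_active && enriched) return("constitutively-active")
  "excluded"
}

# Genes A to E from the example scenario in endothelial cells
genes <- data.frame(
  gene         = c("A", "B", "C", "D", "E"),
  pop_active   = c(TRUE, FALSE, FALSE, FALSE, FALSE),  # temporal pattern within the cell type
  total_active = c(FALSE, TRUE, FALSE, FALSE, TRUE),   # temporal pattern in the total tissue
  enriched     = c(FALSE, TRUE, FALSE, TRUE, FALSE)    # lfc >= 1.5 versus all other cell types
)
genes$category <- mapply(classify_gene, genes$pop_active,
                         genes$total_active, genes$enriched)
print(genes[, c("gene", "category")])
```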
-
32.
Load the required packages and define the directory where the results are saved.
library(org.Mm.eg.db)
library(clusterProfiler)
library(ggplot2)
library(cellCB)
resultsDir <- "rpkm-RDatas-padj0.01-lfc2/"
tableDir = "sif_tables"
dir.create(tableDir, showWarnings = F)
-
33.
Extract the list of pop-active genes at a specific time point d (=day) for all cell populations.
Note: This results in a list with one element per cell type, each containing the pop-active_d genes across time points.
# Extract only the pop-active genes for each cell type
cells <- c("fap", "ec", "mp", "inf", "per")
activeAtDaysCell <- lapply(cells, function(cell) {
cellCB:::getActiveAtDaysFromRData(paste0(resultsDir, "/", cell, "-wt.RData"),
paste0(cell, "_wt"))
})
names(activeAtDaysCell) <- cells
-
34.
Apply the same procedure to the total muscle.
# Extract only the pop-active genes for total muscle
activeAtDaysTot <- cellCB:::getActiveAtDaysFromRData(
paste0(resultsDir,"/tot-wt.RData"), "tot_wt")
-
35.
Run the function “extractGenesBehavious” that labels each gene according to the rules discussed above.
Note: The object “genesBehavioursCell” combines the cell-specific and total tissue results for each cell type and contains three lists of genes named “activeAtDays” (a.k.a. pop-active_d), “activeAtDaysTotCell” (a.k.a. subset-specific, cell expansion), and “constitutive” (a.k.a. constitutively-active genes). The lists “activeAtDays” and “activeAtDaysTotCell” are further divided according to the time point during regeneration.
# Extract gene behavior
genesBehavioursCell <- lapply(cells, function(cell) {
extractGenesBehavious(activeAtDaysCell[[cell]], activeAtDaysTot, gene2cellAssociation[[cell]])
})
names(genesBehavioursCell) <- cells
Figure 5.
Classification of genes during the bioinformatics analysis
Schematic overview of the process applied to classify the genes that emerged from the analysis of differentially regulated genes (DEGs) in the individual cell types, in the total tissue, and in the intercellular comparison.
Ligand receptor analysis
Timing: 8 min
To create the intercellular interactome, extract the receptors and ligands from the “genesBehavioursCell” object for each cell type. In the analysis of Groppa et al. (2023),1 we used the ligand-receptor database published by Rezza et al. 2016;13 however, any other sif file (simple interaction file, a text file with at least two columns: source and destination) can be used (see troubleshooting 6). A copy of the sif file from Rezza et al. is provided here.
-
36.
Download and read the sif file or read it directly from the url https://github.com/cavei/muscle-regeneration-cell-rep/raw/main/ligands-receptor-rezza/database-mouse-sif.txt.
# Read the sif file into R
url = "https://github.com/cavei/muscle-regeneration-cell-rep/raw/main/ligands-receptor-rezza/database-mouse-sif.txt"
# or download the file into your working directory and use instead:
# url = "database-mouse-sif.txt"
db <- read.table(url, header=F, sep="\t", quote="\"", stringsAsFactors = F)
rezza.db <- list(ligands=unique(db$V2), receptors=unique(db$V3))
-
37.
Filter the results of “genesBehavioursCell” with the list of ligands and receptors.
Note: This operation extracts sets of ligands and receptors that are pop-active, subset-specific, or constitutively active for each cell type at every time point. The ligands and receptors that belong to any of the three categories are named active ligands and receptors.
ligandReceptorBehavioursCell <- lapply(cells, function(cell)
extractLigandReceptorBehaviours(genesBehavioursCell[[cell]], rezza.db))
names(ligandReceptorBehavioursCell) <- cells
-
38.
For example, to find ligands and receptors present in EC at day 6 after muscle injury, run the following:
ligandReceptorBehavioursCell$ec$ligandReceptorDayBatch$d6
## $ligandOfTheDay
## [1] "Igf1" "Csf3" "Clu" "Dll1" "Tnfsf10" "Il6" "Cxcl9"
## [8] "Cxcl10" "Igf2" "Vegfc" "Il15" "Mmp13"
##
## $receptorOfTheDay
## [1] "Tnfrsf11a" "Bdkrb2" "Csf2rb" "H2-Bl" "Gabbr1" "Aplnr"
## [7] "Lepr" "Tnfrsf25" "Fgfr3" "Lsr" "F2rl3" "Il10ra"
-
39.
Check the numbers of ligands and receptors identified in EC WT.
stats_ec <-sapply(ligandReceptorBehavioursCell$ec$ligandReceptorDayBatch, function(x) c(length(x[[1]]), length(x[[2]])))
row.names(stats_ec) <- c("ligandOfTheDay", "receptorOfTheDay")
stats_ec
## d0 d1 d2 d3 d4 d5 d6 d7 d10 d14
## ligandOfTheDay 17 17 29 29 15 12 12 22 12 10
## receptorOfTheDay 13 11 22 23 15 12 12 16 12 10
-
40.
Transform the data and collect all the ligands with their expression timing and generate the ligand info table for each cell.
ligandsTimingCell <- lapply(names(ligandReceptorBehavioursCell), function(cell) {
cdata <- ligandReceptorBehavioursCell[[cell]]
batch <- meltListOfVectors(lapply(cdata$ligandReceptorDayBatch,
function(ddata) ddata$ligandOfTheDay))
names(batch)=c("day", "ligand")
batch$source <- "batch"
tot <- meltListOfVectors(lapply(cdata$ligandReceptorDayTot,
function(ddata) ddata$ligandOfTheDay))
names(tot)=c("day", "ligand")
tot$source <- "total"
const <- data.frame(day = "all",
ligand= cdata$ligandReceptorConstitutive$ligandOfTheDay)
const$source <- "constitutive"
ltc <- rbind(batch, tot, const)
ltc$cell <- cell
ltc
})
names(ligandsTimingCell) <- names(ligandReceptorBehavioursCell)
Note: The expected outcome is a list of tables (see an example in Table 2) where column #1 contains the day; column #2 the ligand; column #3 the mode of expression of the gene, classified as batch (i.e., pop-active_d), total (i.e., subset-specific) or constitutive (i.e., constitutively-active); and column #4 the cell (for example EC). Constitutive genes are marked as “all” in the “day” column, meaning that they are always active.
-
41.
Now concatenate the tables for all cell types.
ligandMatrix <- do.call(rbind, ligandsTimingCell)
row.names(ligandMatrix) <- NULL
write.table(ligandMatrix, file=paste0(tableDir, "/", "ligandDescription.txt"),
sep="\t", quote=F, row.names=F)
-
42.
Replace “all” with all the time points.
all_idx <- which(ligandMatrix$day=="all")
chrday <- c("d0","d1","d10","d14","d2","d3","d4","d5", "d6","d7")
expanded <- lapply(all_idx, function(i) {
base <- ligandMatrix[i, 2:4]
data.frame(day=chrday, ligand=base$ligand, source=base$source, cell=base$cell)
})
expanded <- do.call(rbind, expanded)
ligandMatrixExpanded <- rbind(ligandMatrix[-all_idx, ], expanded)
-
43.
Create a map between ligands (and the day when they are active) and their expression mode in each cell type.
day_lig <- paste(ligandMatrixExpanded$day, ligandMatrixExpanded$ligand, sep="_")
source_cell <- paste(ligandMatrixExpanded$source, ligandMatrixExpanded$cell, sep="_")
ligand.dict <- tapply(source_cell, day_lig, paste, collapse=";")
Note: You have now created a dictionary that maps each ligand at day d to the expression mode and cell type in which it is active, as shown in Table 3. For example, gene “Adm” at day “d0” is produced by cell-specific expansion of endothelial cells (“total_ec”; subset-specific) and by pericytes in a time-modulated manner (“batch_per”; pop-active_d).
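The way the dictionary is assembled can be checked on a few rows. The sketch below rebuilds the “d0_Adm” and “d0_Angpt1” entries of Table 3 from a toy ligand table; the object names toy and toy.dict are used here only to avoid overwriting the real objects.

```r
# Toy ligand table with the same columns as ligandMatrixExpanded
toy <- data.frame(day    = c("d0", "d0", "d0"),
                  ligand = c("Adm", "Adm", "Angpt1"),
                  source = c("total", "batch", "batch"),
                  cell   = c("ec", "per", "fap"),
                  stringsAsFactors = FALSE)

# Key: day_ligand; value: all source_cell labels joined with ";"
day_lig     <- paste(toy$day, toy$ligand, sep = "_")
source_cell <- paste(toy$source, toy$cell, sep = "_")
toy.dict <- tapply(source_cell, day_lig, paste, collapse = ";")
print(toy.dict)

adm_entry <- as.vector(toy.dict["d0_Adm"])
print(adm_entry)
```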
Table 2.
Table with ligand timing, source, and cell type
Day | Ligand | Source | Cell |
---|---|---|---|
d0 | Ctgf | batch | ec |
d0 | Igf1 | batch | ec |
all | Efna1 | constitutive | ec |
all | Jag2 | constitutive | ec |
d0 | Dll4 | total | ec |
d0 | Adm | total | ec |
Table 3.
Example of the map to link ligands (and the day when they are active) and their expression mode in each cell type
Day_ligand | Source_cell |
---|---|
d0_Adm | total_ec;batch_per |
d0_Angpt1 | batch_fap |
d0_Angpt2 | constitutive_ec;constitutive_inf;constitutive_per |
d0_Angpt4 | batch_per |
d0_Bmp1 | batch_mp;batch_inf |
d0_Bmp2 | batch_fap;batch_mp;constitutive_per |
Build the interactome
Timing: 6 min
At this stage, you have created “ligandReceptorBehavioursCell” that displays the activity of both ligands and receptors at each time point and in each cell type, and “ligand.dict” that associates the ligand to its cellular source, the mode of expression, and the day when the gene is active.
-
44.
To infer whether a ligand can interact with its receptor, create the network of interactions between ligands and receptors by loading the sif from Rezza et al. 2016 into a graphNEL object (graph R package), using igraph for the conversion.
library(igraph)
library(graph)
lr.sif <- as.matrix(db[,2:3])
colnames(lr.sif) <- c("src", "dest")
receptorGraph <- igraph::as_graphnel(igraph::graph(
cellCB::edgeList(lr.sif), directed=TRUE))
Note: The “receptorGraph” object contains the graphNEL with the ligand-receptor interactions. To infer the type and time of interactions taking place during muscle regeneration, and the cell types involved, filter graphNEL (i.e., create the induced sub-network) using the lists of active ligands and receptors for each cell population and each time point.
-
45.
Create a directory to store all results as many intermediate tables are required for this step.
saveFiles = TRUE
if (saveFiles) {
if (!file.exists(tableDir))
dir.create(tableDir)}
-
46.
Create cell type-specific and time-wise sif tables from the graphNEL interactome.
Note: In the following, implement an R function to create the whole interactome sif table. The function will be applied to each cell type stored in the object “ligandReceptorBehavioursCell” using “lapply”. The cell type that expresses the receptors is considered as the reference cell. Then, the function creates a cumulative list with all ligands expressed by all cell types (including autocrine interaction and constitutive ligands) and receptors expressed by the reference cell (for each time point). Additionally, it creates the autocrine and paracrine lists of ligands where autocrine refers to ligands and receptors that are produced by reference cell and paracrine means that ligands are produced by other cell types besides the reference cell. Using this information, together with the graphNEL object “receptorGraph”, now you can create the network of intercellular interactions at any given time point, specifying whether the interaction is paracrine or autocrine.
sifTables <- lapply(names(ligandReceptorBehavioursCell), function(cell) {
removeAutocrine=FALSE
useConstitutiveLigands=TRUE
cat(cell, "\n")
condition="wt"
receptorFrom <- cell
ligandsReceptorCumulative <- paraWiseMultiMergeTimePoints(
ligandReceptorBehavioursCell, ligCells = cells,
recCell = receptorFrom,
removeAutocrine=removeAutocrine,
useConstitutiveLigands=useConstitutiveLigands)
autocrine <- wiseMergeTimePoints(ligandReceptorBehavioursCell[[cell]])
paracrine <- paraWiseMultiMergeTimePoints(ligandReceptorBehavioursCell,
ligCells = cells,
recCell = receptorFrom,
removeAutocrine=removeAutocrine,
useConstitutiveLigands=useConstitutiveLigands)
dailySif <- cellCB:::createSifOFLigandReceptor(
ligandsReceptorCumulative,
receptorGraph, cell, condition)
days <- intersect(names(autocrine), names(dailySif))
for (d in days) {
dailySif[[d]]$day <- d
dailySif[[d]]$feedSystem <- "paracrine"
isAutocrine <- dailySif[[d]]$src %in% autocrine[[d]]$ligandOfTheDay
isParacrine <- dailySif[[d]]$src %in% paracrine[[d]]$ligandOfTheDay
dailySif[[d]]$feedSystem[isAutocrine] <- "autocrine"
dailySif[[d]]$feedSystem[isAutocrine & isParacrine] <- "autocrine;paracrine"
dailySif[[d]]$cell <- cell
colnames(dailySif[[d]]) <- c("ligand","receptor","day", "feedSystem", "cell")
dailySif[[d]] <- dailySif[[d]][c("day", "ligand", "receptor", "cell", "feedSystem")]
if (saveFiles) {
write.table(dailySif[[d]], file = paste0(tableDir, "/",
paste(c(cell, condition, d), collapse = "-"), ".txt"),
sep="\t", quote=F, row.names=F)
}}
dailySif})
names(sifTables) <- names(ligandReceptorBehavioursCell)
-
47.
Check the format of the sifTables.
head(sifTables$ec$d0)
## day ligand receptor cell feedSystem
## 1 d0 Ctgf Itga5 ec autocrine;paracrine
## 2 d0 Igf1 Igfbp3 ec autocrine;paracrine
## 3 d0 Igf1 Igfbp7 ec autocrine;paracrine
## 4 d0 Igf1 Igfbp2 ec autocrine;paracrine
## 5 d0 Kitl Kit ec autocrine;paracrine
## 6 d0 Pgf Flt1 ec paracrine
Note: sifTables stores the interactome for each cell at each time point, together with the annotation in the “feedSystem” column that indicates whether the interaction is an autocrine loop, a paracrine loop, or both.
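The meaning of the “feedSystem” labels can be illustrated without cellCB. The sketch below follows the textual definitions (autocrine: the ligand is produced by the reference cell; paracrine: the ligand is produced by another cell type) on toy data for a single time point. All names and values are hypothetical, and the labels produced by the cellCB functions may differ depending on their arguments.

```r
# Toy ligand-receptor pairs and per-cell activity at one time point (hypothetical values)
lr_pairs <- data.frame(ligand   = c("L1", "L2", "L3", "L4"),
                       receptor = c("R1", "R1", "R2", "R3"),
                       stringsAsFactors = FALSE)
active_ligands <- list(ec  = c("L1", "L2"),
                       fap = c("L2", "L3"),
                       mp  = character(0))
active_receptors <- list(ec = c("R1", "R2"))

reference <- "ec"  # cell type expressing the receptors
own_ligands   <- active_ligands[[reference]]
other_ligands <- unique(unlist(active_ligands[names(active_ligands) != reference]))

# Keep edges whose ligand is active in some cell and whose receptor is active in the reference cell
edges <- lr_pairs[lr_pairs$ligand %in% c(own_ligands, other_ligands) &
                  lr_pairs$receptor %in% active_receptors[[reference]], ]

is_auto <- edges$ligand %in% own_ligands
is_para <- edges$ligand %in% other_ligands
edges$feedSystem <- ifelse(is_auto & is_para, "autocrine;paracrine",
                    ifelse(is_auto, "autocrine", "paracrine"))
print(edges)
```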
-
48.
Translate each ligand/day combination in its cell type and mode of expression.
Note: The next step is to include the information about the source of the ligand and the expression mode (batch, total or constitutive). You will need the R object called “ligand.dict” you created earlier at the end of the “ligand receptor analysis” section. This generates tables with the interactome network that can be saved in a format readable with Cytoscape or any other software used by the operator to manipulate networks.
sifcell_day_tables <- lapply(sifTables, function(sifcell) {
tbls <- lapply(sifcell, function(day_table) {
id <- paste(day_table$day, day_table$ligand, sep="_")
source_cell <- ligand.dict[id]
df <- data.frame(ligand=day_table$ligand,
source_cell=source_cell,
day = day_table$day,
receptor=day_table$receptor,
cell=day_table$cell,
feedSystem=day_table$feedSystem)
expand_df <- lapply(seq_len(nrow(df)), function(i) {
base <- df[i, ]
sources <- strsplit(base$source_cell, ";")[[1]]
data.frame(ligand=base$ligand,
source_cell=sources,
day = base$day,
receptor=base$receptor,
cell=base$cell)
})
do.call(rbind, expand_df)
})
collapsed <- do.call(rbind, tbls); row.names(collapsed) <- NULL
collapsed
})
-
49.
As a final step, collapse all the tables into a single table that contains all the interactions among the five cell populations across the time course of muscle regeneration.
# Stack the per-cell-type tables into one table
whole_interactome <- do.call(rbind, sifcell_day_tables)
row.names(whole_interactome) <- NULL
# source_cell has the form "<expression mode>_<cell type>": split it into two columns
source <- sapply(strsplit(whole_interactome$source_cell, "_"), "[[", 1)
cell <- sapply(strsplit(whole_interactome$source_cell, "_"), "[[", 2)
whole_interactome <- data.frame(ligand=whole_interactome$ligand,
                                l.source=source,
                                l.cell=cell,
                                day=whole_interactome$day,
                                receptor=whole_interactome$receptor,
                                r.cell=whole_interactome$cell)
-
50.
Inspect the whole interactome table.
head(whole_interactome)
## | ligand | l.source | l.cell | day | receptor | r.cell |
## 1 | S100b | batch | fap | d0 | Fgfr1 | fap |
## 2 | S100b | batch | mp | d0 | Fgfr1 | fap |
## 3 | S100b | constitutive | per | d0 | Fgfr1 | fap |
## 4 | Igf1 | batch | fap | d0 | Igfbp5 | fap |
## 5 | Igf1 | batch | ec | d0 | Igfbp5 | fap |
## 6 | Igf1 | batch | mp | d0 | Igfbp5 | fap |
Note: The expected output is a table that combines all ligands and receptors for all cell types and all time points.
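Once the table is assembled, it can be summarized and written to disk for network visualization. The following is a minimal sketch under stated assumptions: it uses a mock table built from the six rows shown above (mock_interactome is a hypothetical name), the output file name is arbitrary, and the choice of a tab-delimited file is our assumption of a convenient exchange format. With real data, you would use whole_interactome in place of the mock table.

```r
# Mock rows mimicking head(whole_interactome); values copied from the output above
mock_interactome <- data.frame(
  ligand = c("S100b", "S100b", "S100b", "Igf1", "Igf1", "Igf1"),
  l.source = c("batch", "batch", "constitutive", "batch", "batch", "batch"),
  l.cell = c("fap", "mp", "per", "fap", "ec", "mp"),
  day = "d0",
  receptor = c("Fgfr1", "Fgfr1", "Fgfr1", "Igfbp5", "Igfbp5", "Igfbp5"),
  r.cell = "fap",
  stringsAsFactors = FALSE
)

# Count interactions per day and per (ligand cell -> receptor cell) pair
pair_counts <- aggregate(
  n ~ day + l.cell + r.cell,
  data = transform(mock_interactome, n = 1L),
  FUN = sum
)
print(pair_counts)

# Write a tab-delimited edge table; the file name is an arbitrary example
out_file <- file.path(tempdir(), "whole_interactome.tsv")
write.table(mock_interactome, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

A delimited table of this kind can typically be imported into network software by selecting the ligand cell (l.cell) and receptor cell (r.cell) columns as source and target nodes, with the remaining columns kept as edge attributes.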
Expected outcomes
Here, we detailed a pipeline to build an intercellular interactome that can be applied to any RNA-seq dataset. We believe that bulk sequencing is better suited than scRNA-seq for building interaction networks. While scRNA-seq allows the interrogation of cell population heterogeneity, its limited sequencing depth compared to bulk sequencing is a caveat when mapping an intercellular interactome.
If the reader would like to expand the RNA-seq analysis, the work of Groppa et al. (2023)1 describes other tools to study the pathways associated with DEGs (GO-analysis14,15,16), the degree of proliferation of each cell population (mitotic index17), the landmark time points of cell activation (silhouette18 and homogeneity index19), and the activation of specific receptors over time (timeClip10 analysis and VIPER20).
Limitations
The main limitation of this analysis is the use of gene expression as an estimate of protein expression. This should be taken into consideration when interpreting the intercellular network, and further validation at the protein level should be performed, as done in the work of Groppa and colleagues.1 Furthermore, this analysis does not consider the presence of physical barriers in the tissue and infers interactions purely from transcript expression. The computational prediction can connect two cell types through ligand-receptor signaling; however, tissue compartmentalization by structures such as the basement membrane might prevent this communication. For this reason, the computational modeling of the intercellular interactome should be integrated with functional analysis, for example using transgenic knockout systems, as done in the work of Groppa et al.1
Troubleshooting
Problem 1
You may encounter incompatibilities between software versions and cellCB.
Potential solution
-
•
Use the most recent available version of both the software and the packages to run the analysis described here (related to step 4 in the before you begin section).
-
•
If newer versions of Bioconductor or CRAN packages conflict with the custom cellCB package, open an issue on the GitHub issue tracker to ask for a solution.
Problem 2
You may observe a high percentage of Hoechsthigh/PI+ cells that indicates poor cell viability in your sample.
Potential solutions
-
•
During tissue collection, digestion, and staining, use prechilled solutions and keep the samples on ice, with the exception of steps 2.c and 3.c (related to Steps 2 and 3).
-
•
Ensure that at the beginning of the second digestion step with collagenase D/dispase II, you have loose tissue pieces instead of completely dissociated cells in suspension (related to Step 2 and 3).
Problem 3
You may observe a low yield of purified cells that is not sufficient for RNA sequencing.
Potential solution
-
•
After the tissue digestion, at step 8, pre-wet the 70 μm and 40 μm filter caps with two drops of FACS buffer to facilitate filtration and minimize cell loss.
Problem 4
Using the selected cutoffs, you are not able to retrieve a sufficient number of differentially expressed genes (DEGs).
Potential solution
If the number of DEGs is not sufficient or adequate for downstream analysis, you can safely relax the thresholds (related to step 27). First, relax the log fold change threshold or remove this cutoff. Second, relax the adjusted p-value threshold up to 0.2.
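The effect of relaxing the thresholds can be checked by counting the genes retained under each setting. The sketch below is a generic illustration and not the cellCB implementation: the table, its column names (logFC, FDR), the helper selectDEGs, and the starting cutoffs (|logFC| ≥ 1, adjusted p-value ≤ 0.05) are all assumptions made for the example.

```r
# Mock differential expression table; column names (logFC, FDR) are assumptions
deg_table <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  logFC = c(2.1, 0.6, -1.4, 0.3, -0.8),
  FDR = c(0.001, 0.03, 0.04, 0.15, 0.25),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep genes passing both cutoffs.
# Setting lfc_cut = 0 effectively removes the fold-change filter.
selectDEGs <- function(tbl, lfc_cut = 1, fdr_cut = 0.05) {
  tbl[abs(tbl$logFC) >= lfc_cut & tbl$FDR <= fdr_cut, , drop = FALSE]
}

strict  <- selectDEGs(deg_table, lfc_cut = 1, fdr_cut = 0.05)  # illustrative starting cutoffs
no_lfc  <- selectDEGs(deg_table, lfc_cut = 0, fdr_cut = 0.05)  # first: drop the logFC cutoff
relaxed <- selectDEGs(deg_table, lfc_cut = 0, fdr_cut = 0.2)   # second: adjusted p-value up to 0.2

# Number of DEGs retained under each setting
print(c(strict = nrow(strict), no_lfc = nrow(no_lfc), relaxed = nrow(relaxed)))
```

Relaxing one threshold at a time, in the order given above, makes it easier to see which cutoff is limiting the number of DEGs.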
Problem 5
The number of clusters (k) is not adequate for your dataset.
Potential solution
The choice of the number of clusters of co-regulated genes is empirical and highly dependent on the data and on the number of time points in the dataset. To get an indication of the optimal number of clusters (clusterK) for k-means clustering of your data (related to step 27), you can run your data through the NbClust package,18 which reports the optimal k as the consensus of up to 30 different methods. Alternatively, check the homogeneity index from the clValid package.19 For the Groppa et al. (2023)1 dataset, we used NbClust to determine the optimal number of clusters within the range of 5 to 15 on all the datasets (EC, FAP, MP, PER, IC). The majority of the scores suggested 5 or 6 clusters, while another relevant set of scores indicated 9 clusters or more, with a peak at 14 clusters.
To empirically assess the most suitable number of clusters for our dataset, we performed three cluster analyses with 5, 9, and 14 clusters. When using 5 clusters, we observed well-defined trends, as illustrated in the example from the FAP dataset (Figure 6). However, these clusters contained numerous genes, evident from their thickness, which led to flattened trends. Considering our binarization approach based on centroids, we concluded that 5 clusters were too coarse. With 14 clusters, as shown in Figure 7 (still using the FAP dataset), we observed several clusters that were very similar, such as clusters 11 and 12, or clusters 8 and 13. After careful consideration, we settled on 9 clusters because we did not observe any relevant duplication of trends in FAP (Figure 8) or across the other datasets.
Figure 6.
Cluster analysis with k = 5 clusters
Expression trends over time of the five clusters created by cluster analysis.
Figure 7.
Cluster analysis with k = 14 clusters
Expression trends over time of the 14 clusters created by cluster analysis.
Figure 8.
Cluster analysis with k = 9 clusters
Expression trends over time of the nine clusters created by cluster analysis.
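The idea of scanning a range of candidate k values can be illustrated with base R alone. The sketch below is a simplified stand-in and does not reproduce the NbClust consensus or the clValid homogeneity index mentioned above: it runs k-means for k between 5 and 15 on a mock, randomly generated matrix (genes in rows, time points in columns) and records the total within-cluster sum of squares. All object names and the mock data are assumptions made for the example.

```r
set.seed(42)  # reproducible mock data and k-means starts

# Mock matrix: 200 genes (rows) x 10 time points (columns)
time_points <- c("d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7", "d10", "d14")
n_genes <- 200
expr <- matrix(rnorm(n_genes * length(time_points)), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), time_points))

# Scale each gene across time so that clustering is driven by the trend
expr_scaled <- t(scale(t(expr)))

# Scan candidate numbers of clusters and record the total within-cluster sum of squares
k_range <- 5:15
wss <- sapply(k_range, function(k) {
  kmeans(expr_scaled, centers = k, nstart = 10, iter.max = 50)$tot.withinss
})
names(wss) <- k_range
print(round(wss, 1))
```

Plotting wss against k_range and looking for the point where the curve flattens is a common heuristic. It should be read together with a visual inspection of the cluster trends, as done in Figures 6, 7, and 8.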
Problem 6
You are not able to find the ligand receptor database for your dataset.
Potential solution
If you cannot find a ligand-receptor database for your species of interest, you can translate the Rezza et al.13 database using homologous genes, for example with BioMart21 (related to step 36).
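Once a homolog table is available, the translation itself is a simple lookup. The sketch below assumes that the source database lists mouse gene symbols and that the homolog table has already been retrieved (for instance, exported from BioMart); the tables shown here are small mock examples, and the object names lr_mouse, orthologs, and lr_translated are hypothetical.

```r
# Mock ligand-receptor pairs in mouse gene symbols (illustrative rows only)
lr_mouse <- data.frame(
  ligand = c("Igf1", "Kitl", "Pgf"),
  receptor = c("Igfbp3", "Kit", "Flt1"),
  stringsAsFactors = FALSE
)

# Hypothetical homolog table, e.g., exported from BioMart;
# the target symbols are human gene symbols used for illustration.
# Igfbp3 is deliberately left out to show how missing homologs are handled.
orthologs <- data.frame(
  mouse = c("Igf1", "Kitl", "Kit", "Pgf", "Flt1"),
  target = c("IGF1", "KITLG", "KIT", "PGF", "FLT1"),
  stringsAsFactors = FALSE
)

# Named lookup vector: mouse symbol -> target-species symbol
lookup <- setNames(orthologs$target, orthologs$mouse)

lr_translated <- data.frame(
  ligand = unname(lookup[lr_mouse$ligand]),
  receptor = unname(lookup[lr_mouse$receptor]),
  stringsAsFactors = FALSE
)

# Drop pairs in which either partner has no homolog in the mapping table
lr_translated <- lr_translated[complete.cases(lr_translated), , drop = FALSE]
print(lr_translated)
```

This lookup assumes one-to-one homologs. Genes with one-to-many relationships would need an explicit rule (for example, keeping all combinations), which is not covered by this sketch.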
Resource availability
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by Dr. Paolo Martini (paolo.martini@unibs.it).
Materials availability
This study did not generate new unique reagents.
Acknowledgments
This work was supported by CIHR grant FDN-159908 (to F.M.V.R.). L.W.T. was funded by 4YF from the University of British Columbia and the Dennis Washington Leadership Graduate Scholarship from the Dennis and Phyllis Washington Foundation.
We thank the BRC-seq, UBC Flow Core, AbLab, and BRC genotyping and transgenic units.
Author contributions
All authors were responsible for performing experiments and analysis. E.G., P.M., and F.M.V.R. performed the experimental design and data interpretation and prepared the manuscript. L.W.T., M.R., and S.M. provided support with data analysis and edited the manuscript.
Declaration of interests
The authors declare no competing interests.
Contributor Information
Elena Groppa, Email: egroppa@sissa.it.
Paolo Martini, Email: paolo.martini@unibs.it.
Data and code availability
-
•
RNAseq data have been deposited at GEO and are publicly available. Accession number is listed in the key resources table.
-
•
All original code used has been deposited at Zenodo and is publicly available. Accession number is listed in the key resources table. The version of cellCB R package used for this paper has been deposited at Zenodo: https://doi.org/10.5281/zenodo.8127429.
-
•
Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
References
- 1. Groppa E., Martini P., Derakhshan N., Theret M., Ritso M., Tung L.W., Wang Y.X., Soliman H., Hamer M.S., Stankiewicz L., et al. Spatial compartmentalization of signaling imparts source-specific functions on secreted factors. Cell Rep. 2023;42:112051. doi: 10.1016/j.celrep.2023.112051.
- 2. Lemos D.R., Babaeijandaghi F., Low M., Chang C.-K., Lee S.T., Fiore D., Zhang R.-H., Natarajan A., Nedospasov S.A., Rossi F.M.V. Nilotinib reduces muscle fibrosis in chronic muscle injury by promoting TNF-mediated apoptosis of fibro/adipogenic progenitors. Nat. Med. 2015;21:786–794. doi: 10.1038/nm.3869.
- 3. R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
- 4. Morgan M. (2023). BiocManager: Access the Bioconductor Project Package Repository. R package version 1.30.20, https://CRAN.R-project.org/package=BiocManager.
- 5. Risso D., Ngai J., Speed T.P., Dudoit S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol. 2014;32:896–902. doi: 10.1038/nbt.2931.
- 6. Robinson M.D., McCarthy D.J., Smyth G.K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–140. doi: 10.1093/bioinformatics/btp616.
- 7. Csardi G., Nepusz T. (2006). “The igraph software package for complex network research.” InterJournal, Complex Systems, 1695. https://igraph.org.
- 8. Wu T., Hu E., Xu S., Chen M., Guo P., Dai Z., Feng T., Zhou L., Tang W., Zhan L., Fu X., Liu S., Bo X., Yu G. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2021;2:100141. doi: 10.1016/j.xinn.2021.100141.
- 9. Carlson M. (2019). org.Mm.eg.db: Genome wide annotation for Mouse. R package version 3.8.2.
- 10. Martini P., Sales G., Calura E., Cagnin S., Chiogna M., Romualdi C. timeClip: pathway analysis for time course data without replicates. BMC Bioinf. 2014;15:S3. doi: 10.1186/1471-2105-15-S5-S3.
- 11. Joe A.W.B., Yi L., Natarajan A., Le Grand F., So L., Wang J., Rudnicki M.A., Rossi F.M.V. Muscle injury activates resident fibro/adipogenic progenitors that facilitate myogenesis. Nat. Cell Biol. 2010;12:153–163. doi: 10.1038/ncb2015.
- 12. Crisan M., Yap S., Casteilla L., Chen C.-W., Corselli M., Park T.S., Andriolo G., Sun B., Zheng B., Zhang L., et al. A Perivascular Origin for Mesenchymal Stem Cells in Multiple Human Organs. Cell Stem Cell. 2008;3:301–313. doi: 10.1016/j.stem.2008.07.003.
- 13. Rezza A., Wang Z., Sennett R., Qiao W., Wang D., Heitman N., Mok K.W., Clavel C., Yi R., Zandstra P., et al. Signaling Networks among Stem Cell Precursors, Transit-Amplifying Progenitors, and their Niche in Developing Hair Follicles. Cell Rep. 2016;14:3001–3018. doi: 10.1016/j.celrep.2016.02.078.
- 14. Yu G., Wang L.-G., Han Y., He Q.-Y. clusterProfiler: an R Package for Comparing Biological Themes Among Gene Clusters. OMICS A J. Integr. Biol. 2012;16:284–287. doi: 10.1089/omi.2011.0118.
- 15. Sales G., Calura E., Cavalieri D., Romualdi C. graphite - a Bioconductor package to convert pathway topology to gene network. BMC Bioinf. 2012;13:20. doi: 10.1186/1471-2105-13-20.
- 16. Sales G., Calura E., Romualdi C. metaGraphite–a new layer of pathway annotation to get metabolite networks. Bioinformatics. 2019;35:1258–1260. doi: 10.1093/bioinformatics/bty719.
- 17. Dmitrijeva M., Ossowski S., Serrano L., Schaefer M.H. Tissue-specific DNA methylation loss during ageing and carcinogenesis is linked to chromosome structure, replication timing and cell division rates. Nucleic Acids Res. 2018;46:7022–7039. doi: 10.1093/nar/gky498.
- 18. Charrad M., Ghazzali N., Boiteau V., Niknafs A. NbClust: An R package for determining the relevant number of clusters in a data set. J. Stat. Softw. 2014;61:1–36. doi: 10.18637/jss.v061.i06.
- 19. Brock G., Pihur V., Datta S., Datta S. clValid: An R package for cluster validation. J. Stat. Softw. 2008;25:1–22. doi: 10.18637/jss.v025.i04.
- 20. Alvarez M.J., Shen Y., Giorgi F.M., Lachmann A., Ding B.B., Ye B.H., Califano A. Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nat. Genet. 2016;48:838–847. doi: 10.1038/ng.3593.
- 21. Kinsella R.J., Kähäri A., Haider S., Zamora J., Proctor G., Spudich G., Almeida-King J., Staines D., Derwent P., Kerhornou A., et al. Ensembl BioMarts: A Hub for Data Retrieval across Taxonomic Space. Database. 2011;2011:bar030. doi: 10.1093/database/bar030.