Abstract
The primary regulators of the innate immune response to implanted biomaterials are macrophages, which change phenotype over time to regulate multiple phases of the tissue repair process. Immunomodulatory biomaterials that target macrophage phenotype are a promising approach for promoting tissue repair. Although expression of multiple markers has been widely used to characterize macrophage phenotype, the complexity of the macrophage response to biomaterials makes interpretation difficult. The aim of this study was to put forth an objective method to characterize macrophage phenotype with respect to specific biological processes or standard phenotypes of interest. We investigated the utility of gene set analyses to analyze macrophages as they respond to model biomaterials in comparison to “reference” M1 and M2a macrophage phenotypes. Primary human macrophages were seeded onto crosslinked collagen scaffolds with or without adsorption of the proinflammatory cytokine interferon-gamma (IFNg). Gene expression of a custom-curated panel of 48 genes, representing the M1 and M2a gene signatures as well as other genes important for angiogenesis and tissue repair, was quantified using NanoString on days 3, 5, and 8 of culture. A dataset of phenotype controls, consisting of M0, M1, and M2a macrophages, was used as a source of comparison and to validate the methods of characterization. Gene expression of M1 and M2a markers showed mixed upregulation and downregulation by macrophages seeded on collagen and IFNg-adsorbed collagen scaffolds, highlighting the need for more holistic analyses. Euclidean distance measurements to the reference phenotypes were unable to resolve differences between groups. 
In contrast, rotation gene set testing with and without gene weighting based on the genes' ability to differentiate between M1, M2a, and M0 controls, followed by gene set variation analysis, showed that collagen scaffolds inhibited the classic M1 phenotype without promoting a classic M2a phenotype, and that IFNg-adsorbed collagen scaffolds promoted the M1 phenotype and inhibited the M2a phenotype. In summary, this work demonstrates a powerful, objective methodology for characterizing the macrophage response to biomaterials in comparison to reference macrophage phenotypes. With the addition of more macrophage phenotypes with defined gene expression signatures, this method could prove beneficial for characterizing complex hybrid phenotypes.
Impact statement
Immunomodulatory biomaterials that target macrophage phenotype are a promising approach for promoting tissue repair. However, due to the complexity of macrophage behavior as they interact with biomaterials, there is a need for improved methods for their characterization. We demonstrate the utility of gene set analyses to characterize the macrophage response to biomaterials using the classic M1 and M2a phenotypes as a reference, and suggest that this method may be adopted as a useful method for characterizing macrophage phenotype in a way that is practical, thorough, objective, and tailorable to particular phenotypes of interest. This methodology allows for identification of hybrid phenotypes and incorporation of additional phenotypes as their set of markers is established. Due to the variety of applications, this methodology is likely to be useful for engineers, scientists, and clinicians.
Keywords: macrophage, gene set analysis, phenotype
Introduction
Macrophages are primary regulators of the immune response and play major roles throughout the tissue repair process. In normal wound healing, macrophages are initially proinflammatory (also called M1) and later transition to an alternative state (M2) that allows them to facilitate the resolution of healing. In addition to their significant roles in the healing of critical injuries, macrophages also regulate the inflammatory response to biomaterials. One of the first studies to highlight the importance of macrophage phenotype in the success or failure of biomaterials showed a relationship between markers of the M2 phenotype and “constructive remodeling” of biomaterials implanted in abdominal muscle defects.1,2 At the same time, an initial M1 response is also important for wound healing, angiogenesis, and biomaterial integration, and the M2 phenotype itself can encompass a wide diversity of distinct phenotypes.3 Examples such as this that highlight the importance of macrophage phenotype in tissue repair led to a new field of immunomodulatory biomaterial design (for review, see Spiller and Koh,4 Witherel et al.,5 O'Brien et al.,6 Garash et al.,7 and Julier et al.8).
The unique abilities of macrophages to rapidly change behavior in response to their environment and to exhibit a spectrum of phenotypes, including hybrid phenotypes, pose a challenge when trying to understand their behavior in response to complex signals, which may include biochemical stimuli, mechanical cues, or both, as in the case of contact with immunomodulatory biomaterials. Approaches for characterizing the macrophage response to such complex stimuli have commonly involved analyzing protein and gene expression of a handful of known markers of the M1 and M2 phenotypes, which were identified using in vitro polarization with defined chemical stimuli, namely lipopolysaccharide (LPS) ± interferon-gamma (IFNg) for M1, interleukin-4 (IL4) ± IL13 for M2a, and IL10 for M2c.9–15 However, macrophages do not respond to biomaterials in the same way that they do to these biochemical stimuli, leading to the need to increase the numbers of phenotype markers to capture this increase in complexity. For this reason, gene expression has been widely used to characterize macrophage phenotype. However, as the number of phenotypic markers used for characterization increases, it becomes increasingly difficult to interpret the complex macrophage response to biomaterials. For example, Graney et al.14 investigated the behavior of macrophages cultured on ceramic-based scaffolds and found hybrid activation states that were not distinctly M1, M2a, or M2c for all scaffold groups since some markers were upregulated, while others were downregulated. There is a need for methods that facilitate the interpretation of mixed expression of increasing/decreasing markers of multiple macrophage phenotypes.
A powerful group of methods to address these challenges is gene set analysis (GSA). Instead of analyzing each gene individually, genes are organized into sets that represent particular pathways, processes, or conditions that are important to the study at hand, and then, gene set-level statistics are used to determine whether the gene sets are significantly different between sample groups. Many GSA methods consider both the direction and magnitude of each gene and gene weights can be incorporated into select methods, which is useful if certain genes are considered more important than others for characterizing a particular pathway. Gene set testing is useful for organizing and interpreting differential expression analysis, increasing power to detect significant differences between sample groups compared to analysis of individual genes, and considering important genes with modest, but coordinated changes in expression, as opposed to only those that are significantly differentially expressed.
For example, Xue et al.16 employed weighted gene coexpression network analysis (WGCNA)17 to investigate the phenotypes of macrophages isolated from the lungs of smokers in comparison with 28 unique transcriptional programs of macrophages activated in vitro. Although this study showed the value of using gene-set enrichment analysis to compare macrophages isolated from the in vivo environment to “reference” phenotypes that had been prepared in vitro, WGCNA is designed for large, high-dimensional data sets, which are not feasible or practical for many applications in immunomodulatory biomaterial design. While whole transcriptome analysis may be a useful approach for characterizing the macrophage response to biomaterials in an unbiased way, it may not be necessary if the goal is to determine the similarities and differences of the biomaterial-educated macrophages to phenotypes that are known to be important in tissue repair, such as M1, M2a, M2c, and M2f. To answer this question, it is necessary to analyze known markers of these phenotypes. Therefore, there is a need for a methodology that can characterize macrophages in comparison to reference phenotypes and that is applicable to relatively small, but complex datasets.
In this study, the goal was to develop a standard methodology based on GSA methods to assess the behavior of macrophages in contact with model immunomodulatory biomaterials by their comparison to reference M1 and M2a phenotypes prepared using defined chemical stimuli in vitro, which are useful, if simplified, models of macrophages in vivo. Recognizing that macrophages can exist as more than just M1 and M2a phenotypes, this method could be expanded for multiple different reference phenotypes of macrophages (e.g., M2b, M2c, and M2f) if their gene expression signatures are known. Our strategy was divided into four steps: (1) select signature gene sets to represent the M1 and M2a macrophage phenotypes; (2) select GSA methods applicable to small, but complex experimental designs; (3) validate the GSA methods for their ability to quantify significant enrichment/upregulation of M1 and M2a genes in control macrophages; and (4) apply these methods to characterize the phenotypes of macrophages cultured on complex biomaterials. We chose to focus on a relatively small number of genes and gene sets that are representative of reference M1 and M2a phenotypes, which are also relevant to angiogenesis and tissue repair processes.
Materials and Methods
Experimental design
Phenotype controls prepared with defined chemical stimuli, namely, M0, M1, and M2a, were utilized for validating our methods of characterization (Fig. 1A). The model immunomodulatory biomaterials for this study consisted of crosslinked collagen scaffolds with and without adsorption of the proinflammatory cytokine (and M1 stimulus) IFNg (Fig. 1B). Crosslinked collagen was selected because it is a commonly used tissue engineering scaffold with potentially complex effects on macrophage behavior. While it has been previously shown to promote M2 macrophage polarization in vivo,3 the addition of IFNg and crosslinking was expected to increase markers of M1 activation. To understand the response of macrophages to the scaffolds in comparison to the reference M1 and M2a macrophage phenotypes, gene expression of a custom-selected panel of macrophage phenotype-related genes was quantified using NanoString nCounter Technology (Fig. 1C). First, Euclidean distance measurements were used to compare the gene expression signatures of macrophages seeded on collagen scaffolds to the reference phenotype controls (Fig. 1D). Then, GSA was performed using rotation gene set testing (ROAST)18 and gene set variation analysis (GSVA)19 (Fig. 1E). ROAST uses parametric gene and set-level statistics followed by simulation (i.e., rotations of the residuals) to calculate gene set p-values and it allows for incorporation of gene weights, a significant advantage over other methods. In contrast, GSVA uses a nonparametric approach to provide enrichment scores for each gene set and individual sample that can then be analyzed using standard statistical tests. Summaries of the differences and advantages of a variety of GSA methods can be found in Nam and Kim,20 Khatri et al.,21 Maciejewski,22 and Goeman and Buhlmann.23
FIG. 1.
Overview of the experimental design. (A) First, reference macrophage phenotype controls were created by differentiating peripheral blood monocytes into unactivated macrophages (M0) and polarizing M0 macrophages into the M1 and M2a phenotypes. (B) M0 macrophages were seeded onto (1) ultra-low attachment polystyrene plates to serve as a phenotype control (“M0” group), (2) EDC/NHS crosslinked collagen scaffolds (“Collagen” group), and (3) EDC/NHS crosslinked collagen scaffolds adsorbed with the proinflammatory cytokine IFNg (“IFNg-collagen” group). Cell-seeded scaffolds were collected on days 3, 5, and 8 postseeding for RNA extraction, while reference phenotype control samples were collected on day 7. (C) Gene expression was quantified using the NanoString nCounter System and analyzed using (D) Euclidean distance measurements or (E) gene set analysis methods ROAST and GSVA with custom gene sets representing the reference M1 and M2a macrophage phenotypes. EDC, 1-ethyl-3(3-dimethylaminopropyl) carbodiimide; GSVA, gene set variation analysis; IFNg, interferon-gamma; IL, interleukin; LPS, lipopolysaccharide; MCSF, macrophage colony-stimulating factor; NHS, N-hydroxysuccinimide; ROAST, rotation gene set testing. Color images are available online.
Crosslinking Ultrafoam collagen sponge scaffolds
Collagen sponge (Avitene™ Ultrafoam™) scaffolds were biopsied into discs (5 mm in diameter and 2.5 mm thick). The scaffolds were crosslinked by incubation in a solution of 1.15 mg/mL 1-ethyl-3(3-dimethylaminopropyl) carbodiimide (EDC) and 0.261 mg/mL N-hydroxysuccinimide (NHS) in phosphate-buffered saline (PBS) for 2 h at room temperature. The scaffolds were washed five times in PBS for 5 min each to remove any unreacted EDC/NHS. Then, 375 ng IFNg in PBS was added to the IFNg-collagen group for 1 h at room temperature.
Release studies
IFNg-adsorbed and control collagen scaffolds were immersed in 1 mL of PBS. The release media were removed and completely replaced at 2 h, 4 h, 1 day, 3 days, 6 days, 8 days, and 14 days. Samples were frozen at −80°C until analysis by enzyme-linked immunosorbent assay for IFNg (Peprotech).
Seeding macrophages onto scaffolds
Monocytes isolated by negative selection from human blood of four healthy volunteers were purchased from the University of Pennsylvania immunology core (Philadelphia, PA). The monocytes were then cultured at 1 × 10^6 cells/mL in complete media (Roswell Park Memorial Institute 1640 Medium + 10% heat-inactivated human serum + 1% penicillin-streptomycin) on ultra-low attachment polystyrene plates (Corning) and differentiated into macrophages with the addition of 20 ng/mL of macrophage colony-stimulating factor over 5 days, with a media change on day 3. On day 5, cells were scraped and seeded onto the collagen or IFNg-collagen scaffolds at ∼67,000 cells per scaffold. Cells used as the M0 control were reseeded onto an ultra-low attachment plate. Cells were seeded on the scaffold in 10 μL of complete media for 30 min at 37°C to allow time for attachment, followed by the addition of complete media. Cell-seeded scaffolds and cell-only controls were extracted on days 3, 5, and 8 of culture in vitro; scaffolds were transferred to 1 mL of TRIzol (Invitrogen) and cells were transferred to lysis buffer (RNAqueous kit; Invitrogen), and stored at −80°C to preserve nucleic acid until RNA extraction.
“Reference” macrophage phenotype controls were generated on day 5 of macrophage culture with the addition of phenotype-specific stimuli as follows: IFNg (Peprotech)/LPS (Sigma Aldrich) (100 ng/mL each) for the M1 phenotype and IL4/IL13 (Peprotech) for the M2a phenotype (40 and 20 ng/mL, respectively). On day 7, cells were scraped and centrifuged at 400g for 7 min. Cell pellets were then stored in lysis buffer (RNAqueous kit) at −80°C after removing the media.
Samples for gene expression analysis were stored at −80°C until processing for both the cell-seeded scaffold and reference phenotype control studies. The cell-seeded scaffold study consisted of one donor with n = 3 experimental replicates per treatment group per time point and the phenotype control study included n = 3 different donors with n = 2 experimental replicates per donor.
RNA extraction and gene expression quantification using NanoString
RNA was extracted from cell-seeded scaffolds using chloroform extraction from TRIzol followed by the RNeasy kit (Qiagen) as previously described,5 and from reference controls following the manufacturer's directions for the RNAqueous kit. For gene expression analysis, 100 ng of RNA was utilized per sample using a custom-designed NanoString™ CodeSet (NanoString Technologies, Seattle, WA). The concentration of extracted RNA was quantified using Nanodrop 1000 (Thermo Scientific, Wilmington, DE); all 260/280 ratios were close to 2, so the RNA was considered pure and used for analysis. Endogenous genes for the NanoString CodeSet included a panel of 12 M1 markers and 17 M2a markers, selected from a previous RNA-seq analysis15 based on upregulation compared to M0 as well as relevance to angiogenesis or tissue repair processes (Supplementary Table S1). Nineteen additional genes were included because of their importance for angiogenesis or tissue repair. In addition to these 48 genes, the CodeSet included 6 housekeeping genes, 8 External RNA Control Consortium (ERCC) negative controls, and 6 ERCC-positive controls. Samples from each group were organized on NanoString cartridges to minimize batch effects between experimental groups by including all samples from all groups at a particular time point on one cartridge. The reference phenotype control samples were analyzed on their own cartridge. Raw counts from the cell-seeded scaffold samples and reference phenotype control samples were extracted separately from nSolver™ analysis Software 3.0 followed by separate quality control, normalization, and filtering, as described in the following paragraph.
Data preprocessing and normalization
CodeSet Content (housekeeping) genes were removed before analysis and all positive and negative controls were verified to be in the required ranges according to the manufacturer's data analysis guidelines. Gene counts quantified by NanoString were first normalized to account for technical variability (i.e., lane-to-lane variation due to the NanoString cartridge) using the ERCC-positive control counts, as recommended by the manufacturer. First, the geometric mean (geomean) of all the ERCC-positive control counts was calculated for each sample. Then, the average of the geomeans across all samples was calculated and divided by the geomean of each individual sample's ERCC-positive controls to yield an ERCC-positive sample-specific scaling factor. All gene counts were multiplied by their sample-specific scaling factor to make the samples comparable. Next, nonspecific counts were subtracted to obtain a new estimate of counts above background. Background thresholds, determined for each sample by calculating the maximum value of all of the ERCC-negative control counts, were subtracted from each sample. Negative values were regarded as zero. One gene from the phenotype control dataset (BMP2) and two genes from the cell-seeded scaffold dataset (CACNB4 and CDH1) were removed since they were not expressed above the background in any sample. Before performing differential expression analysis and GSA, both datasets were log2-transformed after adding an arbitrary constant of 1 to each count (to allow transformation of counts with a value of zero). Before performing gene expression analysis and GSA, replicates were averaged per donor for the phenotype control dataset. All preprocessing (i.e., normalization, background correction, and filtering) was performed using R (RStudio, Inc., Boston, MA).
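The normalization steps above can be illustrated with a minimal Python sketch (the study itself used R). The data layout, the function names, and the demo counts are hypothetical. The sketch also assumes that the background threshold is taken from the negative controls after scaling, which the text does not specify, and it omits the gene filtering step.

```python
import math


def geomean(values):
    """Geometric mean of strictly positive counts."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalize_counts(samples):
    """Positive-control scaling, background subtraction, and log2(x + 1).

    `samples` maps a sample name to a dict with three hypothetical keys:
    "pos" (positive control counts), "neg" (negative control counts),
    and "genes" (gene name -> raw count).
    """
    # Per-sample geometric mean of the positive controls.
    geomeans = {name: geomean(s["pos"]) for name, s in samples.items()}
    # Arithmetic mean of those geomeans across all samples.
    mean_geomean = sum(geomeans.values()) / len(geomeans)

    normalized = {}
    for name, s in samples.items():
        scale = mean_geomean / geomeans[name]
        # Assumption: the threshold is the maximum negative control count
        # after the same scaling factor has been applied.
        background = max(s["neg"]) * scale
        normalized[name] = {}
        for gene, raw in s["genes"].items():
            # Negative values after subtraction are regarded as zero.
            above_background = max(raw * scale - background, 0.0)
            normalized[name][gene] = math.log2(above_background + 1)
    return normalized


# Made-up counts for two samples, for illustration only.
demo = {
    "sample_A": {"pos": [100, 400], "neg": [2, 4], "genes": {"TNF": 14, "CCL1": 2}},
    "sample_B": {"pos": [400, 1600], "neg": [1, 8], "genes": {"TNF": 40, "CCL1": 80}},
}
result = normalize_counts(demo)
print(result["sample_A"])
```

In this toy input, the count for CCL1 in sample_A falls below the background threshold and is therefore set to zero before the log2 transformation.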
Gene expression analysis
Principal component analysis (PCA) was performed to assess the overall effects of the experimental covariates (i.e., donor, treatment, and time) on the datasets. Before using the R PCA function, the log2-transformed gene counts were z-score normalized. Scree plots reporting the variance captured by the first five principal components and PCA plots of the first two principal components were generated using the factoextra R package. Heatmaps with dendrograms showing hierarchical clustering of genes and samples based on the Euclidean distance metric and unweighted pair-group method with arithmetic mean clustering were generated for each dataset using the heatmap.2 R function.
To explore individual gene expression differences, fold change values for each M1 and M2a gene were calculated as the averaged M1 and M2a samples compared to the averaged M0 control samples (for the reference phenotype control dataset) or as averaged collagen and IFNg-collagen samples compared to the averaged M0 control at each time point (for the cell-seeded scaffold dataset). Grouped column graphs were generated using GraphPad Prism 7 as log2(Value/M0 Control) ± standard deviation (SD). Genes were ordered by magnitude of difference in fold change compared to M0 followed by comparison to the other experimental group. As a follow-up, differential expression analysis was employed to compare the expression of M1 and M2a genes between all sample groups within each dataset.
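As a small worked example of the fold change calculation, the following Python sketch computes log2(Value/M0 Control) for one gene and compares it with the 1.5-fold threshold used in the figures. It assumes that the fold change is the ratio of group means of normalized counts; the function name and the counts are hypothetical.

```python
import math
from statistics import mean

# log2 of the 1.5-fold threshold (approximately 0.58).
LOG2_THRESHOLD = math.log2(1.5)


def log2_fold_change(group_counts, m0_counts):
    """log2(mean of group / mean of M0 control) for a single gene."""
    control = mean(m0_counts)
    treated = mean(group_counts)
    if control <= 0 or treated <= 0:
        raise ValueError("group means must be positive to take a log ratio")
    return math.log2(treated / control)


# Made-up normalized counts for one gene in a treated group and in M0.
lfc = log2_fold_change([300, 340, 320], [100, 110, 90])
print(round(lfc, 3), lfc > LOG2_THRESHOLD)
```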
Euclidean distance measurements
A Euclidean distance M1–M2a plot was generated using R as the average Euclidean distance between expression levels of cell-seeded scaffold samples to M1 macrophages versus the average Euclidean distance to M2a macrophages. The reference phenotype control and cell-seeded scaffold datasets were merged before normalization and background correction. No genes were removed through preprocessing. All technical replicates were averaged per donor for the phenotype control samples. In addition, all counts were log2-transformed and each gene profile was z-scored.
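The coordinates of such a plot can be sketched in Python as follows. This is an illustration under stated assumptions rather than the R code used in the study: the function names are hypothetical, each gene is z-scored with the sample standard deviation, and every gene is assumed to vary across samples.

```python
import math
from statistics import mean, stdev


def zscore_genes(matrix):
    """Z-score each gene across samples.

    `matrix` maps a sample name to a list of log2 expression values,
    with genes in the same order for every sample.
    """
    names = list(matrix)
    n_genes = len(matrix[names[0]])
    columns = [[matrix[s][g] for s in names] for g in range(n_genes)]
    # Assumption: sample SD (n - 1); a gene with zero variance would fail here.
    stats = [(mean(col), stdev(col)) for col in columns]
    return {
        s: [(matrix[s][g] - m) / sd for g, (m, sd) in enumerate(stats)]
        for s in names
    }


def m1_m2a_coordinates(sample, m1_refs, m2a_refs):
    """Average Euclidean distance of one sample to the M1 and M2a references."""
    d_m1 = mean(math.dist(sample, ref) for ref in m1_refs)
    d_m2a = mean(math.dist(sample, ref) for ref in m2a_refs)
    return d_m1, d_m2a


# Toy two-gene profiles (already scaled) for illustration only.
x, y = m1_m2a_coordinates([0.0, 0.0], [[3.0, 4.0]], [[0.0, 1.0], [0.0, 3.0]])
print(x, y)
```

Each scaffold sample becomes one point (distance to M1, distance to M2a); a shorter distance indicates greater similarity to that reference phenotype.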
Gene set analysis
ROAST was performed using the log2-transformed datasets and custom gene sets through use of the mroast function from the R/Bioconductor limma package.18 To test whether the gene sets were either upregulated or downregulated for each sample contrast, we set the gene set statistic to be calculated as the mean of the gene-wise statistics so that the set would be statistically significant only when the majority (i.e., more than 50%) of the genes were differentially expressed. We performed two implementations of ROAST for each comparison: (1) no gene weighting and (2) gene weighting using fold changes of M1 and M2a genes compared to M0. We set the number of rotations (nrot) to 10,000, which means that the lowest possible p-value is approximately 0.00009999 (calculated as 1/[nrot + 1]). Because rotation p-values vary slightly from run to run, the median p-value was selected from 15 runs of each comparison. For multiple testing of gene sets, we set the method of p-value adjustment to the Benjamini–Hochberg procedure. All additional arguments, including “midp,” “var.prior,” and “df.prior,” were left as the default inputs. To summarize our results, we generated p-value heatmap matrices using GraphPad Prism 8 to indicate which gene sets were significantly differentially expressed between sample groups and the direction of significance.
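The p-value bookkeeping described above (the floor of 1/[nrot + 1], the median over repeated runs, and the Benjamini–Hochberg adjustment) can be illustrated with a short Python sketch. It does not reproduce the rotation test itself or limma's implementation; the function names and the p-values are hypothetical.

```python
from statistics import median


def min_rotation_p(nrot):
    """Smallest p-value obtainable from nrot rotations."""
    return 1 / (nrot + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(min_rotation_p(10_000))

# Made-up p-values for one comparison from five repeated runs.
runs = [0.012, 0.011, 0.013, 0.012, 0.010]
print(median(runs))

# Made-up p-values for three gene sets.
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

Taking the median across runs gives a single stable value per comparison despite the run-to-run variation inherent in a simulation-based test.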
GSVA was performed using the log2-transformed datasets and custom gene sets through use of the R/Bioconductor GSVA package.19 Since each dataset consisted of continuous counts, we set the kcdf parameter to “Gaussian,” so that a Gaussian kernel would be calculated during the nonparametric estimation of the cumulative distribution function for each gene expression profile across samples. To facilitate follow-up gene set significance testing to compare GSVA scores between sample groups, we set the max.diff parameter to true, so that the enrichment statistic would be calculated as the difference between the largest positive and negative Kolmogorov–Smirnov random walk deviations.
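To make the idea behind the maximum-difference statistic concrete, the following Python sketch computes a simplified, unweighted Kolmogorov–Smirnov-style random walk over a ranked gene list and returns the largest positive deviation minus the magnitude of the largest negative deviation. It is not the GSVA implementation, which first derives gene ranks from kernel-estimated cumulative distribution functions and weights the steps of the walk; the function name and the gene ranking are hypothetical.

```python
def max_diff_score(ranked_genes, gene_set):
    """Simplified enrichment score for one sample and one gene set.

    `ranked_genes` lists genes from highest to lowest relative expression.
    The walk steps up at each gene in the set and down at every other gene.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running = 0.0
    max_pos = 0.0  # largest positive deviation from zero
    max_neg = 0.0  # largest negative deviation from zero (stored as <= 0)
    for hit in hits:
        running += 1 / n_hit if hit else -1 / n_miss
        max_pos = max(max_pos, running)
        max_neg = min(max_neg, running)
    # Difference between the magnitudes of the two extreme deviations.
    return max_pos + max_neg


ranking = ["TNF", "CCL1", "CCL17", "CCL22"]  # made-up ranking
print(max_diff_score(ranking, {"TNF", "CCL1"}))
print(max_diff_score(ranking, {"CCL17", "CCL22"}))
```

A set concentrated at the top of the ranking scores positive, a set concentrated at the bottom scores negative, and a set spread over both ends scores near zero, which is why the resulting per-sample scores lend themselves to standard statistical tests between groups.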
Statistical analysis
For differential expression analysis of M1 and M2a genes, the Shapiro–Wilk normality test and skew values were used to decide whether to perform a parametric or nonparametric test: if all three groups (M0, M1, and M2a) passed the normality test (alpha = 0.05), an ordinary one-way analysis of variance (ANOVA) was used; otherwise, the Kruskal–Wallis test was used. The Brown–Forsythe test (alpha = 0.05) was used to confirm that the SDs were not significantly different. If the treatment means or medians were significantly different (p < 0.05), a follow-up test was executed, either Tukey's multiple comparisons (parametric) or Dunn's multiple comparisons (nonparametric). All analyses were carried out using GraphPad Prism.
Before GSA with ROAST and GSVA, the log2 distributions of each expression dataset were examined using a density plot and histogram plot with a normal curve to verify that the data were not normally distributed.
ROAST-related statistical analysis was internally performed by the ROAST algorithm using the following parameters (described in more detail in the Gene set analysis section): set.statistic = “mean,” nrot = 10,000, and adjust.method = “BH”; all additional arguments, including “midp,” “var.prior,” and “df.prior,” were left as the default. To account for the slight variation between rotational p-values from run to run, a median p-value was externally calculated from 15 runs of a comparison using the two-sided directional false discovery rate reported by ROAST. All analyses were executed in R.
Before gene set significance testing of GSVA scores, the Shapiro–Wilk normality test and skew values were used to decide whether to use a parametric or nonparametric test. The Brown–Forsythe test (alpha = 0.05) was used to confirm that the SDs were not significantly different. Scores were compared across each gene set using either an ordinary one-way ANOVA or Kruskal–Wallis test, as appropriate, followed by Tukey's multiple comparisons (if the treatment means were significantly different, p < 0.05) or Dunn's multiple comparisons (if the treatment medians were significantly different, p < 0.05). Analysis was conducted with GraphPad Prism.
Results
Interferon-gamma release
As expected, most of the adsorbed IFNg was released within the first time point of the release study (2 h), although detectable levels (50–100 pg/mL) were released from 3 of 4 replicates at the 8-day time point and from one of the replicates at the 14-day time point (Supplementary Fig. S1).
Gene expression analysis
PCA was used to determine the leading contributors to the variability within the data. For the reference phenotype controls, 69.4% of the total variability within the data was captured by the first two principal components, and most of the variability could be attributed to phenotype effects, while a smaller amount was due to donor variability (Supplementary Fig. S2A, B). For the cell-seeded scaffolds, the first two components explained 56% of the total variability within the data and revealed that treatment was a leading effect on the variability with a small, if any, effect of time (Supplementary Fig. S2C, D).
A heatmap and dendrogram of the reference phenotype controls showed clustering by phenotype and confirmed that M1 genes tend to be more expressed in M1 samples than in M0 and M2a, whereas M2a genes tend to be more expressed in M2a samples than in M1 and M0 (Fig. 2A). While the heatmap and dendrogram of the cell-seeded scaffolds revealed clustering of most M0 samples, there were no clear trends of differential expression of M1 and M2a genes between the collagen and the IFNg-collagen groups (Fig. 2B). As expected, some genes were upregulated, while others were downregulated in both groups.
FIG. 2.
Heatmap of (A) phenotype control samples (n = 3 donors, each with n = 2 experimental replicates) and (B) cell-seeded scaffold macrophage samples (n = 1 donor, n = 3 experimental replicates per treatment per time point) with hierarchical clustering of genes and samples using the Euclidean distance metric and unweighted pair-group method with arithmetic mean clustering. Row-wise z-scores were used to normalize each gene across the sample space. The left dendrogram indicates clustering of genes and the top dendrogram indicates clustering of samples. Dendrograms are color coded according to gene set and sample group. Genes present within the dataset that are not part of either the M1 or M2a gene sets are denoted as “Other genes.” Color images are available online.
When examining each gene individually, we found that most genes included in the M1 set had fold changes >1.5 in the M1 samples compared to the M0 control, although the magnitude of expression of these genes widely varied, as did the extent of differences compared to the M2a control. CLEC4E and TNF were leading M1 markers, with the greatest fold change versus M0 between M1 and M2a macrophages in opposing directions (p < 0.05; Fig. 3A and Supplementary Table S2). Overall, only 5 M1 genes (CLEC4E, TNF, CCL1, CFB, and IRF1) and 11 M2a genes were significantly differentially expressed between the M1 and M2a groups, probably because of the low statistical power in this study. Nonetheless, because these genes have been previously shown as markers of the M1 and M2a phenotypes15 and because gene set analyses generally do not require genes to be differentially expressed, we decided to keep all genes within their respective gene sets for downstream analysis. However, because this analysis yielded the identification of particular genes with greater value for identifying each phenotype, we designed a method of GSA that incorporated these effects, which was explored with gene weighting with the ROAST algorithm.
FIG. 3.
Gene expression of M1 and M2a macrophages relative to M0 control macrophages. Fold changes of (A) M1 genes and (B) M2a genes calculated as log2(Value/M0 control). Bars represent the mean ± SD (n = 3 donors or biological replicates). A dotted line indicates a fold change of 1.5 (∼0.58 in log2). Genes are ordered from left to right by fold change compared to M0 for genes in which the M1 and M2a groups are regulated in opposing directions, followed by greatest fold change between M1 and M2a. Statistical analysis was performed on log-transformed data using a one-way ANOVA test followed by Tukey's multiple comparisons or the Kruskal–Wallis test followed by Dunn's multiple comparisons (*p < 0.05, **p < 0.01, ***p < 0.001, and ****p < 0.0001). # indicates significance relative to the M0 control. ANOVA, analysis of variance; SD, standard deviation. Color images are available online.
In response to collagen scaffolds, macrophages downregulated some M1 genes and upregulated others relative to M0 on day 3 (Fig. 4A–C; Supplementary Table S3). Almost all M1 genes were downregulated in the collagen group on days 5 and 8, although not all results were significant. In comparison, IFNg-collagen scaffolds promoted a mix of upregulated and downregulated M1 genes relative to M0 across all time points. Examination of the M2a genes revealed that the collagen control scaffolds promoted a mix of upregulated and downregulated M2a genes relative to M0 on day 3, with the number of downregulated genes outnumbering the upregulated genes on days 5 and 8 (Fig. 4D–F). Collectively, as expected, these results suggest that macrophages cultured with either model biomaterial were not distinctly M1 or M2a, since there was a mix of upregulated and downregulated expression of markers of each phenotype.
FIG. 4.
Gene expression of primary human macrophages in response to control (collagen) and IFNg-adsorbed (IFNg-collagen) scaffolds over 3, 5, and 8 days. Fold changes of (A–C) M1 genes and (D–F) M2a genes calculated as log2(Value/M0 control). Bars represent the mean ± SD (n = 3 experimental replicates). A dotted line indicates a fold change of 1.5 (∼0.58 in log2). Genes are ordered from left to right by fold change compared to M0 for genes in which the collagen and IFNg-collagen groups are regulated in opposing directions, followed by greatest fold change between collagen and IFNg-collagen. Genes without bars indicate counts that were not expressed above the background for either sample group. Statistical analysis was performed on log-transformed data using a one-way ANOVA test followed by Tukey's multiple comparisons or the Kruskal–Wallis test followed by Dunn's multiple comparisons (*p < 0.05, **p < 0.01, ***p < 0.001, and ****p < 0.0001). # indicates significance relative to the M0 control. Color images are available online.
Euclidean distance measurements
Next, a Euclidean distance M1–M2a plot was generated to directly compare the gene expression signatures of macrophages cultured on cell-seeded scaffolds to the reference phenotype controls (Fig. 5). The M1–M2a plot of all 48 genes showed modest, if any, separation between treatment groups, with macrophages seeded on IFNg-collagen slightly more similar (shorter distance) to M1 macrophages and slightly less similar (larger distance) to M2a macrophages compared to macrophages seeded on collagen.
FIG. 5.
A Euclidean distance M1–M2a plot, showing the averaged Euclidean distance of cell-seeded scaffold groups to the M1 group versus the averaged Euclidean distance to the M2a group, was used to compare macrophages cultured on cell-seeded scaffolds to the reference phenotype controls. + indicates the centroid for a group of samples across all time points. Color images are available online.
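As a rough illustration of how the coordinates of such an M1–M2a plot could be obtained, the sketch below computes the average Euclidean distance from one scaffold sample to each reference group. The function name and the two-gene expression values are hypothetical, and the study's exact preprocessing (for example, any scaling of the 48-gene panel) is not reproduced here.

```python
import numpy as np

def mean_distance_to_group(sample, reference_group):
    """Average Euclidean distance from one sample to every reference sample."""
    diffs = reference_group - sample  # shape: (n_reference_samples, n_genes)
    return float(np.linalg.norm(diffs, axis=1).mean())

# Hypothetical log2 expression values: rows are samples, columns are genes.
m1_refs = np.array([[3.0, 0.0], [3.0, 0.0]])
m2a_refs = np.array([[0.0, 4.0], [0.0, 4.0]])
scaffold_sample = np.array([0.0, 0.0])

x = mean_distance_to_group(scaffold_sample, m1_refs)   # distance to M1
y = mean_distance_to_group(scaffold_sample, m2a_refs)  # distance to M2a
print(x, y)
```

A shorter distance on one axis indicates greater similarity to that reference phenotype; each sample contributes one (x, y) point to the plot.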
Gene set analysis
We next utilized ROAST to investigate differential expression of sets of M1 and M2a genes between the different sample groups. First, we applied ROAST and the custom gene sets to the reference phenotype control samples to validate the method and the choice of genes. ROAST confirmed that M1 samples significantly upregulated the M1 gene set and M2a samples upregulated the M2a gene set compared to M0 (p < 0.01, Fig. 6A and Supplementary Table S4). Evaluation of the cell-seeded scaffold samples with ROAST revealed that collagen scaffolds significantly downregulated the M1 gene set on day 5 compared to M0 controls (Fig. 6B and Supplementary Table S4). In contrast, IFNg-collagen scaffolds significantly upregulated the M1 gene set on days 5 and 8 compared to the collagen control and significantly downregulated the M2a gene set on day 5 compared to M0 controls. Interestingly, the M0, collagen, and IFNg-collagen scaffold macrophages showed no significant differences in the enrichment of the M2a gene set over time.
FIG. 6.
ROAST without gene weights of (A) reference phenotype controls and (B) cell-seeded scaffold macrophages. Samples were compared in the order of row variable to column variable. An X placed through a cell indicates a comparison that was not made. Cells are color coded according to the direction of significance, either upregulation (red) or downregulation (blue) of a gene set. n = 3 biological replicates for the phenotype control macrophages and n = 3 experimental replicates for the cell-seeded scaffold macrophages. Color images are available online.
To account for the fact that some genes appeared to be better markers of the M1 and M2a phenotypes than others, we completed an additional implementation of ROAST with gene weights based on the fold changes of M1 and M2a genes in the M1 and M2a reference phenotype samples, respectively, relative to the M0 controls. Using this method, we found an increase in the significance levels for the upregulation of the M2a gene set in the M2a sample and for the downregulation of the M1 gene set in collagen scaffolds compared to M0 on day 5 (Fig. 7A, B). In addition, this method indicated that macrophages cultured with IFNg-collagen scaffolds upregulated the M1 gene set compared to both collagen and M0 samples at each time point and that the expression of the M1 gene set in collagen scaffold macrophages decreased from day 3 to day 5. Furthermore, the p-value for the downregulation of the M2a gene set in IFNg-collagen samples compared to M0 on day 5 decreased with gene weighting. Overall, this method demonstrated that gene weighting can be used to emphasize differences in the regulation of gene sets between complex biomaterials.
FIG. 7.
ROAST of (A) reference phenotype controls and (B) cell-seeded scaffold macrophages with gene weights incorporated into the gene set testing procedure using fold changes of M1/M0 (for genes in the M1 set) and M2a/M0 (for genes in the M2a set). Samples were compared in the order of row variable to column variable. An X placed through a cell indicates a comparison that was not made. Cells are color coded according to the direction of significance, either upregulation (red) or downregulation (blue) of a gene set. n = 3 biological replicates for the phenotype control macrophages and n = 3 experimental replicates for the cell-seeded scaffold macrophages. Color images are available online.
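The effect of gene weighting on a set-level statistic can be illustrated with a simplified sketch. This is not an implementation of ROAST: it omits the rotation step and the moderated t-statistics, and only shows how weights derived from reference fold changes shift a weighted mean of per-gene statistics. All values and the function name are hypothetical, and the weights are assumed to be non-negative.

```python
import numpy as np

def weighted_set_statistic(gene_stats, weights=None):
    """Weighted mean of per-gene statistics; equal weights if none are given."""
    gene_stats = np.asarray(gene_stats, dtype=float)
    if weights is None:
        weights = np.ones_like(gene_stats)
    weights = np.asarray(weights, dtype=float)  # assumed non-negative
    return float(np.sum(weights * gene_stats) / np.sum(weights))

# Hypothetical per-gene statistics for a scaffold-vs-M0 comparison (M1 gene set).
gene_stats = [-4.0, -2.0, 0.0]
# Hypothetical weights: log2 fold changes of the same genes in M1 vs. M0 references.
reference_log2_fc = [3.0, 1.0, 0.0]

unweighted = weighted_set_statistic(gene_stats)
weighted = weighted_set_statistic(gene_stats, reference_log2_fc)
print(unweighted, weighted)
```

In this toy case, the gene that best separates M1 from M0 in the reference samples is also the most strongly downregulated, so weighting moves the set statistic further from zero, in line with the stronger significance reported for the weighted analysis.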
Taken together, the different implementations of ROAST suggest that collagen scaffolds inhibited the M1 macrophage phenotype at an intermediate time point (i.e., day 5) without inducing an M2a phenotype at any time point, whereas IFNg-collagen scaffolds generally promoted an M1-like phenotype, especially at an intermediate time point (i.e., day 5).
To further characterize the cell-seeded scaffolds, we utilized GSVA to test if the genes in each set were expressed more highly than genes not in the set. As expected, the reference phenotype controls showed that M1 samples were significantly more enriched in M1 genes and M2a samples were significantly more enriched in M2a genes compared to the other phenotype groups (Fig. 8A). Upon evaluation of the cell-seeded scaffolds with GSVA, macrophages cultured on collagen scaffolds showed less enrichment of the M1 gene set than M0 controls on days 5 and 8, although these results were only statistically significant on day 8 (Fig. 8B–D). In contrast, macrophages seeded on the IFNg-collagen scaffolds were significantly more enriched in the M1 gene set than the collagen and M0 groups on day 3, and significantly more enriched in the M1 gene set than collagen scaffolds on days 5 and 8. No significant differences in enrichment of the M2a gene set were found between any groups. Collectively, these results suggest that in response to collagen scaffolds, macrophages generally inhibited an M1 phenotype without taking on a more M2a-like phenotype. In contrast, the adsorption of IFNg to collagen scaffolds increased characteristics of the M1 phenotype at each time point compared to collagen scaffolds without IFNg.
FIG. 8.
GSVA ESs of (A) reference phenotype controls and (B–D) cell-seeded scaffold macrophages. Data represented as mean ± SD (n = 3 biological replicates for the reference phenotype controls and n = 3 experimental replicates for the cell-seeded scaffolds). One-way ANOVA followed by Tukey's multiple comparisons test (*p < 0.05, **p < 0.01, ***p < 0.001, and ****p < 0.0001) was used to compare mean GSVA scores between samples across each set. GSVA ES, GSVA enrichment score. Color images are available online.
To facilitate the interpretability of GSVA enrichment scores and highlight sample trends, we created GSVA M1–M2a score plots. Figure 9A emphasizes that the M1 gene set tended to be more enriched in M1 macrophages compared to M0 controls and the M2a gene set tended to be more enriched in M2a samples compared to M0 controls. Figure 9B shows that collagen scaffolds tended to promote greater enrichment of the M2a gene set than M0 controls, although these results were not significant (as shown in Fig. 8). In addition, IFNg-collagen scaffolds tended to promote greater enrichment of M1 genes compared to M0 controls, although these results were not significant for days 5 and 8. Averaging all technical replicates from the cell-seeded scaffold groups and tracking their change in score over time revealed that M0 macrophages increased in M2a score from day 3 to 5, whereas all other samples showed no statistically significant changes in the regulation of M1 or M2a genes over time (Fig. 9C–E).
FIG. 9.
GSVA ES plot of (A) reference phenotype controls and (B–E) cell-seeded scaffold macrophages. (A, B) Ellipses drawn to highlight sample clustering. (C) Arrows indicate changes over time. Data represent all points for (A, B) reference phenotype controls and cell-seeded scaffolds or averages of technical replicates for (C–E) cell-seeded scaffolds as mean ± SD (n = 3 experimental replicates). One-way ANOVA followed by Tukey's multiple comparisons test (*p < 0.05, **p < 0.01, ***p < 0.001, and ****p < 0.0001) was used to compare mean GSVA scores between samples across each gene set and different time points. Color images are available online.
Finally, to gain a better understanding of the activity of genes outside the classic M1 and M2a phenotypes (i.e., “Other immune/angiogenesis genes”), we examined the enrichment of these genes in each sample group. M1 and M2a macrophages were less enriched in these genes than M0 controls (Fig. 10A). Collagen scaffolds were significantly more enriched in these genes than IFNg-collagen scaffolds on day 5 (Fig. 10C). Notably, regulation of these genes did not significantly differ between any scaffold groups on days 3 and 8.
FIG. 10.
GSVA ESs of (A) reference phenotype controls and (B–D) cell-seeded scaffold macrophages. Data represented as mean ± SD (n = 3 biological replicates for the reference phenotype controls and n = 3 experimental replicates for the cell-seeded scaffolds). One-way ANOVA followed by Tukey's multiple comparisons test or the Kruskal–Wallis test followed by Dunn's multiple comparisons (*p < 0.05, **p < 0.01, ***p < 0.001, and ****p < 0.0001) was used to compare mean GSVA scores between samples across each gene set. Color images are available online.
Discussion
When macrophages respond to biomaterials, they take on complex phenotypes that do not fit into typical M1 or M2 classifications. There is a need in biomaterials and tissue engineering research for a standard, objective, and thorough method to characterize complex macrophage phenotypes in a way that can be tailored to the identification of specific processes or phenotypes of interest. Current methods for characterizing macrophage phenotype in response to immunomodulatory biomaterials are limited in their ability to determine similarities and differences from reference phenotypes prepared with defined biochemical stimuli. While methods that rely on large, high-dimensional datasets such as RNAseq analysis do allow for these comparisons, these techniques are not feasible or practical for many applications in immunomodulatory biomaterial design. In this study, we present a method to facilitate analysis of the behavior of macrophages by comparison to “reference” phenotypes prepared in vitro with defined biochemical stimuli. By using ROAST with gene weighting to emphasize genes that are better markers or more important than others, followed by GSVA to allow analysis across multiple groups and reference phenotypes, we showed that collagen scaffolds inhibit the classic M1 phenotype without promoting a classic M2a phenotype. In contrast, IFNg-adsorbed collagen scaffolds promote the classic M1 phenotype and downregulate the M2a phenotype. With extension to more phenotypes, this method could be useful for comparing macrophages of unknown or hybrid phenotypes to reference phenotype controls that are known to exhibit functions relevant to the process of interest (e.g., wound healing, angiogenesis, phagocytosis, and osteogenesis).
To help interpret the mixed expression of increasing/decreasing markers of both M1 and M2a phenotypes, one solution has been to take the ratio of M1 markers to M2 markers, so that higher ratios represent increased proinflammatory (M1 like) behavior with respect to M2a behavior.12,15,24–26 However, these approaches imply that a linear spectrum of only two phenotypes exists. In addition, the combination of multiple markers into one ratio may falsely represent M1-like or M2-like behavior in the case that one or two M1 or M2a markers are much more highly expressed than all other markers for that phenotype.
The ROAST and GSVA methods each have their own advantages and disadvantages, making them well suited for use in combination. In particular, ROAST has the distinct advantage of providing results that are not affected by the total number of genes in the set, meaning the results would be the same regardless of the use of a small panel of genes or the whole transcriptome, and it also allows for gene weighting. In this study, we used fold changes to provide an example of how gene weighting works, but there are many other methods for weighting if, for example, certain genes are known to be more important to a particular biological process than others. A notable disadvantage of ROAST is that each comparison must be made individually, making extension to multiple phenotypes quite cumbersome and potentially problematic as the potential for type I errors increases. GSVA, on the other hand, outputs scores that are roughly normally distributed so that they can be analyzed with standard statistical tests, making it ideal for comparison across many groups and phenotypes. However, a disadvantage is that the scores are strongly affected by genes outside the set, making comparisons between different studies or even between phenotypes potentially problematic (if, for example, there are more genes outside one set compared to another). For these reasons, we recommend that GSVA be used as a follow-up method to ROAST. Both methods could be extended to compare to multiple reference phenotypes because F-statistics can be utilized in place of moderated t-statistics in ROAST18 and GSVA allows for any type of follow-up testing. A disadvantage of using this strategy with a small panel of genes is that the results are only as strong as the choice of genes, and therefore, genes must be carefully selected. In other words, if we had selected different M1 and M2a markers, the results might have changed. A final consideration, specific to using GSVA, is that as the total number of samples in the dataset increases, GSVA gives greater statistical power.19
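The concern about type I errors when many individual comparisons are made can be illustrated with a short calculation. The sketch assumes independent tests at a fixed significance level, which is a simplification; comparisons that share samples, as in this study, are not independent, so the numbers are only indicative.

```python
def familywise_error_rate(alpha, n_comparisons):
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_comparisons

# Hypothetical numbers of pairwise comparisons at alpha = 0.05.
for m in (1, 5, 10):
    print(m, round(familywise_error_rate(0.05, m), 3))
```

Under these assumptions, ten comparisons at a significance level of 0.05 already carry roughly a 40% chance of at least one false positive, which is why pairwise testing scales poorly as reference phenotypes are added.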
While multiplex gene expression analysis is useful for characterizing the response of macrophages to biomaterials at early stages of the biomaterial design process, ideally, these analyses should be followed up with protein-level confirmation as well as functional analysis to link the results to functional changes in macrophage phenotype. It is also important to complete the analyses with macrophages isolated from multiple donors, since donor-to-donor variability can influence the response of macrophages to biomaterials.12 Indeed, the inflammatory response to biomaterials can differ depending on the age27,28 or disease state of the patient (or animal model),29 so for many applications, it may be important to characterize the effects of the biomaterial on macrophages derived from the patients for whom the biomaterial is intended. The proposed methodology should be useful for assessing such donor-level effects, such as the presence of comorbidities. Although this study focused on elucidating the effect of crosslinked collagen scaffolds with and without adsorbed IFNg and on comparing macrophages cultured on complex biomaterials to the most widely used M1 and M2a phenotypes (prepared using LPS ± IFNg and IL4/IL13, respectively), it did not identify the effects of the interactions between the response to IFNg and to collagen because an IFNg-only control was not included. Future studies may benefit from including this control (i.e., M0 macrophages stimulated with IFNg, but without LPS) to compare the effects of the biomaterial surface (plastic vs. collagen) in the presence of IFNg. Another limitation of this study is that only relatively late time points (days 3–8) were analyzed. While this analysis did produce a description of a more stabilized response to the biomaterials, future studies should provide a more complete analysis by including earlier time points (e.g., 1 day).
Due to the critical roles of macrophages throughout the tissue repair process, there is growing interest in incorporating macrophage-modulating factors into biomaterial designs to encourage biomaterial integration and constructive tissue repair. As chronic M1 or M2a behavior is common in response to implanted biomaterials, a promising strategy is to temporally control macrophage behavior to promote the natural M1-to-M2a phenotypic transition and prevent this chronic behavior. The crosslinked collagen scaffolds used in this study are naturally anti-inflammatory, which could be useful for encouraging a constructive response to implanted biomaterials, whereas the crosslinked IFNg-coated collagen scaffolds would not be useful for this purpose since they promoted a sustained M1 response, which has been shown to be detrimental for biomaterial integration.1,2 However, the IFNg-collagen scaffolds might be useful for applications requiring sustained inflammation, such as reducing the progression of tumors or limiting infections. It is also important to note that the effects on macrophage behavior reported in this study are likely to differ in vivo, where the microenvironment provides many more signals, although the methodology presented in this study would remain useful for characterizing macrophages extracted from the in vivo environment.
In summary, this work demonstrates a powerful methodology for characterizing the complex responses of macrophages to immunomodulatory biomaterials using standard and well-characterized macrophage phenotypes as a reference.
Supplementary Material
Acknowledgments
The authors are grateful for helpful discussions with Ms. Jessica Eager, Dr. Will Dampier, and Dr. Ahmet Sacan.
Disclosure Statement
No competing financial interests exist.
Funding Information
This work was supported by the National Institutes of Health (R01 HL130037 to K.L.S.).
References
- 1. Badylak S.F., Valentin J.E., Ravindra A.K., McCabe G.P., and Stewart-Akers A.M. Macrophage phenotype as a determinant of biologic scaffold remodeling. Tissue Eng Part A 14, 1835, 2008
- 2. Brown B.N., Valentin J.E., Stewart-Akers A.M., McCabe G.P., and Badylak S.F. Macrophage phenotype and remodeling outcomes in response to biologic scaffolds with and without a cellular component. Biomaterials 30, 1482, 2009
- 3. Spiller K.L., Anfang R.R., Spiller K.J., et al. The role of macrophage phenotype in vascularization of tissue engineering scaffolds. Biomaterials 35, 4477, 2014
- 4. Spiller K.L., and Koh T.J. Macrophage-based therapeutic strategies in regenerative medicine. Adv Drug Deliv Rev 122, 74, 2017
- 5. Witherel C.E., Graney P.L., and Spiller K.L. In vitro model of macrophage-biomaterial interactions. Methods Mol Biol 1758, 161, 2018
- 6. O'Brien E.M., Risser G.E., and Spiller K.L. Sequential drug delivery to modulate macrophage behavior and enhance implant integration. Adv Drug Deliv Rev 149–150, 85, 2019
- 7. Garash R., Bajpai A., Marcinkiewicz B.M., and Spiller K.L. Drug delivery strategies to control macrophages for tissue repair and regeneration. Exp Biol Med (Maywood) 241, 1054, 2016
- 8. Julier Z., Park A.J., Briquez P.S., and Martino M.M. Promoting tissue regeneration by modulating the immune system. Acta Biomater 53, 13, 2017
- 9. Sarhane K.A., Ibrahim Z., Martin R., et al. Macroporous nanofiber wraps promote axonal regeneration and functional recovery in nerve repair by limiting fibrosis. Acta Biomater 88, 332, 2019
- 10. Li T., Peng M., Yang Z., et al. 3D-printed IFN-gamma-loading calcium silicate-beta-tricalcium phosphate scaffold sequentially activates M1 and M2 polarization of macrophages to promote vascularization of tissue engineering bone. Acta Biomater 71, 96, 2018
- 11. Tanaka R., Saito Y., Fujiwara Y., Jo J.I., and Tabata Y. Preparation of fibrin hydrogels to promote the recruitment of anti-inflammatory macrophages. Acta Biomater 89, 152, 2019
- 12. Witherel C.E., Graney P.L., Freytes D.O., Weingarten M.S., and Spiller K.L. Response of human macrophages to wound matrices in vitro. Wound Repair Regen 24, 514, 2016
- 13. Spiller K.L., Nassiri S., Witherel C.E., et al. Sequential delivery of immunomodulatory cytokines to facilitate the M1-to-M2 transition of macrophages and enhance vascularization of bone scaffolds. Biomaterials 37, 194, 2015
- 14. Graney P.L., Roohani-Esfahani S.I., Zreiqat H., and Spiller K.L. In vitro response of macrophages to ceramic scaffolds used for bone regeneration. J R Soc Interface 13, 20160346, 2016
- 15. Lurier E.B., Dalton D., Dampier W., et al. Transcriptome analysis of IL-10-stimulated (M2c) macrophages by next-generation sequencing. Immunobiology 222, 847, 2017
- 16. Xue J., Schmidt S.V., Sander J., et al. Transcriptome-based network analysis reveals a spectrum model of human macrophage activation. Immunity 40, 274, 2014
- 17. Langfelder P., and Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559, 2008
- 18. Wu D., Lim E., Vaillant F., Asselin-Labat M.L., Visvader J.E., and Smyth G.K. ROAST: rotation gene set tests for complex microarray experiments. Bioinformatics 26, 2176, 2010
- 19. Hanzelmann S., Castelo R., and Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 14, 7, 2013
- 20. Nam D., and Kim S.Y. Gene-set approach for expression pattern analysis. Brief Bioinform 9, 189, 2008
- 21. Khatri P., Sirota M., and Butte A.J. Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol 8, e1002375, 2012
- 22. Maciejewski H. Gene set analysis methods: statistical models and methodological differences. Brief Bioinform 15, 504, 2014
- 23. Goeman J.J., and Buhlmann P. Analyzing gene expression data in terms of gene sets: methodological issues. Bioinformatics 23, 980, 2007
- 24. Brown B.N., Londono R., Tottey S., et al. Macrophage phenotype as a predictor of constructive remodeling following the implantation of biologically derived surgical mesh materials. Acta Biomater 8, 978, 2012
- 25. Herwig M.C., Bergstrom C., Wells J.R., Holler T., and Grossniklaus H.E. M2/M1 ratio of tumor associated macrophages and PPAR-gamma expression in uveal melanomas with class 1 and class 2 molecular profiles. Exp Eye Res 107, 52, 2013
- 26. Zhang M., He Y., Sun X., et al. A high M1/M2 ratio of tumor-associated macrophages is associated with extended survival in ovarian cancer patients. J Ovarian Res 7, 19, 2014
- 27. LoPresti S.T., and Brown B.N. Effect of source animal age upon macrophage response to extracellular matrix biomaterials. J Immunol Regen Med 1, 57, 2018
- 28. Alhamdi J.R., Peng T., Al-Naggar I.M., Hawley K.L., Spiller K.L., and Kuhn L.T. Controlled M1-to-M2 transition of aged macrophages by calcium phosphate coatings. Biomaterials 196, 90, 2018
- 29. Oliva N., Carcole M., Beckerman M., et al. Regulation of dendrimer/dextran material performance by altered tissue microenvironment in inflammation and neoplasia. Sci Transl Med 7, 272ra11, 2015