Author manuscript; available in PMC: 2017 May 25.
Published in final edited form as: Cell Syst. 2016 May 19;2(5):323–334. doi: 10.1016/j.cels.2016.04.011

Simultaneous Pathway Activity Inference and Gene Expression Analysis Using RNA Sequencing

Daniel J O'Connell 1,#, Raivo Kolde 2,#, Matthew Sooknah 1, Daniel B Graham 1, Thomas B Sundberg 3, Isabel Latorre 1, Tarjei S Mikkelsen 4, Ramnik J Xavier 1,2,3,5,*
PMCID: PMC5032147  NIHMSID: NIHMS780548  PMID: 27211859

Summary

Reporter gene assays are a venerable tool for studying signaling pathways, but they lack the throughput and complexity necessary to contribute to a systems-level understanding of endogenous signaling networks. Here we present a parallel reporter assay, Transcription Factor activity sequencing (TF-seq), built on synthetic DNA enhancer elements, which enables parallel measurements in primary cells of the transcriptome and transcription factor activity from more than 40 signaling pathways. Using TF-seq in Myd88−/− macrophages, we captured dynamic pathway activity changes underpinning the global transcriptional changes of the innate immune response. We also applied TF-seq to investigate small-molecule mechanisms of action and find a role for NFκB activation and coordination of the STAT1 response in the macrophage response to the anti-inflammatory natural product halofuginone. Simultaneous TF-seq and global gene expression profiling presents an integrative approach for gaining mechanistic insight into pathway activity and transcriptional changes that result from genetic and small molecule perturbations.


Introduction

Cellular signaling networks integrate external and internal information through biochemical interactions that ultimately regulate transcriptional responses. Reporter assays provide a quantitative assessment of signal transduction pathway activation by inferring the activity of pathway-specific transcription factors in terms of protein activity (Gorman et al., 1982; Bronstein et al., 1994). However, this approach is not amenable to the analysis of multiple signaling pathways in a single population of cells due to a paucity of orthogonal protein activity readouts (Bellis et al., 2011; Padmashali et al., 2014). In contrast, global gene expression profiling offers a more information-rich, unbiased approach to identify key mediators of signal transduction and gene regulation (Tian et al., 2005). Yet, it is not straightforward to reliably reconstruct upstream signaling events by means of gene expression data alone. The use of RNA-seq in gene reporter assays has been successfully described in enhancer studies (Arnold et al., 2013; Melnikov et al., 2012; Patwardhan et al., 2012), analysis of mRNA structure and function (Holmqvist et al., 2013; Zhao et al., 2014), examinations of chromosomal position effects (Akhtar et al., 2013), and interpretation of noncoding genetic variation (Vockley et al., 2015). However, a high throughput sequencing (HTS) approach to systematically isolate and assay the activity of multiple distinct transcriptional regulators in parallel has not yet emerged.

To address these limitations, we developed a high-throughput sequencing assay termed TF-seq, and demonstrate its application to canonical transcription factors from more than 40 widely investigated signaling pathways. Because TF-seq is based on RNA-seq technology, we were able to simultaneously integrate global gene expression and signaling pathway activity measurements. TF-seq produced inferences of pathway activation in primary macrophages after stimulation with a diverse panel of microbial stimuli; these inferences could not be recapitulated by RNA-seq or ChIP-seq data alone. We then used TF-seq to investigate the mechanism of action of the anti-inflammatory natural product halofuginone in primary macrophages, identifying an unexpected activation of NFκB and suppression of STAT1.

Results

TF-seq Quantitative Methodology and Experimental Design

TF-seq is based on 58 lentiviral reporter vectors that are distinguished only by the unique DNA response element (RE) cloned in front of the luciferase (Luc2P) open reading frame and its corresponding RE sequence-tag (RE-tag) in the 3′ UTR (Figure 1A and Table 1). TF-seq was created in a lentiviral construct in order to access the physiologically relevant gene regulatory networks of primary cells through transduction. The 58 unique REs were curated from peer-reviewed literature and selected to represent more than 40 pathways under active investigation (Table S1). Most of the REs are synthetic DNA sequences of concatenated transcription factor binding sites (TFBS), with the intention of isolating pathway-specific transcription factor activity from the complexities of endogenous gene regulation.

Figure 1. Overview of transcription factor activity sequencing (TF-seq).

(A) TF-seq's 58 lentiviral reporter vectors are distinguished by the DNA response element (RE) driving Luc2P transcription, the RE sequence tags (RE-tag) and their associated unique molecular identifiers (UMIs). Emerald-GFP (EmGFP) is downstream of the reporter gene with an invariant SV40 promoter and used to monitor the rate of lentiviral transduction.

(B) 58 unique REs were created to represent more than 40 of the most widely studied signaling pathways, with a total degenerate vector complexity of 191,724. n = 3 index-tagged PCR amplicons of the pooled TF-seq vector library, data points represent mean ± s.d.

(C) To perform TF-seq, an equimolar mixture of all 58 TF-seq vectors is first transfected into HEK293T cells. After 48 h, the conditioned media containing a pool of lentiviral particles representing the TF-seq reporter library is collected and used to transduce a cell type of interest. After 72 h, the heterogeneous population of cells transduced with TF-seq is collected and replated in 96-well microtiter plates for stimulation and assay collection.

(D) TF-seq libraries are prepared with gene-specific reverse transcriptase priming of the Luc2P reporter gene. Additional 5′ sequence is included in the TF-seq reverse transcriptase primers to add well-tags and a second UMI to approximate single molecule mRNA counting and pool samples from a 96-well microtiter plate for Illumina adapter PCR and multiplexed sequencing.

(E) Multiplexed pathway activity inferences made using TF-seq correlate with luciferase pathway activity inferences (r = 0.68; P < 4.2 × 10−7 based on Spearman correlation test). n = 4 wells for luciferase measurements and n = 4 for the TF-seq measurements.

See also Figure S1 and Tables S1 and S2.

Table 1.

TF-seq pathway reporters and transcription factors.

| Pathway | Transcription Factors |
| --- | --- |
| AHR (aryl hydrocarbon receptor) | AHR, ARNT |
| Amino Acid Deprivation (AARE) | ATF4 |
| AP-1 (MAPK) | FOS, JUN |
| Apoptosis | TRP53 |
| ARE (antioxidant response element) | NFE2L2 (NRF2) |
| c-MAF | MAF |
| C-REL (alternative NFkB) | REL |
| C/EBP α | CEBPA |
| C/EBP β | CEBPB |
| Circadian cycle | BMAL1, CLOCK |
| constitutive enhancer | NFYA |
| constitutive enhancer | SP1 |
| cyclic AMP | CREB1 |
| Early Growth Response | EGR1 |
| ER Stress Element (ERSE) | XBP1 |
| Farnesoid X Receptor | NR1H4 (FXR) |
| GATA3 | GATA3 |
| Glucocorticoid Receptor | NR3C1 (GR) |
| Hedgehog | GLI1 |
| Hepatocyte Nuclear Factor 4 | HNF4a |
| Hippo | TEAD1 |
| Hypoxia | HIF1A |
| Interferon beta | AP-1, IRF3, IRF7, NFkB |
| Insulin (PI3K/AKT) | FOXO1 |
| IRF1 | IRF1 |
| IRF3 | IRF3 |
| IRF3/IRF7 | IRF3, IRF7 |
| Liver X Receptor | NR1H3 (LXR) |
| Lysosomal biogenesis | TFEB |
| MAPK | ELK1 |
| NFkB | NFKB1, RELA (p50, p65) |
| Notch | NICD, CBF1 (CSL, RBPJ) |
| PKC/Ca++ | NFATC1 |
| PPARg | PPARg, RXRA |
| RELB (alternative NFkB) | RELB |
| Retinoic Acid (RARE) | RARA |
| Retinoid-related Orphan Receptors (RORs) | RORC |
| RUNX | RUNX1 |
| Serum Response Element | SRF |
| SIE (sis-inducible element) | STAT3 |
| STAT4 | STAT4 |
| STAT5 | STAT5a, STAT5b |
| STAT6 | STAT6 |
| Sterol | SREBF1 |
| TGF β | SMAD2, SMAD3, SMAD4 |
| Type I Interferon | STAT1, STAT2 |
| Type II Interferon | STAT1, STAT1 |
| UPRE (Unfolded Protein Response Element) | ATF6 |
| Vitamin D | VDR |
| Wnt | LEF1, TCF7, TCF7L1, TCF7L2 |
| Minimal Promoter Only | single TATA Promega pGL4 series |

TF-seq's RE-tags are used to track the transcriptional output associated with each of the 58 DNA REs. Because the lentiviral transduction delivery approach used by TF-seq involves integration into the genome, which is sensitive to the site-specific transcriptional effects of endogenous regulatory elements and chromatin context, we included unique molecular identifiers (UMIs) (Kivioja et al., 2012) adjacent to each RE-tag. The number of unique sequence-tagged REs associated with each of the 58 TF-seq vectors fell between 2,000 and 6,000, with a total complexity of 191,724, as measured by HTS (Figure 1B). This complexity is sufficient to quantify the number of unique TF-seq genomic integrations in a 96-well microtiter plate with ≤ 100,000 cells.
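The role of the UMIs can be illustrated with a minimal sketch. It assumes that sequencing reads have already been parsed into (RE-tag, UMI) pairs; the function name and the toy reads are hypothetical and are not taken from the authors' pipeline. Counting distinct UMIs per RE-tag, rather than raw reads, gives an estimate of how many distinct tagged vectors sit behind each reporter.

```python
from collections import defaultdict

def count_unique_umis(reads):
    """Return {re_tag: number of distinct UMIs} from (re_tag, umi) pairs."""
    umis_by_tag = defaultdict(set)
    for re_tag, umi in reads:
        umis_by_tag[re_tag].add(umi)
    return {tag: len(umis) for tag, umis in umis_by_tag.items()}

# Toy input (hypothetical): repeated reads of the same UMI are counted once.
reads = [
    ("NFkB", "ACGT"), ("NFkB", "ACGT"), ("NFkB", "TTGA"),
    ("STAT1", "GGCA"),
]
complexity = count_unique_umis(reads)
total_complexity = sum(complexity.values())
```

Summing the per-reporter counts in this way corresponds to the notion of a total library complexity, here for a toy library of two reporters.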

TF-seq is performed by transducing a cell type of interest with a complex pool of lentiviral particles distributed among the 58 TF-seq reporter plasmids (Figure 1C). After a defined period of cell stimulation, TF-seq-transduced cells are lysed directly in their 96-well microtiter plates and stored at −80°C until RNA-seq library preparation can begin. To mitigate PCR amplification bias and allow sample pooling immediately after the reverse transcriptase reaction, we designed 96 well-tagged reverse transcriptase primers specific to Luc2P with an additional UMI to approximate single RNA molecule counting (Soumillon et al., 2014) (Figures 1D, S1A-H, and Table S2).

TF-seq is able to multiplex pathway reporter assays because it uses sequence-tag counting by RNA-seq to infer pathway activation instead of a protein activity readout such as bioluminescence. Therefore, we first sought to benchmark TF-seq's performance against the well-established quantitative measurements of pathway activity changes made by luciferase. Using the mouse macrophage cell line RAW 264.7, we calculated the log2 fold changes for a 12-plex subset of the 58 reporter vectors using both TF-seq and their corresponding single luciferase gene reporters after stimulation with small molecules or pathogen-associated molecular pattern molecules (PAMPs). The correlation between luciferase protein activity and TF-seq was r = 0.68 (P < 4.2 × 10−7 based on Spearman correlation test) (Figure 1E). These data suggest that TF-seq pathway activity measurements are broadly consistent with those made by conventional luciferase reporter assays.
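The benchmarking calculation can be sketched as follows. This is an illustrative example only: the pseudocount, the helper names, and the toy measurements are assumptions rather than details from the Experimental Procedures. It computes log2 fold changes for each assay and then a Spearman correlation, implemented here as a Pearson correlation on average ranks.

```python
import math

def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of treated over control; the pseudocount is an assumption."""
    return math.log2((treated + pseudocount) / (control + pseudocount))

def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy (stimulated, control) measurements for four reporters (hypothetical).
tfseq_counts = [(400, 100), (90, 100), (800, 100), (50, 100)]
luc_signal = [(3.0e5, 1.0e5), (1.1e5, 1.0e5), (9.0e5, 1.0e5), (0.8e5, 1.0e5)]
tfseq_lfc = [log2_fold_change(t, c) for t, c in tfseq_counts]
luc_lfc = [log2_fold_change(t, c) for t, c in luc_signal]
rho = spearman(tfseq_lfc, luc_lfc)
```

Because Spearman correlation depends only on rank order, it is insensitive to the different dynamic ranges of sequence-tag counts and luminescence.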

Simultaneous TF-seq and Global Gene Expression Analysis

The simultaneous preparation of TF-seq and global RNA-seq gene expression profiling libraries can be accomplished within the same cell lysates by including a poly-dT(25) primer modified with TF-seq's additional 5′ sequence structure (Figure 1D): well-tag, UMI, and partial Illumina P5 adapter (Figure S1I-L). We created 3′ digital gene expression (3′ DGE) libraries by PCR enrichment of the 3′ end of the poly-dT-primed transcriptomic libraries (Soumillon et al., 2014). This approach allowed us to preserve the well sequence tags imparted during the first-strand cDNA synthesis reaction and retain the reagent sharing and liquid handling convenience of early sample pooling. In this report, we used 96 unique sequence-tagged primers targeted to the Luc2P transcript and 384 unique sequence-tagged poly-dT primers (4-fold degeneracy per well = 96 transcriptomic well-tags) (Table S2), which allowed us to pool 96-well microtiter plates after reverse transcription and then multiplex sequence the TF-seq amplicon and the 3′ DGE libraries separately.
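The well-tag bookkeeping (384 poly-dT primers at 4-fold degeneracy, giving 96 transcriptomic well-tags) can be sketched as below. The consecutive, row-major assignment of primer indices to wells is purely an assumption for illustration; the actual primer-to-well layout is given in Table S2.

```python
import string

ROWS = string.ascii_uppercase[:8]                 # plate rows A-H
COLS = range(1, 13)                               # plate columns 1-12
WELLS = [f"{r}{c}" for r in ROWS for c in COLS]   # 96 wells, row-major
DEGENERACY = 4                                    # poly-dT primers per well

def well_for_primer(primer_index):
    """Map a poly-dT primer index (0-383) to its well (assumed layout)."""
    if not 0 <= primer_index < len(WELLS) * DEGENERACY:
        raise ValueError("primer index out of range")
    return WELLS[primer_index // DEGENERACY]
```

For example, under this assumed layout `well_for_primer(0)` and `well_for_primer(3)` both map to well A1, so reads carrying any of the four primer tags for a well are collapsed onto a single well-tag during demultiplexing.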

Using TF-seq to Infer Pathway Activity Dynamics

We applied TF-seq to the well-characterized primary cell system of mouse bone marrow derived macrophages (BMDMs), in order to rediscover and systematically characterize the dynamic pathway activity changes of the innate immune response. We included BMDMs derived from a Myd88−/− knockout model to test TF-seq's specificity to detect pathway activity changes through highly multiplexed sequencing. In mice, MyD88 is a critical adapter protein that mediates signaling through most of the Toll-like receptors (TLRs) and some cytokine receptors (Figure 2A). To engage innate immune receptors with complete dependence on Myd88, partial dependence on Myd88, and those independent of the Myd88 adapter protein, we chose 12 diverse PAMPs and performed TF-seq at 8 time points in a short 4-hour time-series.

Figure 2. TF-seq rediscovers Myd88-dependent innate immune signaling.


(A) A graphical abstract of the innate immune receptors and their dependence on different signaling adaptor proteins.

(B) The temporal TF-seq activity patterns in bone marrow-derived macrophages after stimulation with LPS. Five manually selected activity profiles highlight the major patterns of responses. Each line represents the smoothed activity profile of distinct transcription factors inferred by TF-seq. Solid lines represent wildtype and dashed lines represent Myd88−/−.

(C) A temporal heatmap that summarizes the TF-seq activity patterns for all PAMP stimulations. Each colored bar represents a condition where TF-seq was significantly different from non-treated samples (FWER < 0.05). The color corresponds to log2 fold change and the number shows ANOVA P value for the full time series. The bar graph immediately below the heatmap shows the number of differentially expressed genes detected for the entire time series using the same ANOVA test. Bone marrow from two Myd88−/− and two wildtype littermates was pooled during BMDM differentiation. n = 2 wells for each stimulus and time point.

See also Figure S2 and Table S3.

First, we concentrated on the signaling pathway dynamics measured by TF-seq after stimulation with bacterial lipopolysaccharide (LPS). Cells sense LPS with the TLR4 receptor (Figure 2A) and the response exhibits only partial dependence on the MyD88 signaling adapter protein. Of the 58 TF-seq vectors, 14 reporters in wild-type BMDMs and 9 reporters in Myd88−/− knockout BMDMs measured statistically significant changes in pathway activity after LPS stimulation. To visualize the unique pathway activity dynamics, we further subdivided the 14 reporters into 5 distinct pathway activity profiles based on their similarity (Figures 2B, S2A-C and Table S3). These profiles recovered known pathway activations, but TF-seq was able to further distinguish differences amongst the activation profiles. For example, NFκB, SRF, and AP-1 were all activated within 30 minutes of stimulation, but their patterns subsequently diverged to produce a return to basal activity (SRF), a plateau of increased activity (NFκB), or a reduction in activity (AP-1). The activation of STAT1 and STAT3 took place after 2 hours, which is consistent with the timing of autocrine signaling by type I interferon secreted by the BMDMs. The strongest pathway activity difference in the Myd88 knockout was in the activation of STAT3, which is consistent with the role of MyD88 in IL-6 mediated STAT3 activity (Yamawaki et al., 2010).

The pathway activation patterns for other PAMPs displayed similarities to LPS, but also had characteristic features of their own (Figure 2C and Table S3). First, TF-seq confirmed the innate immune receptors with complete dependence on MyD88: TLR9 (CpG), TLR7 (R848), TLR2 (Pam3Cys) and TLR5 (FLA). These PAMPs failed to induce pathway activity changes in Myd88 knockout BMDMs. In the case of Sendai virus stimulation, we observed a distinct response driven by IRF3 in contrast to the early activation responses driven by NFκB after stimulation with other PAMPs. Response to trehalose-6,6-dibehenate (TDB), a Mincle agonist, resulted in NRF2 activation, in contrast to most other PAMPs in which NRF2 activity was reduced. TF-seq also discovered a number of pathways previously unappreciated to exhibit a reduction in activity after engagement with PAMPs. When we compared the associated gene expression patterns from various time points with non-stimulated cells, gene expression was reduced in only 3% of all comparisons determined to be significant (FDR < 0.05). Thus, the observed reduction in pathway activity might not be functionally reflected in global gene expression, but rather provide information about cross-regulation between signaling pathways.

Genetic knockout mouse models have the potential to present a categorical phenotype; however, high-throughput genetic perturbations by RNAi or CRISPR-Cas9 often produce hypomorphic phenotypes that require a sensitive assay to detect perturbed activity. We targeted key genes in the viral sensing RIG-I-like receptor (RLR) pathway using CRISPR-Cas9 genome editing (Sanjana et al., 2014) to determine whether TF-seq could detect loss-of-function mutations. TF-seq detected the known regulatory connections between Sendai virus infection and Ddx58 and Ifnar (Figure S2D-F) in a heterogeneous population of CRISPR-Cas9-edited RAW 264.7 cells, a mixture of cells with silent mutations, heterozygous loss-of-function, and homozygous loss-of-function. Therefore, TF-seq is also compatible with high-throughput genetic perturbation technologies.

Benchmarking Against Gene Expression Data

Global gene expression data can be used to computationally infer the activity of signaling pathways by using the expression levels of pathway specific transcription factors (Greenfield et al., 2013; Jojic et al., 2013; Margolin et al., 2006; Segal et al., 2003) or gene sets known to be direct targets of these transcription factors (Lefebvre et al., 2010). A fundamental limitation of the former approach is that pathways are often activated through post-translational signaling events. Even using direct target gene sets to infer pathway activation is susceptible to confounding inferences from combinatorial and cell-type specific endogenous gene regulation.

To determine whether the TF-seq pathway activity measurements in our BMDM Myd88 knockout experiment (Figure 2C) could be recapitulated from gene expression data alone, we inferred pathway activity from the expression levels of the transcription factors targeted for monitoring by TF-seq. To do this, we extracted the mRNA read counts for genes comprising the transcriptional regulator complexes and then replicated the statistical analysis presented in Figure 2C to compare activity levels before and after stimulation (Experimental Procedures). When TF-seq and expression-based pathway activity were plotted against each other, we observed every possible outcome: concordance, discordance, and detection by TF-seq or gene expression alone (Figure 3A). Furthermore, it was clear that the differences in pathway activity inferences were unique to each signaling pathway.

Figure 3. TF-seq pathway activity measurements cannot be reproduced by global RNA-seq or ChIP-seq data alone.


(A-B) Comparison of the concordance in the statistical testing results of inferred pathway activity based on TF-seq and gene expression derived pathway activity inferences. Two approaches to infer the pathway activity data were used: using transcription factor mRNA level as a proxy of its activity (A) and averaging the expression levels of reported target genes from published ChIP-seq data (B). n = 2 wells for all TF-seq and gene expression data. See Experimental Procedures for detailed descriptions of expression based pathway activity inferences.

For example, both approaches produced largely concordant results for NFκB, STAT1, and STAT3 pathways. However, TF-seq appeared to be more sensitive in the detection of differential activity, with many instances of statistically significant TF-seq activity changes in the absence of corresponding changes in gene expression of the regulators (Figure 3A). SRF and IRF3 are illustrative examples of transcription factors that are activated through post-translational biochemical interactions and did not display any changes in mRNA levels (Figure 3A). In contrast, AP-1 pathway activity seemed to be regulated transcriptionally because TF-seq and gene expression levels were often concordant (Figure 3A).

The discordant activity inferences in the NRF2 mediated antioxidant pathway activity between TF-seq and gene expression measurements (Figure 3A) compelled us to study the NRF2 behavior in more detail. NRF2 transcriptional activity is restricted by its sequestration in the cytoplasm through interaction with KEAP1. Reactive oxygen species (ROS) disrupt the NRF2-KEAP1 interaction and lead to an increase in antioxidant gene expression regulated by NRF2 (Hong et al., 2005). Therefore, we tested the prediction of reduced NRF2 activity using an orthogonal assay to directly measure the relative abundance of reactive oxygen species (ROS). Bone marrow derived dendritic cells (BMDCs) were stimulated with LPS and then mitochondrial derived ROS, nitric oxide derived ROS and total cellular ROS levels were measured at 2, 6 and 24 hours (Figure S2G). In agreement with a metabolic shift from oxidative phosphorylation to aerobic glycolysis (Everts et al., 2012), we confirmed a reduction in ROS after LPS stimulation, which supports the NRF2 activity inference made by TF-seq. Thus, gene expression levels of the pathway-specific transcription factors could not reproduce the pathway activity measured by TF-seq.

Next, we compared the activity inferences made from the expression levels of direct target genes with those made by TF-seq. We collected target gene sets from published ChIP-seq data in related experiments with bone marrow-derived dendritic cells or macrophages stimulated with LPS (Barish et al., 2010; Garber et al., 2012), and also from unrelated contexts including HeLa (Satoh and Tabunoki, 2013) and embryonic stem cells (Chen et al., 2008). To infer pathway activity, we averaged the mRNA levels in each of the target sets (Experimental Procedures). Again, we compared the pathway activity changes before and after PAMP stimulation based on the expression of target genes versus those generated by TF-seq (Figure 3B).
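The target-gene approach can be illustrated with a minimal sketch. The log2 transformation, the pseudocount, and the toy counts and gene set are assumptions made for illustration; the exact normalization used by the authors is described in the Experimental Procedures, not here.

```python
import math

def pathway_activity(expression, target_genes, pseudocount=1.0):
    """Mean log2 expression over the target genes present in `expression`."""
    values = [math.log2(expression[g] + pseudocount)
              for g in target_genes if g in expression]
    if not values:
        raise ValueError("no target genes found in expression table")
    return sum(values) / len(values)

# Hypothetical read counts before and after stimulation.
control = {"Nfkbia": 15, "Tnf": 3, "Cxcl10": 1}
lps = {"Nfkbia": 63, "Tnf": 31, "Cxcl10": 7}
nfkb_targets = ["Nfkbia", "Tnf", "Il1b"]  # Il1b is absent and is skipped

delta = pathway_activity(lps, nfkb_targets) - pathway_activity(control, nfkb_targets)
```

The inferred activity is only as good as the target set: genes that are bound but not regulated in the cell type at hand dilute the average, which is one way to read the cell-type dependence discussed in this section.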

These results highlight the importance of a priori target gene identification for accurate signaling pathway activity inference using gene expression. For example, the direct transcriptional targets identified by ChIP-seq in similar cell types and conditions (dendritic cells or macrophages + LPS) matched the TF-seq activity inferences relatively well for NFκB, STAT1, and STAT3 (Figure 3B). In contrast, the target genes identified in unstimulated or unrelated cell types were incapable of reproducing the pathway activity inferences made by TF-seq (Figure 3B). Still, even with direct target gene sets from similar experimental conditions, the functional heterogeneity associated with transcription factor occupancy can obscure the inferences of pathway activity. In this case, the IRF1 and AP-1 target gene sets from bone marrow derived dendritic cells (BMDCs) + LPS failed to reproduce the pathway activity measurements made by TF-seq. Therefore, even with high-quality direct target gene sets, the results of TF-seq could not always be replicated with global gene expression profiling.

Functional Association of Genes and Signaling Pathways by TF-seq

We sought to integrate TF-seq and gene expression data using a simple correlation model of gene expression changes with changes in pathway activity in response to PAMPs. First, nine reporters with eight distinct activity profiles were selected: NRF2 and CLOCK, AP-1, SRF, NFκB, IRF1, STAT1, IRF3, and STAT3. Of the 1,011 genes that exhibited differential expression in response to at least one PAMP (P < 0.05, Bonferroni corrected ANOVA), 765 (76%) were correlated (cor ≥ 0.4) with one of the nine signaling pathways across all time points (Figure 4A and Table S4). By an alternative method, EigenR2 (Chen and Storey, 2008), we find that the nine reporter activity signals account for 46% of all gene expression variation in the differential set of the more than 1,000 genes. These results suggest that even a small number of reporter patterns can capture the major global gene expression features in response to PAMPs.
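A minimal sketch of such a correlation model follows. It assumes Pearson correlation across matched conditions and assigns each gene to its single best-correlated reporter when that correlation reaches the 0.4 threshold; the best-match rule, the function names, and the toy profiles are illustrative assumptions, not the authors' code.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / (sx * sy)

def assign_genes(gene_profiles, reporter_profiles, threshold=0.4):
    """Assign each gene to its best-correlated reporter if r >= threshold."""
    assignments = {}
    for gene, profile in gene_profiles.items():
        best_reporter, best_r = max(
            ((name, pearson(profile, rep)) for name, rep in reporter_profiles.items()),
            key=lambda item: item[1],
        )
        if best_r >= threshold:
            assignments[gene] = best_reporter
    return assignments

# Toy log2 fold-change profiles over four conditions (hypothetical).
reporters = {"NFkB": [0, 2, 2, 2], "STAT1": [0, 0, 1, 3]}
genes = {
    "geneA": [0, 1.9, 2.1, 2.0],   # early plateau, NFkB-like
    "geneB": [0, 0.1, 0.8, 3.2],   # late rise, STAT1-like
    "geneC": [1, -1, 1, -1],       # matches neither reporter
}
assignments = assign_genes(genes, reporters)
```

Genes whose best correlation falls below the threshold, such as `geneC` in the toy data, remain unassigned, mirroring the fraction of differentially expressed genes not associated with any of the nine reporters.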

Figure 4. Functional association of signaling pathway activity and global gene expression changes using TF-seq.

(A) A heatmap of the expression changes of the 765 genes (76% of the genes differentially expressed after PAMP stimulation), clustered by their correlation (cor > 0.4) with TF-seq pathway activity. The DNA binding motif table on the right represents the significantly enriched DNA binding motifs identified within the open chromatin regions of the genes in the pathway activity clusters.

(B) Changes in STAT1 binding occupancy. The genes associated with STAT1 activity by TF-seq exhibit the highest enrichment for bona fide direct targets of STAT1. n = 2 wells for all TF-seq and gene expression data.

See also Figure S3 and Table S4.

We validated the functional association of pathway activity and gene expression changes using data generated by ATAC-seq (Buenrostro et al., 2013) to identify the relevant regulatory sequences (promoters and enhancers) in BMDMs. Primary BMDMs were treated with PAMPs for 1 hour before native nuclei were harvested and transposed with Illumina's Nextera reagent to enrich for accessible chromatin. The unbiased motif enrichments from JASPAR's core mammalian motifs did not enrich for the associated regulatory factors in small gene clusters. However, the larger gene clusters were strongly enriched with the corresponding TFBS: NFκB, STAT1, and IRF1 (Figure 4A).

We next characterized the gene clusters identified by TF-seq (Figure 4A) with functional annotation terms from Gene Ontology (GO) or gene set enrichment analyses (Subramanian et al., 2005) (Figure S3). The results were complementary to the TF-seq-based annotation, revealing more general functional aspects regarding the clusters but not directly identifying the regulatory programs underpinning their expression. In addition, the enrichment analysis was less informative on smaller gene groups. Thus, the integration of TF-seq and global gene expression (3'DGE) can add a regulatory dimension to global gene expression analysis that cannot be easily recovered with standard clustering and functional annotation approaches.

By correlating TF-seq with gene expression patterns, we implicitly assume that these genes' regulatory regions were bound by greater amounts of the transcriptional regulator after stimulation. The availability of temporal ChIP-seq data on LPS-stimulated dendritic cells (Garber et al., 2012) allowed us to test this hypothesis. Using the reported binding scores for STAT1 peaks, we calculated the increase in total STAT1 binding for each gene after a 2-hour stimulation with LPS. In Figure 4B we show the distributions of these scores for the TF-seq associated gene clusters. Consistent with our hypothesis, the genes from the STAT1 cluster gained the most STAT1 binding upon stimulation with LPS. In contrast, the gain in STAT1 binding for the STAT1 target gene sets identified by ChIP-seq in HeLa cells, with and without IFNγ treatment, was significantly lower (P < 1.44 × 10−31 and P < 2.43 × 10−13, respectively) (Figure 4B). Therefore, TF-seq has the capacity to functionally associate genes and pathway activity better than existing ChIP-seq datasets in nonrelevant cell types. Thus, integrating the inferences made from TF-seq with global gene expression profiling identified the transcription factors responsible for the major changes in global gene expression and connected them with target genes.
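The per-gene binding score described above can be sketched as a difference of summed peak scores. The record layout (gene, time point, score), the time point labels, and the toy values are assumptions; how peaks are assigned to genes is not specified in this section and is taken as given.

```python
from collections import defaultdict

def binding_gain(peaks, baseline="0h", stimulated="2h"):
    """Per-gene increase in summed peak score between two time points.

    `peaks` is an iterable of (gene, timepoint, score) records; a gene with
    no peak at a time point contributes a total of 0.0 for that time point.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for gene, timepoint, score in peaks:
        totals[gene][timepoint] += score
    return {gene: t[stimulated] - t[baseline] for gene, t in totals.items()}

# Toy STAT1 peak records (hypothetical scores).
peaks = [
    ("Irf1", "0h", 1.0), ("Irf1", "2h", 4.0), ("Irf1", "2h", 2.5),
    ("Actb", "0h", 0.5), ("Actb", "2h", 0.5),
]
gain = binding_gain(peaks)
```

Comparing the distribution of such gains between gene clusters is then a matter of applying a two-sample test to the per-gene values.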

Halofuginone Activates NFκB and Disrupts STAT1 Activation in Primary Macrophages

Transcriptional profiling is a mainstay of small molecule mechanism of action studies because it can provide unbiased insights into signaling pathways that are altered by treatment (Lamb et al., 2006). TF-seq should augment this approach by providing more direct evidence of cellular processes perturbed by a small molecule. We tested this hypothesis by applying TF-seq to BMDMs pre-treated for 30 minutes with small molecule immunomodulators, endoplasmic reticulum (ER) stressors, and caspase inhibitors to capture a diverse class of molecules and then stimulated with LPS. We used 3′ DGE data and t-distributed stochastic neighbor embedding (t-SNE) (Maaten and Hinton, 2008) to visualize the global effects of small molecule treatment (Figure 5A). Despite measuring only 58 reporter genes, TF-seq captured the same clustering patterns as 3′ DGE (Figure S4A and Table S5). These data suggest that TF-seq can capture the major distinguishing features associated with global gene expression in response to small molecules.

Figure 5. Small molecule mechanism of action discovery using TF-seq.


(A) Multidimensional scaling of samples from small molecule screen based on gene expression data. n = 8 wells.

(B) Samples plotted using t-SNE coordinates and colored according to TF-seq pathway activity inferences. n = 8 wells.

(C) Halofuginone dose-dependent patterns of selected reporters, expressed as fold change against DMSO control samples. n = 2 wells.

(D) Genes that were differentially regulated by halofuginone were associated with signaling pathways based on their expression profiles across six dose-response curves (halofuginone and halofuginone + proline at three time points). The panel depicts the distribution of the genes between six pathway reporters. n = 2 wells. See also Figures S4 and S5 and Tables S5-S7.

We then integrated the 3′ DGE and TF-seq data by overlaying the signaling pathway activity measurements on top of the 3′ DGE t-SNE clustering (Figure 5B). In this manner, TF-seq complements the global gene expression data by providing insight into the underlying signaling pathway activities associated with the global transcriptional programs. For example, many of the cellular responses elicited by the prostanoid receptor agonist prostaglandin E2 (PGE2) are mediated by activation of the cAMP/CREB signaling pathway (Strassmann et al., 1994); when the pathway activities were integrated with the t-SNE-clustered 3′ DGE data, CREB stood out as a uniquely activated pathway (Figure 5B). Similarly, the anti-inflammatory effect of tert-butylhydroquinone, an electrophile stressor, was defined by the activation of the anti-oxidant pathway by NRF2 (NFE2L2) (Ma and Kinneer, 2002) (Figure 5B). Treatment with brefeldin A, an ER stressor that inhibits protein transport to the Golgi apparatus, highlighted the activation of the amino acid response (AAR) pathway represented by ATF4 activity (Kwok and Daskal, 2008) (Figure 5B).

Halofuginone (HF) treatment resulted in TF-seq data defined by a unique pattern of NFκB and SRF activation, as well as reduced STAT1 activation (Figure 5B). HF is an analog of febrifugine, the bioactive natural product in the roots of blue evergreen hydrangea that is used in traditional Chinese medicine as an anti-inflammatory agent (Keller et al., 2012; Sundrud et al., 2009). In cells, HF binds to and inhibits the activity of glutamyl-prolyl-tRNA synthetase (EPRS), resulting in the accumulation of uncharged tRNAs and the activation of the AAR pathway (Keller et al., 2012). HF-mediated AAR pathway activation inhibits the development of pro-inflammatory Th17 cells in vitro and in vivo (Sundrud et al., 2009). However, we did not detect activation of ATF4 (AAR) in BMDMs after treatment with HF and LPS (Figures 5B and S4B). Therefore, TF-seq identified downstream consequences of inhibiting EPRS in macrophages that have not been previously annotated.

To generate systems-level insight into the consequences of EPRS inhibition and the mechanism of action of HF in macrophages, we performed a dose response experiment using TF-seq in BMDMs before and after LPS stimulation (Figures 5C and S4C, and Table S6). Proline, which is known to act as a competitive inhibitor of HF, shifted all of the significant differences induced by HF, indicating that the effects we observed were the consequence of EPRS inhibition (Figures 5C and S4C). Furthermore, because of TF-seq's throughput, we could distinguish pathway-specific sensitivities to the dose of HF (Figures 5C and S4C). For example, NFκB and CEBPA pathway activity inferences were more sensitive to lower doses of HF compared to CREB and STAT1 (Figure 5C). The activation of NFκB and CREB rose and plateaued out to the highest dose of HF, whereas CEBPA activity was only stimulated at the two lowest doses of HF (Figure 5C). Thus, in contrast to HF's AAR activation in T cells, the mode of action of halofuginone in primary macrophages is marked by an activation of NFκB and a disruption of LPS-induced STAT1 activation.

We then returned to the same correlation-based clustering of global gene expression and TF-seq employed in Figure 4A to cluster the HF 3′ DGE data (Figures 5D and S5, and Table S7). HF effects on NFκB and STAT1 activity were clearly implicated in the 3′ DGE data; however, 19% of the 273 genes associated with reduced STAT1 activity were ribosomal proteins and unlikely to be true STAT1 targets. More importantly, TF-seq inferred an additional nine differentially regulated pathways (Figure S4C). These data highlight the difficulty of inferring pathway activity from gene expression and suggest that the failure to activate STAT1 effectively in macrophages may contribute to the anti-inflammatory activity of HF in vivo.

Discussion

TF-seq is a parallel pathway reporter assay that enables functional associations with a variety of biological signaling processes, including pathogen detection, cytokines, nuclear hormones, calcium, oxidative changes, starvation, and morphogens. We demonstrated the use of TF-seq to multiplex pathway activity inferences in primary cells from knockout mice, after CRISPR-Cas9 genetic editing, and for small-molecule mechanism of action discovery with halofuginone. TF-seq differs from previous reports of massively parallel reporter gene assays (MPRAs): rather than dissecting the functional elements of an endogenous enhancer with thousands of vectors, TF-seq aims to gain more information about the pathways regulating global gene expression. This is achieved with synthetic DNA elements of TFBS that report on over 40 signaling pathways using a library consisting of thousands of sequence-tagged vectors. The potential for TF-seq to capture the major features of global gene expression through the activity of only a few regulators, as described in this report, is a feature that requires further exploration in different cell types.

The use of TFBS sequences incurs the benefit of the ability to infer activity from transcription factor orthologs; however, it also exposes the potential to confound inferences due to the similarity of DNA-binding domains amongst transcription factor paralogs. In addition, TF-seq is susceptible to generalized transcriptional activation and careful normalization of the sequence tag counts is required. For example, when only 12 reporters were used to benchmark TF-seq against traditional luciferase assays (Figure 1E) the correlations were low for the strongest activating PAMPs, Sendai virus and Zymosan. Lastly, TF-seq relies on a population of cells to infer multiple distinct pathway activities and cannot inform on multiple pathways within a single cell. Future adaptations to incorporate more than one reporter gene per cell or inclusion of cell sorting-based Flow-seq (Kosuri et al., 2013) can be used to inform on multiple pathways in a single cell.

Sequence tagging cDNA during the reverse transcriptase reaction of RNA-seq, combined with the subsequent sharing of reagents and liquid handling, significantly reduces the cost of RNA-sequencing library preparation. This feature will allow TF-seq to be used in discovery-based screening applications and in-depth analysis of signaling pathway dynamics. Thus, the systematic application of TF-seq to large-scale perturbations of cellular functions will open the opportunity to apply network reconstruction and mathematical modeling techniques to discover new fundamental insights into signal transduction.

Experimental Procedures

Synthesized DNA Oligonucleotides

DNA oligonucleotides were synthesized by Integrated DNA Technologies (IDT). Complete DNA sequences of all primers and cloning oligonucleotides can be found in Tables S1, S2, and S4.

Gene Expression Data and Code Availability

All raw and normalized gene expression data presented in this manuscript are available at the Gene Expression Omnibus (GEO) Database under accession GSE75212. TF-seq and 3′ digital gene expression (3′ DGE) counting and mapping scripts were written in Python and are available at Mendeley datasets doi:10.17632/grkvc54ggm.1.

Reagents

DNA plasmids were propagated in Stbl3 cells (Life Technologies). Plasmids were purified by endotoxin-free midiprep kit (Macherey-Nagel). DMEM, penicillin-streptomycin, Glutamax, OPTIMEM, Lipofectamine 2000, E-Gel EX Agarose Gels, and Dynabeads MyOne Streptavidin C1 were purchased from Life Technologies. Additional reagents: Herculase II Fusion Polymerase (Agilent), Maxima reverse transcriptase and RNaseA (Thermo-Fisher), Betaine and Fetal bovine serum (FBS) (Sigma), TURBO DNase (Ambion), Second Strand Synthesis Module (NEB), RLT lysis buffer (Qiagen), Agencourt AMPure XP (Beckman Coulter), Zymoclean Gel DNA Recovery Kit (Zymo Research), In-Fusion (Clontech), Amicon Ultra-15 Centrifugal Filter (EMD-Millipore), Steadylite Plus (PerkinElmer), PEG-8000 (Sigma), Nextera and NexteraXT (Illumina), and 50-cycle MiSeq and 75-cycle NextSeq (Illumina).

Innate Immune Ligands and Small Molecule Final Concentrations

300 ng/mL Pam3Cys-SKKKK (EMC-microcollections L2000), 20 μg/mL LMW polyinosinic-polycytidylic acid (InvivoGen tlrl-picw), 100 ng/mL lipopolysaccharides E. coli 055:B5 (Sigma L6529), 5 μg/mL FLA (B. subtilis) ~32 kDa Gram-positive flagellin (InvivoGen FLA-BS), 10 μg/mL R848 imidazoquinoline compound (InvivoGen tlrl-r848), 5 μM CpG (ODN 1826) unmethylated CpG dinucleotides (InvivoGen tlrl-1826), 10 μg/mL muramyl dipeptide (InvivoGen tlrl-mdp), 10 μg/mL synthetic analog of trehalose-6,6-dimycolate (TDB, Trehalose-6,6-dibehenate) (InvivoGen tlrl-tdb), MOI ~10 Sendai virus (Sendai strain: Cantell) (ATCC VR-907), 5 μg/mL zymosan (S. cerevisiae) (Sigma Z4250), 20 μg/mL HT-DNA (herring testis DNA) (Trevigen), DMSO (Sigma), 10 μM forskolin (Cayman Chemical), 25 μM tert-Butylhydroquinone (tBHQ) (Sigma), 5 μM prostaglandin E2 (PGE2) (Sigma), 1 μM torin (Axon Medchem), 10 μM halofuginone (Santa Cruz Biotechnology), 100 nM bafilomycin (Sigma), 20 μM ZVAD pan-caspase inhibitor (Calbiochem), caspase I inhibitor IV (Calbiochem), 5 μg/mL brefeldin A, 5 μg/mL tunicamycin (Sigma).

Cell Culture and TF-seq Transduction

All mouse work was conducted under an animal protocol approved by the Subcommittee on Research Animal Care, the Institutional Animal Care and Use Committee for Massachusetts General Hospital. The mouse bones from 13-week-old Myd88−/− knockout mice were a gift of Albert C. Shaw (Yale School of Medicine) and were shipped overnight on wet ice. Bones from C57BL/6 mice 8-13 weeks of age were used in all other experiments. Bone marrow derived macrophages (BMDMs) were differentiated in pen/strep containing DMEM plus 30% FBS and 10% L929 M-CSF conditioned medium (CMG) for 9 days. Bone marrow derived dendritic cells (BMDCs) were differentiated in pen/strep containing DMEM plus 10% FBS and 4% L929 GMCSF conditioned medium (TOPO) for 7 days. HEK293T lentiviral packaging cells were a gift from the Broad Institute's Genetic Perturbation Platform and were propagated in DMEM plus 10% FBS. HEK293T cells were transfected with pCMV-dR8.91, VSV-G, and an equimolar pool of 58 pathway reporter vectors using Lipofectamine 2000. BMDMs were transduced in 90% concentrated lentiviral supernatant + 10% CMG on day 4. BMDMs were scraped off of petri dishes, counted on a hemocytometer, and plated overnight in DMEM + 10% FBS at 100,000 cells/well in 96-well tissue-culture treated plates (Corning) before stimulation and assay collection the next day. RAW 264.7 cells were purchased from ATCC and maintained in Dulbecco's Modified Eagle Medium (DMEM) (Thermo Fisher) supplemented with 10% fetal bovine serum (Sigma), 1x Glutamax (Thermo Fisher), 100 I.U./mL penicillin, and 100 μg/mL streptomycin (Thermo Fisher). This cell line was confirmed mycoplasma-free by PCR detection.

High-Throughput Sequencing

TF-seq experiments containing up to 576 samples were indexed and pooled for multiplexed sequencing on a single Illumina 50-cycle V2 MiSeq flow cell with 10% PhiX. ATAC-seq data were sequenced with one 75 cycle NextSeq flow cell. 3′ DGE data were sequenced with one 75 cycle NextSeq flow cell containing up to 576 samples.

Dual TF-seq and 3′ DGE RNA-Seq

Cells were lysed in RLT lysis buffer (Qiagen) and stored at −80°C until the start of RNA-seq library prep. RNA was purified using Agencourt RNAClean XP to precipitate the nucleic acids in 1.25 M NaCl and 10% PEG-8000 (details in Supplemental Text). Maxima reverse transcriptase (Thermo Scientific) was run according to the manufacturer's instructions using a multiplexed primed reverse transcriptase reaction. The biotinylated 96-well sequence-tagged TF-seq specific reverse transcriptase primers and the biotinylated 96-well degenerate sequence-tagged polydT reverse transcriptase primers (Table S2) were used at 750 nM and 250 nM final concentrations, respectively, with 50 units of Maxima. After sequence-tagging all cDNA during reverse transcription, each 96-well plate was pooled and the unincorporated primers were washed away from the cDNA by precipitating with 10% PEG-8000 and 1.25 M NaCl. Amplification of the TF-seq gene reporter amplicon was performed on 50% of the cDNA using primers with full Illumina-compatible sequencing adapters in a 600 μL PCR reaction for 28 cycles. The 422 bp amplicon was then gel extracted for sequencing. The remaining 50% of the first strand cDNA was converted to double strand cDNA using NEB's Second Strand Synthesis Module. The pooled cDNA libraries for TF-seq are sequence-tagged only on the 3′ end of the sense transcript; therefore, after full-length ds-cDNA tagmentation, we performed enrichment PCR of only the sequence-tagged 3′ end of the mRNA transcript by combining the Nextera N700 series of primers with the TruSeq P5 adapter (Figure S1I).

Normalization and Statistical Analysis for TF-seq and Global Gene Expression

Each TF-seq read is a 50 bp single-end read consisting of the well tag and an RNA unique molecular identifier (UMI), followed by a 17 bp constant sequence, then the reporter-tag UMI, and finally the reporter tag (Figure S1E). We counted the number of unique RNA molecules for every well and reporter element, requiring a perfect match for the respective tags. The 3′ digital gene expression (3′ DGE) libraries were sequenced with paired-end reads: read 1 contains the well tag and RNA UMI, and read 2 contains a fragment from the 3′ end of the transcript (Figure S1L). Reads were mapped to RefSeq (mm10) transcript sequences using BWA (Li and Durbin, 2009) with default settings. Both TF-seq and 3′ DGE returned counts of unique RNA molecules corresponding to reporters or genes. Downstream processing for both matrices was identical. For statistical testing, we used generalized linear models from the edgeR package (Robinson et al., 2010), which assume a negative binomial distribution and are optimized for analyzing count data. For Figures 2C, 3, and 4A we compared two replicates from each time point, stimulation, and genotype combination with 24 non-stimulated samples of the same genotype. When plotting the TF activity and gene expression patterns (Figures 2B, 2C, 3, and 4A) we used the log2 fold changes returned by the edgeR tests. The p-values presented in Figure 4 correspond to the ANOVA test for each time series, also performed with edgeR. To calculate the associations in Figure S2C, we compared reporter activity after Sendai virus infection in cells with CRISPR-Cas9 targeted in/dels of Ddx58 and Ifnar to that in cells with CRISPR-Cas9 targeting of the controls Tlr3, Tlr4, and Ifih1. Associations with a Bonferroni-corrected p-value smaller than 0.01 are displayed. For the analyses in Figures 5C and 5D, we compared two replicates for each concentration and time point to 12 corresponding controls. The reported log2 fold changes were used for visualization (Figures 5C, S4C, and S5) and correlation analysis.
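The UMI counting step described above can be illustrated with a minimal Python sketch. It is not the authors' script. Only the 50 bp read length, the 17 bp constant region, and the order of the fields come from the text; the individual tag lengths, the constant sequence itself, and the function names are hypothetical placeholders. The sketch collapses reads on the RNA UMI only, since the text does not specify how the reporter-tag UMI enters the count.

```python
from collections import defaultdict

# Assumed layout: only READ_LEN (50) and CONST_LEN (17) are stated in the text.
READ_LEN = 50
WELL_LEN, RNA_UMI_LEN, CONST_LEN, REP_UMI_LEN, REP_TAG_LEN = 6, 10, 17, 9, 8
CONSTANT = "ACGTACGTACGTACGTA"  # placeholder 17 bp constant sequence


def parse_read(read):
    """Split a read into (well_tag, rna_umi, reporter_tag); None if malformed."""
    if len(read) != READ_LEN:
        return None
    fields, pos = [], 0
    for length in (WELL_LEN, RNA_UMI_LEN, CONST_LEN, REP_UMI_LEN, REP_TAG_LEN):
        fields.append(read[pos:pos + length])
        pos += length
    well, rna_umi, constant, _rep_umi, reporter = fields
    if constant != CONSTANT:  # require the constant region to match exactly
        return None
    return well, rna_umi, reporter


def count_unique_molecules(reads, known_wells, known_reporters):
    """Count distinct RNA UMIs per (well, reporter), perfect tag matches only."""
    umis = defaultdict(set)
    for read in reads:
        parsed = parse_read(read)
        if parsed is None:
            continue
        well, rna_umi, reporter = parsed
        if well in known_wells and reporter in known_reporters:
            umis[(well, reporter)].add(rna_umi)
    return {key: len(seen) for key, seen in umis.items()}
```

Storing the UMIs in a set per well and reporter means that PCR duplicates of the same molecule are counted once, which is the purpose of UMI-based counting.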

Estimating Transcription Factor Activity from Regulator and Target mRNA Levels

To infer regulator activity from gene expression of transcription factors (Figure 3A) we assigned pathway reporters to their corresponding genes. AP-1 used values from Jun and Fos. NFκB used values from NFkB, Rel, Rela and Relb. Stat1, Stat2 and Irf9 were used for STAT1/STAT2 and Stat3 and Stat1 for STAT3; the remaining gene names correspond to regulator names. We did not aggregate the values of multiple genes. To estimate regulator activity based on target gene expression, target gene sets were curated from published ChIP-seq data (see Published ChIP-seq Data for details), and then the gene expression profiles were extracted and their expression averaged using log2-transformed counts-per-million reads (CPM) values of the target genes expressed in more than half of the samples. These data were then subjected to statistical testing similar to that of the TF-seq measurements in Figure 2C, by comparing the two replicates from each time point, stimulation and genotype with the 24 non-stimulated samples of the same genotype. This resulted in 168 tests per regulator, and the results were plotted against the corresponding test results from TF-seq. To test the regulator's gene expression we used a generalized linear model from edgeR. For target-based inference we used the moderated t-test from the limma package because the averaged values were not in count format.
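The target-based averaging described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: a gene is treated as "expressed" in a sample when its count is above zero, and a pseudocount of 1 is added before the log2 transform. Neither choice is specified in the text, and the function names are hypothetical.

```python
import math


def log2_cpm(counts, pseudo=1.0):
    """counts: {gene: [count per sample]} -> {gene: [log2(CPM + pseudo)]}.

    The pseudocount is an assumption; library sizes must be non-zero.
    """
    n = len(next(iter(counts.values())))
    lib = [sum(c[j] for c in counts.values()) for j in range(n)]
    return {
        gene: [math.log2(c[j] / lib[j] * 1e6 + pseudo) for j in range(n)]
        for gene, c in counts.items()
    }


def target_activity(counts, targets):
    """Average log2 CPM of target genes expressed in more than half of samples."""
    n = len(next(iter(counts.values())))
    # "Expressed" is assumed here to mean a count above zero.
    kept = [g for g in targets
            if g in counts and sum(x > 0 for x in counts[g]) > n / 2]
    if not kept:
        raise ValueError("no target gene passes the expression filter")
    lc = log2_cpm(counts)
    return [sum(lc[g][j] for g in kept) / len(kept) for j in range(n)]
```

Because the result is an average of log-scale values rather than a count, it is no longer suited to a count-based model, which is consistent with the use of a moderated t-test for this inference.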

Correlation Analysis Between TF-seq and Global Gene Expression

To select the genes for the correlation-based analysis presented in Figure 4, we targeted genes that were significantly regulated by at least one PAMP in either wild type or knockout cells (Bonferroni-corrected ANOVA P value < 0.05). For analysis of the halofuginone dose curve, we selected only genes that were significantly affected (Bonferroni-corrected P < 0.05 and fold change > 2) by either halofuginone or halofuginone and proline, in ≥ 2 concentrations. In both cases, the reporters were manually chosen to represent distinct patterns in the data. Log2 fold changes were used to associate gene expression and TF-seq activity profiles and to calculate the correlations between reporters and genes. Each gene was associated with its most highly correlated reporter, and associations with a correlation below 0.4 were discarded.
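The assignment rule in the last two sentences can be written as a short Python sketch. The text does not state which correlation coefficient was used, so Pearson correlation is an assumption here, and the function and variable names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def assign_genes(gene_lfc, reporter_lfc, min_r=0.4):
    """Map each gene to its best-correlated reporter; drop genes with r < min_r.

    gene_lfc and reporter_lfc map names to log2 fold-change profiles measured
    over the same ordered set of conditions.
    """
    assigned = {}
    for gene, profile in gene_lfc.items():
        best_rep, best_r = max(
            ((rep, pearson(profile, rp)) for rep, rp in reporter_lfc.items()),
            key=lambda pair: pair[1],
        )
        if best_r >= min_r:
            assigned[gene] = (best_rep, best_r)
    return assigned
```

Each gene ends up in at most one cluster, named after a reporter, so the clusters partition the selected genes that pass the correlation cutoff.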

ATAC-seq, Motif Analysis and GO Enrichment

In order to identify regulatory DNA sequence in BMDMs, we applied the assay for transposase-accessible chromatin using sequencing (ATAC-seq) to 50,000 BMDMs from non-stimulated cells in triplicate and stimulated cells (12 different PAMPs) in singlet after 1 hour. Native nuclei were prepared in 10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% IGEPAL. The nuclei were transposed at 37°C for 30 min. The transposed DNA was purified using a Zymo gel extraction column. 50-100 ng of transposed DNA was recovered and 20% of the yield was PCR-amplified for 18 cycles using Nextera primers. The sequencing data were aligned to the reference genome (mm9) using Bowtie (Langmead et al., 2009) with default parameters. All reads were pooled to identify regions of open chromatin using MACS (Zhang et al., 2008), and the regions were then assigned to the closest gene under 100 Kb away using Homer software (Heinz et al., 2010). The average motif affinity (AMA) scores (Buske et al., 2010) were calculated for each gene by concatenating the associated regulatory sequence and using the JASPAR core mammalian motifs (Mathelier et al., 2014). Motif enrichment was determined by applying a t-test comparing AMA scores for genes within the cluster to scores of genes in other clusters (McLeay and Bailey, 2010). The Gene Ontology enrichment analysis was performed using the Bioconductor GOsummaries package (Kolde and Vilo, 2015), which visualizes the enrichment results from the g:Profiler toolkit (Reimand et al., 2011).
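The peak-to-gene assignment above was performed with Homer. The following Python sketch only illustrates the rule "closest gene under 100 Kb away" and does not reproduce Homer's behavior: measuring distance from the peak midpoint to a single transcription start site per gene is an assumption, and the function name and input layout are hypothetical.

```python
def assign_peaks(peaks, tss, max_dist=100_000):
    """Assign each peak to the nearest gene on the same chromosome.

    peaks: iterable of (chrom, start, end)
    tss:   {gene: (chrom, position)} with one start site per gene (assumed)
    Peaks with no gene closer than max_dist are left unassigned.
    """
    assigned = {}
    for chrom, start, end in peaks:
        mid = (start + end) // 2
        best = None  # (gene, distance) of the closest gene found so far
        for gene, (g_chrom, pos) in tss.items():
            if g_chrom != chrom:
                continue
            dist = abs(mid - pos)
            if dist < max_dist and (best is None or dist < best[1]):
                best = (gene, dist)
        if best is not None:
            assigned[(chrom, start, end)] = best[0]
    return assigned
```

The linear scan over all genes is chosen for readability; a sorted index per chromosome would be preferable for genome-scale peak sets.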

Published ChIP-seq Data

ChIP-seq peaks from dendritic cells stimulated with LPS (Garber et al., 2012) were downloaded with binding scores from GEO (accession GSE36104). The peaks were assigned to the closest gene under 100 Kb away using Homer software. The highest scoring peak from each time point was used for the final association score between each gene and transcription factor. For the target expression analysis (Figure 3B) we selected the 500 highest scoring genes for each transcription factor. All of the STAT1 targets detected in HeLa cells were used and were downloaded from published supplementary data (Satoh and Tabunoki, 2013). The top 500 genes from NFκB targets in macrophages (Barish et al., 2010) and STAT3 targets in embryonic stem cells (Chen et al., 2008) were used and were downloaded from the hmChIP database (Chen et al., 2011). The increase in STAT1 ChIP-seq peaks after LPS stimulation (Figure 4B) was calculated by summing the binding scores of all associated STAT1 peaks at the 0 h and 2 h time points separately and then subtracting the former from the latter.
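A minimal Python sketch of the scoring steps above follows. It is an interpretation rather than the authors' code: the association score is taken here as the highest peak score per gene across all time points, which is one reading of the text, and the input layout (gene, time point label, score) and function names are hypothetical.

```python
def association_scores(peaks):
    """peaks: iterable of (gene, time_point, score) -> {gene: highest score}."""
    best = {}
    for gene, _time_point, score in peaks:
        best[gene] = max(score, best.get(gene, float("-inf")))
    return best


def top_targets(scores, n=500):
    """Return the n highest-scoring genes, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [gene for gene, _score in ranked[:n]]


def binding_increase(peaks, before="0h", after="2h"):
    """Summed peak score at `after` minus summed peak score at `before`, per gene."""
    delta = {}
    for gene, time_point, score in peaks:
        if time_point == after:
            delta[gene] = delta.get(gene, 0.0) + score
        elif time_point == before:
            delta[gene] = delta.get(gene, 0.0) - score
    return delta
```

A gene with peaks only at the later time point receives its full summed score as the increase, while a gene bound only before stimulation receives a negative value.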

Analysis of Small Molecule Perturbations

To generate t-SNE plots we used normalized matrices for both gene expression and TF-seq. We normalized the TF-seq data by first applying the log2 transform and then dividing the values for each sample by the sample average over all reporters. The gene expression values were normalized to counts per million (CPM) scores using trimmed mean of M-values (TMM) normalization from the edgeR package. The dimensionality reduction was performed in two steps. First, we applied principal component analysis (PCA) to the 20 most variable reporters and the 1000 most variable genes, respectively. We then used the first 5 principal components in the t-SNE analysis using the R package tsne. The best visual properties were achieved using a perplexity of 15 and default values for the other parameters. The PCA step allowed us to reduce the noise levels in the data and to achieve clearer separation between the groups.
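The TF-seq normalization and the selection of the most variable reporters can be sketched in Python as below. The sketch covers only these two steps and omits the PCA and t-SNE stages, which the authors ran in R. The pseudocount of 1 added before the log2 transform is an assumption (the text does not say how zero counts were handled), and the function names are hypothetical.

```python
import math
from statistics import variance


def normalize_tfseq(counts, pseudo=1.0):
    """Log2-transform counts, then divide each sample by its mean over reporters.

    counts: {reporter: [count per sample]}
    Assumes every sample has a non-zero mean log2 value.
    """
    reporters = list(counts)
    n = len(counts[reporters[0]])
    logged = {r: [math.log2(c + pseudo) for c in counts[r]] for r in reporters}
    means = [sum(logged[r][j] for r in reporters) / len(reporters)
             for j in range(n)]
    return {r: [logged[r][j] / means[j] for j in range(n)] for r in reporters}


def most_variable(matrix, k):
    """Names of the k rows with the largest sample variance across samples."""
    return sorted(matrix, key=lambda r: variance(matrix[r]), reverse=True)[:k]
```

After this normalization every sample has a mean of 1 across reporters, so a sample-wide shift in reporter signal is removed before the most variable reporters are chosen.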

Supplementary Material

1
2
3
4
5
6
7
8

Acknowledgements

We thank Natalia Nedelsky, Kara Lassen, Hera Vlamakis, and members of the Xavier Laboratory for critical comments. We thank Jean Wilson of the Albert C. Shaw laboratory for genotyping, harvesting, and shipping Myd88 knockout bones. We thank Luke O'Neill and Annie Curtis for their helpful comments. R.J.X. was supported by The Leona M. and Harry B. Helmsley Charitable Trust.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Author Contributions

D.J.O., R.K., T.S.M., and R.J.X. conceived ideas and experimental design. T.S.M. designed the pathway reporter construct and sequencing strategy. D.J.O. constructed the lentiviral reporter library and performed the experiments. M.S. developed the pathway reporter sequencing and 3′ DGE sequence counting Python scripts. R.K., D.J.O., and R.J.X. conceived, and R.K. conducted the statistical analysis. D.B.G., T.B.S., I.L., T.S.M., and R.J.X. supervised the study. D.J.O., R.K., and R.J.X. wrote the manuscript. All authors read and approved the final manuscript.

Competing financial interests

The authors declare no competing financial interests.

References

  1. Akhtar W, de Jong J, Pindyurin AV, Pagie L, Meuleman W, de Ridder J, Berns A, Wessels LFA, van Lohuizen M, van Steensel B. Chromatin position effects assayed by thousands of reporters integrated in parallel. Cell. 2013;154:914–927. doi: 10.1016/j.cell.2013.07.018.
  2. Arnold CD, Gerlach D, Stelzer C, Boryń ŁM, Rath M, Stark A. Genome-Wide Quantitative Enhancer Activity Maps Identified by STARR-seq. Science. 2013;339:1074–1077. doi: 10.1126/science.1232542.
  3. Barish GD, Yu RT, Karunasiri M, Ocampo CB, Dixon J, Benner C, Dent AL, Tangirala RK, Evans RM. Bcl-6 and NF-kappaB cistromes mediate opposing regulation of the innate immune response. Genes Dev. 2010;24:2760–2765. doi: 10.1101/gad.1998010.
  4. Bellis AD, Peňalver-Bernabé B, Weiss MS, Yarrington ME, Barbolina MV, Pannier AK, Jeruss JS, Broadbelt LJ, Shea LD. Cellular arrays for large-scale analysis of transcription factor activity. Biotechnol. Bioeng. 2011;108:395–403. doi: 10.1002/bit.22916.
  5. Bronstein I, Fortin J, Stanley PE, Stewart GSAB, Kricka LJ. Chemiluminescent and Bioluminescent Reporter Gene Assays. Anal. Biochem. 1994;219:169–181. doi: 10.1006/abio.1994.1254.
  6. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods. 2013;10:1213–1218. doi: 10.1038/nmeth.2688.
  7. Buske FA, Bodén M, Bauer DC, Bailey TL. Assigning roles to DNA regulatory motifs using comparative genomics. Bioinforma. Oxf. Engl. 2010;26:860–866. doi: 10.1093/bioinformatics/btq049.
  8. Chen LS, Storey JD. Eigen-R2 for dissecting variation in high-dimensional studies. Bioinforma. Oxf. Engl. 2008;24:2260–2262. doi: 10.1093/bioinformatics/btn411.
  9. Chen L, Wu G, Ji H. hmChIP: a database and web server for exploring publicly available human and mouse ChIP-seq and ChIP-chip data. Bioinforma. Oxf. Engl. 2011;27:1447–1448. doi: 10.1093/bioinformatics/btr156.
  10. Chen X, Xu H, Yuan P, Fang F, Huss M, Vega VB, Wong E, Orlov YL, Zhang W, Jiang J, et al. Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell. 2008;133:1106–1117. doi: 10.1016/j.cell.2008.04.043.
  11. Everts B, Amiel E, van der Windt GJW, Freitas TC, Chott R, Yarasheski KE, Pearce EL, Pearce EJ. Commitment to glycolysis sustains survival of NO-producing inflammatory dendritic cells. Blood. 2012;120:1422–1431. doi: 10.1182/blood-2012-03-419747.
  12. Garber M, Yosef N, Goren A, Raychowdhury R, Thielke A, Guttman M, Robinson J, Minie B, Chevrier N, Itzhaki Z, et al. A high-throughput chromatin immunoprecipitation approach reveals principles of dynamic gene regulation in mammals. Mol. Cell. 2012;47:810–822. doi: 10.1016/j.molcel.2012.07.030.
  13. Gorman CM, Moffat LF, Howard BH. Recombinant genomes which express chloramphenicol acetyltransferase in mammalian cells. Mol. Cell. Biol. 1982;2:1044–1051. doi: 10.1128/mcb.2.9.1044.
  14. Greenfield A, Hafemeister C, Bonneau R. Robust data-driven incorporation of prior knowledge into the inference of dynamic regulatory networks. Bioinforma. Oxf. Engl. 2013;29:1060–1067. doi: 10.1093/bioinformatics/btt099.
  15. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell. 2010;38:576–589. doi: 10.1016/j.molcel.2010.05.004.
  16. Holmqvist E, Reimegård J, Wagner EGH. Massive functional mapping of a 5′-UTR by saturation mutagenesis, phenotypic sorting and deep sequencing. Nucleic Acids Res. 2013;41:e122–e122. doi: 10.1093/nar/gkt267.
  17. Hong F, Sekhar KR, Freeman ML, Liebler DC. Specific patterns of electrophile adduction trigger Keap1 ubiquitination and Nrf2 activation. J. Biol. Chem. 2005;280:31768–31775. doi: 10.1074/jbc.M503346200.
  18. Jojic V, Shay T, Sylvia K, Zuk O, Sun X, Kang J, Regev A, Koller D, Immunological Genome Project Consortium. Best AJ, et al. Identification of transcriptional regulators in the mouse immune system. Nat. Immunol. 2013;14:633–643. doi: 10.1038/ni.2587.
  19. Keller TL, Zocco D, Sundrud MS, Hendrick M, Edenius M, Yum J, Kim Y-J, Lee H-K, Cortese JF, Wirth DF, et al. Halofuginone and other febrifugine derivatives inhibit prolyl-tRNA synthetase. Nat. Chem. Biol. 2012;8:311–317. doi: 10.1038/nchembio.790.
  20. Kivioja T, Vähärautio A, Karlsson K, Bonke M, Enge M, Linnarsson S, Taipale J. Counting absolute numbers of molecules using unique molecular identifiers. Nat. Methods. 2012;9:72–74. doi: 10.1038/nmeth.1778.
  21. Kolde R, Vilo J. GOsummaries: an R Package for Visual Functional Annotation of Experimental Data. F1000Research. 2015 doi: 10.12688/f1000research.6925.1.
  22. Kosuri S, Goodman DB, Cambray G, Mutalik VK, Gao Y, Arkin AP, Endy D, Church GM. Composability of regulatory sequences controlling transcription and translation in Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 2013;110:14024–14029. doi: 10.1073/pnas.1301301110.
  23. Kwok SCM, Daskal I. Brefeldin A activates CHOP promoter at the AARE, ERSE and AP-1 elements. Mol. Cell. Biochem. 2008;319:203–208. doi: 10.1007/s11010-008-9893-3.
  24. Lamb J, Crawford ED, Peck D, Modell JW, Blat IC, Wrobel MJ, Lerner J, Brunet J-P, Subramanian A, Ross KN, et al. The Connectivity Map: Using Gene-Expression Signatures to Connect Small Molecules, Genes, and Disease. Science. 2006;313:1929–1935. doi: 10.1126/science.1132939.
  25. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25. doi: 10.1186/gb-2009-10-3-r25.
  26. Lefebvre C, Rajbhandari P, Alvarez MJ, Bandaru P, Lim WK, Sato M, Wang K, Sumazin P, Kustagi M, Bisikirska BC, et al. A human B-cell interactome identifies MYB and FOXM1 as master regulators of proliferation in germinal centers. Mol. Syst. Biol. 2010;6:377. doi: 10.1038/msb.2010.31.
  27. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinforma. Oxf. Engl. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
  28. Ma Q, Kinneer K. Chemoprotection by phenolic antioxidants. Inhibition of tumor necrosis factor alpha induction in macrophages. J. Biol. Chem. 2002;277:2477–2484. doi: 10.1074/jbc.M106685200.
  29. Maaten L. van der, Hinton G. Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 2008;9:2579–2605.
  30. Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics. 2006;7(Suppl 1):S7. doi: 10.1186/1471-2105-7-S1-S7.
  31. Mathelier A, Zhao X, Zhang AW, Parcy F, Worsley-Hunt R, Arenillas DJ, Buchman S, Chen C, Chou A, Ienasescu H, et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res. 2014;42:D142–D147. doi: 10.1093/nar/gkt997.
  32. McLeay RC, Bailey TL. Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data. BMC Bioinformatics. 2010;11:165. doi: 10.1186/1471-2105-11-165.
  33. Melnikov A, Murugan A, Zhang X, Tesileanu T, Wang L, Rogov P, Feizi S, Gnirke A, Callan CG, Kinney JB, et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat. Biotechnol. 2012;30:271–277. doi: 10.1038/nbt.2137.
  34. Padmashali RM, Mistriotis P, Liang M, Andreadis ST. Lentiviral Arrays for Live-cell Dynamic Monitoring of Gene and Pathway Activity During Stem Cell Differentiation. Mol. Ther. 2014;22:1971–1982. doi: 10.1038/mt.2014.103.
  35. Patwardhan RP, Hiatt JB, Witten DM, Kim MJ, Smith RP, May D, Lee C, Andrie JM, Lee S-I, Cooper GM, et al. Massively parallel functional dissection of mammalian enhancers in vivo. Nat. Biotechnol. 2012;30:265–270. doi: 10.1038/nbt.2136.
  36. Reimand J, Arak T, Vilo J. g:Profiler--a web server for functional interpretation of gene lists (2011 update). Nucleic Acids Res. 2011;39:W307–W315. doi: 10.1093/nar/gkr378.
  37. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinforma. Oxf. Engl. 2010;26:139–140. doi: 10.1093/bioinformatics/btp616.
  38. Sanjana NE, Shalem O, Zhang F. Improved vectors and genome-wide libraries for CRISPR screening. Nat. Methods. 2014;11:783–784. doi: 10.1038/nmeth.3047.
  39. Satoh J-I, Tabunoki H. A Comprehensive Profile of ChIP-Seq-Based STAT1 Target Genes Suggests the Complexity of STAT1-Mediated Gene Regulatory Mechanisms. Gene Regul. Syst. Biol. 2013;7:41–56. doi: 10.4137/GRSB.S11433.
  40. Segal E, Shapira M, Regev A, Pe'er D, Botstein D, Koller D, Friedman N. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat. Genet. 2003;34:166–176. doi: 10.1038/ng1165.
  41. Soumillon M, Cacchiarelli D, Semrau S, Oudenaarden A. van, Mikkelsen TS. Characterization of directed differentiation by high-throughput single-cell RNA-Seq. bioRxiv. 2014:003236.
  42. Strassmann G, Patil-Koota V, Finkelman F, Fong M, Kambayashi T. Evidence for the involvement of interleukin 10 in the differential deactivation of murine peritoneal macrophages by prostaglandin E2. J. Exp. Med. 1994;180:2365–2370. doi: 10.1084/jem.180.6.2365.
  43. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U. S. A. 2005;102:15545–15550. doi: 10.1073/pnas.0506580102.
  44. Sundrud MS, Koralov SB, Feuerer M, Calado DP, Kozhaya AE, Rhule-Smith A, Lefebvre RE, Unutmaz D, Mazitschek R, Waldner H, et al. Halofuginone inhibits TH17 cell differentiation by activating the amino acid starvation response. Science. 2009;324:1334–1338. doi: 10.1126/science.1172638.
  45. Tian L, Greenberg SA, Kong SW, Altschuler J, Kohane IS, Park PJ. Discovering statistically significant pathways in expression profiling studies. Proc. Natl. Acad. Sci. U. S. A. 2005;102:13544–13549. doi: 10.1073/pnas.0506577102.
  46. Vockley CM, Guo C, Majoros WH, Nodzenski M, Scholtens DM, Hayes MG, Lowe WL, Reddy TE. Massively parallel quantification of the regulatory effects of non-coding genetic variation in a human cohort. Genome Res. 2015:gr.190090.115. doi: 10.1101/gr.190090.115.
  47. Yamawaki Y, Kimura H, Hosoi T, Ozawa K. MyD88 plays a key role in LPS-induced Stat3 activation in the hypothalamus. Am. J. Physiol. Regul. Integr. Comp. Physiol. 2010;298:R403–R410. doi: 10.1152/ajpregu.00395.2009.
  48. Zhang Y, Liu T, Meyer C, Eeckhoute J, Johnson D, Bernstein B, Nussbaum C, Myers R, Brown M, Li W, et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biol. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137.
  49. Zhao W, Pollack JL, Blagev DP, Zaitlen N, McManus MT, Erle DJ. Massively parallel functional annotation of 3′ untranslated regions. Nat. Biotechnol. 2014;32:387–391. doi: 10.1038/nbt.2851.
