Author manuscript; available in PMC: 2020 Jul 11.
Published in final edited form as: Cell. 2019 Jun 20;178(2):473–490.e26. doi: 10.1016/j.cell.2019.05.027

Atlas of Subcellular RNA Localization Revealed by APEX-seq

Furqan M Fazal 1,2,3,9, Shuo Han 3,4,5,9, Kevin R Parker 1,2,3,10, Pornchai Kaewsapsak 3,4,5,10, Jin Xu 1,2,3, Alistair N Boettiger 6, Howard Y Chang 1,2,3,7,*, Alice Y Ting 3,4,5,8,11,*
PMCID: PMC6786773  NIHMSID: NIHMS1531987  PMID: 31230715

SUMMARY

We introduce APEX-seq, a method for RNA sequencing based on direct proximity labeling of RNA using the peroxidase enzyme APEX2. APEX-seq in nine distinct subcellular locales produced a nanometer-resolution spatial map of the human transcriptome as a resource, revealing extensive patterns of localization for diverse RNA classes and transcript isoforms. We uncover a radial organization of the nuclear transcriptome, which is gated at the inner surface of the nuclear pore for cytoplasmic export of processed transcripts. We identify two distinct pathways of messenger RNA localization to mitochondria, each associated with specific sets of transcripts for building complementary macromolecular machines within the organelle. APEX-seq should be widely applicable to many systems, enabling comprehensive investigations of the spatial transcriptome.

1-sentence summary

A newly-developed technique reveals the subcellular transcriptomes at many landmarks in the nucleus and cytosol, and connects mRNA localization to genome architecture, protein location and local-translation mechanisms.

INTRODUCTION

The subcellular localization of RNA is intimately tied to its function(Buxbaum et al., 2015). Asymmetrically-distributed RNAs underlie organismal development, local protein translation, and the 3D organization of chromatin. Where an RNA is located within the cell likely determines whether it will be stored, processed, translated(Berkovits and Mayr, 2015), or degraded(Fasken and Corbett, 2009).

While many methods have been developed to study RNA localization(Weil et al., 2010), only a few have been applied on a transcriptome-wide scale. The most classical approach is biochemical fractionation to enrich specific organelles, followed by RNA sequencing (“fractionation-seq”). However, a major limitation of fractionation-seq is that it cannot be applied to organelles that are impossible to purify, such as the nuclear lamina and outer mitochondrial membrane (OMM). Even for organelles that can be enriched by centrifugation, such as mitochondria, current protocols fail to remove contaminants(Sadowski et al., 2008).

RNA localization can also be directly visualized by microscopy(Bertrand et al., 1998; Femino et al., 1998), and techniques have recently been pioneered for imaging thousands of cellular RNAs at once using barcoded oligonucleotides(Chen et al., 2015b; Shah et al., 2016). The drawbacks of these fluorescence in-situ hybridization (FISH)-based approaches, however, are the need for designed probe sets targeting RNAs of interest; the requirement for cell fixation and permeabilization, which can relocalize cellular components(Fox et al., 1985; Schnell et al., 2012); the difficulty of assigning RNAs to specific cellular landmarks due to spatial resolution limits; and the limited information content compared to RNA sequencing. Finally, these transcriptome-wide imaging methods are technically challenging and require specialized instrumentation not available to most.

An adaptation of ribosome profiling(Ingolia et al., 2009) has enabled this technique to profile actively-translated mRNAs in specific cellular locales. The two demonstrations – on the ER membrane (ERM) in yeast and mammalian cells(Jan et al., 2014), and on the OMM in yeast(Williams et al., 2014) – showed high spatial specificity and compatibility with living cells. However, the methodology cannot detect non-coding RNAs, or non-translated mRNAs. Proximity-specific ribosome profiling is also not yet a fully generalizable method, as the requirement for biotin starvation during cell culture is prohibitively toxic to many cell types.

Hence, there remains a need for new methodology that can map the spatial localization of thousands of endogenous RNAs at once in living cells. The method should be applicable to any subcellular region, and capture full sequence details of any RNA type, enabling comparisons across RNA variants and isoforms. Here we develop the “APEX-seq” methodology in an effort to provide these capabilities. We characterize the APEX-seq approach, and then apply it to nine subcellular locations, generating a high-resolution atlas of endogenous RNA localization in living human HEK-293T cells. Our data reveal correlations between localization of mRNAs and the protein products they encode, as well as patterns of RNA localization and underlying genome architecture. An analysis of mRNAs at the OMM suggests distinct mechanisms for RNA targeting that correlate with the sequence and function of the encoded mitochondrial proteins. These examples illustrate the versatility of APEX-seq and its ability to nominate or test novel biological hypotheses.

RESULTS

APEX-catalyzed Labeling of RNA

To develop the methodology, we drew from previous work in our laboratory using enzymes to map spatial proteomes(Rhee et al., 2013). APEX2(Lam et al., 2014) is an evolved mutant of soybean ascorbate peroxidase that catalyzes the one-electron oxidation of biotinphenol (BP), a membrane-permeable small molecule. The resulting BP radical is short-lived (half-life <1 ms)(Mortensen and Skibsted, 1997; Wishart and Rao, 2010) and covalently conjugates onto protein sidechains. Hence APEX2 catalyzes the promiscuous biotin-tagging of endogenous proteins within a few nanometers of its active site in living cells. The high spatial specificity of this approach has enabled APEX-mapping of numerous organelle proteomes as well as protein interaction networks(Han et al., 2018).

We previously combined APEX proteomic tagging with formaldehyde protein-RNA crosslinking in order to extend our analysis to cellular RNAs(Kaewsapsak et al., 2017). While this “APEX-RIP” approach was effective at mapping the RNA composition of membrane-enclosed organelles such as the mitochondrion, its spatial specificity was poor in “open” or non-membrane enclosed cellular regions. For instance, RNAs enriched by APEX targeted to the ERM (facing cytosol) were no different from those enriched by cytosolic APEX.

A more straightforward and potentially higher-specificity approach would be to bypass crosslinking altogether and use APEX peroxidase to directly biotinylate cellular RNAs within a short time window (Figure 1A). To test if peroxidase-generated phenoxyl radicals could biotinylate RNA in vitro, we combined horseradish peroxidase (HRP), which catalyzes the same one-electron oxidation chemistry as APEX2, with tRNA, biotin-phenol (BP), and H2O2. On a streptavidin dot blot, we observed robust tRNA biotinylation that was abolished by RNase treatment, but unaffected by proteinase K treatment (Figure S1A). We next used a reverse transcriptase (RT) stop assay to evaluate the labeling, and found that while full-length transcripts are still produced, multiple RT stops are observed at G-rich regions in peroxidase-catalyzed RNA samples (Figure S1DE). Additional experiments characterized the covalent adduct between G and BP by HPLC and mass spectrometry (Figure S1BC).

Figure 1: Development of APEX-seq methodology.

(A) APEX2-mediated proximity biotinylation of endogenous RNAs. APEX2 peroxidase is genetically targeted to the cellular region of interest. Addition of biotin-phenol (red B = biotin) and H2O2 to live cells for 1 minute results in biotinylation of endogenous proteins and RNA within a few nanometers of APEX2. Biotinylated RNAs are separated using streptavidin-coated beads, polyA-selected, and analyzed by RNA-seq.

(B) Streptavidin-biotin dot-blot analysis of direct RNA biotinylation by APEX2 in cells. HEK-293T cells expressing APEX2 in the cytosol were labeled with BP and H2O2 for 1 minute, then the RNA was extracted and blotted. Signal was observed only when BP, H2O2, and APEX2 were all present. RNase treatment of the sample abolished the signal.

(C) RT-qPCR analysis showing specific enrichment of mitochondrial RNAs (grey) over cytosolic mRNAs (white). Cells expressing APEX2 targeted to the mitochondrial matrix were labeled for 1 minute. Biotinylated RNAs were enriched following RNA extraction. Data are the mean of 4 replicates ± 1 standard deviation (S.D.).

(D) RT-qPCR analysis showing specific enrichment of secretory (red) over non-secretory (grey) mRNAs with APEX-seq, but not APEX-RIP. Cells stably expressing APEX2 targeted to the ER membrane (facing cytosol) were labeled for 1 minute. For APEX-RIP, RNAs were crosslinked to proteins for 10 minutes before streptavidin bead enrichment. Data are the mean of 4 replicates ± 1 S.D. The data were normalized such that the mean enrichment of non-secretory RNAs was 1 for both techniques.

(E) Human cell showing nine different subcellular locations investigated.

(F) Fluorescence imaging of APEX2 localization and biotinylation activity. Live-cell biotinylation was performed for 1 minute in cells stably expressing the indicated APEX2 fusion protein. APEX2 expression was visualized by GFP or antibody staining (green). Biotinylation was visualized by staining with neutravidin-AlexaFluor 647 (red). DAPI is a nuclear marker. Endogenous TOM20 and CANX were used as markers for the mitochondria and ER, respectively. Scale bars, 10 μm.

To test APEX-catalyzed RNA biotinylation in living cells, we generated HEK cells stably expressing APEX2 in the cytosol. We labeled the cells with BP and H2O2 for 1 minute, extracted total RNA, and analyzed the RNA by streptavidin dot blot. Figure 1B shows that RNA biotinylation is abolished upon omission of BP or H2O2, or following treatment with RNase. Combined with the assays above, our results suggest that APEX directly tags RNA with biotin, not merely biotinylating proteins co-complexed with RNA.

Next we combined APEX labeling with RT-qPCR analysis of biotinylated RNAs in order to begin assessing the spatial specificity of this approach. We started with the mitochondrial matrix, which we have previously characterized by APEX proteomics(Han et al., 2017; Rhee et al., 2013), and whose transcriptome can be predicted by the sequence of the mitochondrial genome (mtDNA)(Mercer et al., 2011). Using HEK cells expressing APEX2 in the mitochondrial matrix, we performed labeling, then extracted RNA and enriched the biotinylated fraction using streptavidin beads. We optimized a series of denaturing washes to fully dissociate complexes and ensure that the streptavidin beads only enriched biotinylated RNA species (Figure S1F). We then analyzed the eluate by RT-qPCR, and observed strong enrichment of mtDNA-encoded mRNAs MTND1 and MTCO2, but not negative-control cytosolic mRNAs (Figure 1C).

However, because the mitochondrial matrix is enclosed by a tight membrane that is impervious to BP radicals(Rhee et al., 2013), it does not provide a rigorous test of APEX labeling radius. To evaluate spatial specificity in an open cellular compartment, we utilized HEK cells stably expressing APEX2 on the ERM, facing cytosol. RT-qPCR analysis of streptavidin-enriched RNA following BP labeling (Figure 1D) shows high enrichment of secretory mRNAs (ERM-proximal “true positives”), but not negative-control cytosolic mRNAs (encoding non-secretory proteins). This result suggests that APEX biotinylation has nanometer spatial resolution, and is able to distinguish ER-proximal RNAs from cytosolic RNAs only nanometers from the ERM. This result contrasts strikingly with previous observations using APEX-RIP (Kaewsapsak et al., 2017). For a further side-by-side comparison between APEX-seq and APEX-RIP, a total of 8 representative transcripts known from previous literature to localize to the respective landmarks were investigated by RT-qPCR (Figure S2E). APEX-seq enriched specific, proximal RNAs in open subcellular regions (ERM, nuclear lamina, nucleolus, and OMM), whereas APEX-RIP was unable to do so.

Development and Validation of APEX-seq

Encouraged by the results above, we moved to a more comprehensive analysis by replacing RT-qPCR with transcriptome-wide sequencing. We also created cell lines expressing APEX in nine subcellular locales (Figure 1E, S2A). For each cell line, we verified correct targeting of APEX by performing immunofluorescence staining against organelle markers. To examine APEX activity, we performed BP labeling, fixed, and stained the biotinylated species using neutravidin-Alexa 647. For some locations, the neutravidin pattern overlapped closely with APEX localization (e.g. nucleolus and mitochondrial matrix; Figure 1F), indicating minimal diffusion of biotinylated species. For other locations, the neutravidin signal was more “spread out” than the APEX signal (e.g., ERM and OMM; Figure S2B), suggesting redistribution of biotinylated species during the 1-minute labeling time window(Hung et al., 2016).

To assess the quality of the polyA-selected APEX-seq data (Figure S2CD, Table S2), we first focused on two subcellular compartments that have been extensively mapped: the mitochondrial matrix and the ERM. For the former (Figure 2AB), APEX-seq experiments showed strong enrichment of all 13 mRNAs and the 2 rRNAs encoded by mtDNA (Figure S2FG), while no RNAs encoded by the nuclear genome were highly enriched.

Figure 2: Validation of APEX-seq, including specific orphans from RNA atlas.

(A) APEX-seq in the mitochondrial matrix. Transcript abundance in experiment plotted against negative control (omit H2O2). All mRNAs and rRNAs encoded by the mitochondrial genome (large blue dots) are enriched by APEX (mean enrichment > 11-fold). FPKM, fragments per kilobase of transcript per million reads. Due to the 100-nt size selection step during RNA extraction, tRNAs were not efficiently recovered.

(B) Scatter plot of transcript abundance in the mitochondrial matrix (MITO.

(C) APEX-seq at the ERM, facing cytosol. Volcano plot showing APEX-catalyzed enrichment of secretory mRNAs (red) over non-secretory mRNAs (black).

(D) Comparison of ERM-enriched RNAs by APEX-seq, proximity-specific ribosome profiling, and ER fractionation-seq.

(E) Transcript abundance (FPKM) analysis of genes enriched by ERM APEX-seq, fractionation-seq, proximity-specific ribosome profiling, and genes unique to the APEX-seq dataset. P-values from a Mann-Whitney U test.

(F) Total number of orphans (blue) generated from APEX-seq RNA datasets, with those validated by further polyA+ fractionation-seq shown in black. The source of most of these RNAs is the RNA atlas, with further contributions from analysis of the ERM and OMM transcriptomes.

(G) APEX-seq yields cleaner results than bulk fractionation RNA-seq. Nucleus APEX-seq fold changes are highly correlated with bulk fractionation RNA-seq when considering non-ER genes (blue). However, fractionation suffers from contamination by ER transcripts (black).

(H) APEX-seq in the cytosol does not recover RNAs coding for mitochondrial proteins, whereas fractionation-seq does. All mRNAs and rRNAs encoded by the mitochondrial genome are shown in blue. P-value is from a Mann-Whitney U test.

(I) (K) Sequential smFISH imaging of OMM (I) or ERM (K) orphans in HEK cells. MTND5 was used as a mitochondrial marker. SCD and TSPAN3 were used as ERM markers. mRNAs and lncRNAs not enriched in OMM (I) or ERM (K) were used as negative controls. Expanded views of the boxed region are shown on the right. Scale bar, 5 μm.

(J) (L) Quantitation of the colocalization of OMM (J) or ERM (L) orphans with MTND5 (J) or SCD (L) by sequential smFISH imaging. Blue lines represent the mean from 14 independent fields of view. Data were analyzed using a two-tailed Student’s t test, with *p < 0.05, **p < 0.01, and ***p < 0.001; N.S., not significant (p > 0.05).

For the ERM, APEX-seq highly enriched RNAs previously shown to be ER proximal (such as mRNAs encoding secreted proteins) over cytosol-localized RNAs. To perform a quantitative analysis, we used ROC cut-off analysis(Linden, 2006) (Figure S2IJ) to produce a list of 1077 ERM-enriched RNAs (Figure 2C). To evaluate the specificity of this dataset, we determined the fraction of “secretory” or “transmembrane” mRNAs (Methods), and found that 90% of genes had such prior annotations. The remaining 10% (107 genes) could be false-positives, or they could be newly-discovered ERM-associated RNAs.
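As a rough illustration of how a ROC-based cut-off can be chosen from lists of known true positives and false positives, the sketch below scans candidate enrichment thresholds and keeps the one that maximizes the difference between the true-positive rate and the false-positive rate. This criterion is an assumption made for illustration; the exact procedure used in the study is described in its Methods and may differ. The function name `roc_cutoff` and the toy scores and labels are hypothetical.

```python
from typing import List, Tuple


def roc_cutoff(scores: List[float], labels: List[bool]) -> Tuple[float, float, float]:
    """Return (cutoff, TPR, FPR) for the cutoff maximizing TPR - FPR.

    A gene is called enriched when its score is >= cutoff.
    `labels` marks known true positives (True) and known negatives (False).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative")

    best = (float("-inf"), 0.0, 0.0, 0.0)  # (TPR - FPR, cutoff, TPR, FPR)
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and not y)
        tpr, fpr = tp / positives, fp / negatives
        if tpr - fpr > best[0]:
            best = (tpr - fpr, cutoff, tpr, fpr)
    return best[1], best[2], best[3]


# Made-up log2 fold-changes and annotations, for illustration only.
scores = [2.1, 1.8, 1.5, 0.9, 0.4, 0.2, -0.3, -1.0]
labels = [True, True, False, True, True, False, False, False]
cutoff, tpr, fpr = roc_cutoff(scores, labels)
```

With these toy values, the selected cut-off keeps every labeled positive while admitting one of the four labeled negatives.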

To evaluate depth of coverage, we prepared a hand-curated list of 71 well-established ER-resident proteins, and asked what fraction of their corresponding mRNAs appear in our ERM APEX-seq dataset. We recovered 70% of this true-positive list (Figure S2K). This sensitivity is comparable to that of our previous APEX proteomic datasets in open compartments(Hung et al., 2017) (Figure S2L). RNAs we failed to enrich could be sterically-shielded in the live cell environment, low in abundance, or dual-localized to both ERM and cytosol.

The ERM-associated transcriptome has previously been studied by fractionation-seq(Reid and Nicchitta, 2012) and proximity-specific ribosome profiling(Jan et al., 2014). Upon analyzing the published datasets, we found that the specificities of fractionation-seq and APEX-seq were comparably high (90% versus 91% secretory mRNAs, respectively; Figure S2I), as were their sensitivities (Figure S2K). However, Figure 2D shows that each method recovers somewhat different subsets of transcripts. Further analysis of genes enriched by APEX-seq but not fractionation-seq or ribosome profiling shows that many of these are lower in RNA abundance (Figure 2E).

Altogether, our APEX-seq analysis demonstrates that high specificity and reasonable sensitivity can be achieved in both membrane-enclosed and open subcellular compartments.

RNA Atlas of 9 Distinct Subcellular Compartments by APEX-seq

Having established the specificity and sensitivity of APEX-seq using the mitochondrial matrix and ERM, we turned our attention to the seven other compartments (Figure 1E). The RNA content of most of these regions has not previously been mapped, as they are impossible to purify and/or too small to image unambiguously by conventional microscopy. As such, it is impossible to generate true positive/false positive lists of known resident/non-resident RNAs with which to perform ROC-based cut-off analysis. We therefore opted for a universal enrichment-factor cut-off of 0.75 (log2 fold-change) and a q-value (FDR-adjusted p-value) cut-off of 0.05, which were applied to all compartments (Methods). By intersecting data from each pair of replicates, we obtained RNA lists for all nine compartments (Table S3).
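The universal cut-off and replicate intersection can be sketched as follows. This is a minimal illustration rather than the analysis pipeline used in the study: the record layout, function names, and toy values are assumptions, and whether the comparisons are strict or inclusive is also assumed. Only the two cut-off values come from the text.

```python
from typing import Dict, Set, Tuple

# Cut-offs stated in the text; everything else here is hypothetical.
LOG2FC_CUTOFF = 0.75
QVALUE_CUTOFF = 0.05

# One replicate: gene name -> (log2 fold-change, q-value).
Replicate = Dict[str, Tuple[float, float]]


def enriched_genes(replicate: Replicate) -> Set[str]:
    """Return genes passing both cut-offs in a single replicate."""
    return {
        gene
        for gene, (log2fc, qvalue) in replicate.items()
        if log2fc > LOG2FC_CUTOFF and qvalue < QVALUE_CUTOFF
    }


def compartment_list(rep1: Replicate, rep2: Replicate) -> Set[str]:
    """Keep only genes enriched in both replicates (set intersection)."""
    return enriched_genes(rep1) & enriched_genes(rep2)


# Toy data with made-up gene names and values.
rep1 = {"GENE_A": (1.9, 0.001), "GENE_B": (0.9, 0.20), "GENE_C": (1.2, 0.01)}
rep2 = {"GENE_A": (1.5, 0.004), "GENE_B": (1.1, 0.03), "GENE_C": (0.4, 0.02)}
result = compartment_list(rep1, rep2)
```

In the toy data, GENE_B fails the q-value cut-off in the first replicate and GENE_C fails the fold-change cut-off in the second, so only GENE_A survives the intersection.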

These lists provide a wealth of observations about the RNA composition of diverse cellular locales. Many RNAs are “orphans”, never previously linked to the compartment to which APEX-seq assigns them. For instance, our APEX-seq atlas (Figure 2F) newly assigns 324 RNAs to the nucleolus, 114 RNAs to the lamina, and 111 RNAs to the OMM. To provide further confidence in these spatial assignments, we analyzed a subset of high-abundance RNAs by sequential smFISH imaging (Figure 2I-L), and found that 6 out of 10 OMM orphans and 7 out of the 8 ERM orphans displayed significant smFISH enrichment at the mitochondria and ERM, respectively.

To further validate nuclear and cytosolic RNAs enriched by APEX-seq, we performed polyA+ nuclear/cytosolic fractionation of matched HEK cells (Figure 2G). Of the 95 nuclear and 14 cytosolic APEX-seq orphans for which we could obtain high-quality fractionation-seq reads, 84 of the nuclear and 4 of the cytosolic RNAs were validated (Figure 2F). Overall, fractionation-seq validated 81% (N = 88/109) of orphan genes.

The availability of matched fractionation-seq datasets gives us the opportunity to compare head-to-head with APEX-seq. Overall, we found that both nuclear and cytosolic APEX-seq datasets were much more specific than our corresponding fractionation-seq data. For instance, our APEX-seq gene lists lacked the mitochondrial matrix and ER contaminants present in the cytosolic and nuclear fractionation data respectively (Figure 2GH, S3F). Excluding ER transcripts in the nuclear fractionation-seq dataset (using ERM APEX-seq gene list), we compared the remaining genes to APEX-seq in order to estimate the accuracy (94%) and precision (96%) of our methodology. We also observed that the RNA length distributions in nuclear fractionation and APEX-seq are very similar (Figure S3E).
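For readers less familiar with the two metrics quoted above, the sketch below computes accuracy and precision from a confusion matrix using their standard definitions. The text does not spell out how the reported 94% and 96% were derived, so treating fractionation-seq calls as the reference is an assumption, and the counts are invented for the arithmetic and are not the study's data.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all calls (positive and negative) that agree with the reference."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of positive calls that the reference also marks positive."""
    return tp / (tp + fp)


# Invented counts, for illustration only (not taken from the paper).
tp, fp, tn, fn = 40, 10, 45, 5
acc = accuracy(tp, fp, tn, fn)   # (40 + 45) / 100
prec = precision(tp, fp)         # 40 / 50
```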

General Features of the Human Transcriptome Revealed by APEX-seq RNA Atlas

Our APEX-seq atlas reveals interesting patterns and features for the human transcriptome (Figure 3A). For >3200 RNAs, we obtained high enrichment scores (log2 fold-change > 0.75) in at least one of the nine locations. Unbiased clustering analysis revealed that RNAs broadly partition into four general localization categories (Figure 3A, 3D): (1) nuclear, (2) mitochondrial membrane/ER, (3) cytosol, and (4) the remainder (which includes ER lumen, mitochondrial matrix, and nuclear pore). Most transcripts further localized to just one or two locations within each category (Figure 3D, 3G, Methods). Comparing mRNAs to long noncoding RNAs (lncRNAs) (Figure 3EF), our dataset showed that the former mostly localize to one of the cytosolic or nuclear locations, while lncRNAs are predominantly nuclear, consistent with previous studies(Cabili et al., 2015).

Figure 3: Analysis of subcellular transcriptome maps.

(A) T-distributed stochastic neighbor embedding (t-SNE) plot showing separation and clustering of APEX-seq libraries.

(B) Genome tracks for XIST, a nuclear non-coding RNA, and

(C) IARS2, an mRNA encoding a mitochondrial tRNA synthetase. For each location, the reads were averaged across two APEX-seq replicates. The control tracks were generated by averaging 18 controls from all 9 constructs.

(D) Heatmap of transcripts enriched by APEX-seq showing clustering of the genes that specifically localize to at least one location, and have fold-change data from all locations.

(E) Heatmap showing the APEX-seq fold changes for the mRNA transcripts found to be most variable among the locations investigated.

(F) Heatmap showing the APEX-seq fold changes for non-coding RNAs (excluding pseudogenes) that have the most-variable localization enrichment. A few well-known noncoding RNAs are shown in bold.

(G) Of the ~3250 genes analyzed, most localize to only one or two of the eight locations (excluding mitochondrial matrix) interrogated.

(H) Circos plot showing the co-localization of RNAs to multiple locations.

(I) Transcripts overlapping in multiple locations.

(J) Heatmap showing the protein localization of the transcripts enriched by APEX-seq.

We observed substantial overlap between OMM and ERM-associated transcriptomes (Figure 3HI). Using more stringent cut-offs based on ROC analysis, we confirmed that two-thirds of RNAs are shared by OMM and ERM, with almost 95% of shared mRNAs encoding secreted proteins (Figure S3CD). It may be that specific subsets of mRNAs are translated at mitochondria-ER contact sites (Friedman et al., 2011; Giacomello and Pellegrini, 2016; Valm et al., 2017).

We used our APEX-seq atlas to explore the relationship between protein localization and localization of its encoding mRNA, making use of existing data on protein subcellular localization(Thul et al., 2017). Our analysis (Figure 3J) reveals remarkable concordance between RNA and protein localization at steady state. For example, the ERM-proximal transcriptome preferentially codes for proteins that localize to the ER, Golgi and vesicles, rather than proteins that localize to the nucleus, nucleolus or cytosol. Less expectedly, our data also show that mRNAs enriched in nuclear locations tend to code for proteins enriched in nuclear speckles and nucleoplasm, but not the plasma membrane (Figure 3J, S3AB). This result is surprising if protein translation occurs exclusively in the cytosol. Alternatively, it has been suggested that mRNAs in the nucleus might serve as “reserve pools” that help to dampen gene-expression noise(Bahar Halpern et al., 2015; Battich et al., 2015; Hansen et al., 2018). We speculate that nuclear-destined proteins(Thul et al., 2017), which are highly enriched for nucleic-acid binding proteins (FDR < 5 × 10^−13, GO-biological process) whose concentrations may have to be precisely tuned, may have mRNAs that are retained in nuclear subcompartments in order to better shield the amount of mRNA available for translation from noise.

The ability of our atlas to position endogenous RNAs with respect to distinct subcellular landmarks provides an exciting opportunity to test novel hypotheses concerning the relationship between RNA localization and function. For example, the atlas shows that XIST (X-inactive specific transcript), a nuclear lncRNA, is enriched at the nuclear lamina but not the nearby nuclear pore (Figure 3B). These findings are consistent with the known role of XIST in coating the inactive X chromosome in female cells(Penny et al., 1996), leading to transcriptional silencing and localization of the inactive X to the nuclear lamina(Chen et al., 2016). Another example is IARS2 (mitochondrial isoleucyl tRNA synthetase 2, encoded by the nuclear genome), whose mRNA was identified by APEX-seq at the OMM (Figure 3C). Because IARS2’s protein product is known to reside in the mitochondrial matrix, the APEX-seq data suggest local translation of the mRNA at the OMM, a point we further explore in Figures 6 and 7.

Figure 6: Distinct subpopulations of mRNAs at the OMM.

(A) Schematic diagram showing the mitochondria with all perturbations described, including those that affect ribosomes (puromycin (PUR) and cycloheximide (CHX)), mitochondria (carbonyl cyanide m-chlorophenyl hydrazone (CCCP)) and microtubules (nocodazole (NOC)). RNA is shown in blue, ribosomes in grey, and microtubules in green.

(B) Gene density distribution of OMM APEX-seq enrichment under different conditions. P-values are from Mann-Whitney U tests.

(C) Gene density distribution of ERM APEX-seq enrichment. Genes are categorized as in P-value is from a Mann-Whitney U test.

(D) Scatter plot of OMM APEX-seq log2fold-change comparing the basal and CHX conditions.

(E) Cumulative fraction of genes in different conditions by TargetP values. CHX treatment shows increased OMM targeting of genes with high TargetP values. Genes are categorized by their TargetP values (see Methods) on a scale from 5 (strongest N-terminal mitochondrial targeting peptide) to 0 (no N-terminal mitochondrial targeting peptide). P-values from KS test.

(F) Comparing the proportion of transcripts with different TargetP values and average TargetP value among top 100 mitochondrial genes enriched by OMM APEX-seq in cells under different conditions, and all MitoCarta genes.

(G) Comparing the proportion of transcripts in different functional classes among top 100 mitochondrial genes enriched by OMM APEX-seq in cells under different conditions, and all MitoCarta genes. Genes are functionally classified according to Gene Ontology.

(H) Model summarizing two distinct subpopulations of mitochondrial RNAs proximal to mitochondria.

(I) Browser tracks of a mitochondrial gene (HSPA9, TargetP = 5) show increased enrichment by OMM-APEX upon CHX treatment.

(J) Cumulative fraction of OXPHOS and mitoribosome related genes in different conditions. P-values from KS test.

(K) Scheme illustrating the coordinated assembly of respiratory chain complexes and mitoribosomes between the nuclear and mitochondrial genomes.

(L) Browser tracks of a mitochondrial ribosomal gene (MRPS18B) that show increased enrichment by OMM-APEX upon PUR/CCCP treatment.

(M) Heatmap of fold changes for transcripts enriched by OMM APEX-seq. Upon clustering based on the basal, CHX and PUR conditions, we obtain clusters that are either strongly enriched or depleted in the corresponding mitochondrial proteins.

Figure 7: Features of ribosome-dependent and RNA-dependent transcripts at OMM.

(A) Based on the effect of PUR and CHX, we binned genes from heatmap (Figure 6M) into two categories: ribosome-dependent and RNA-dependent.

(B) ROC curves from an unsupervised random-forest classifier that predicts transcript localization to OMM (versus ERM). To train the classifier, the transcript sequences were divided into 4096 (= 4^6) 6-mers. Plotted is the mean performance (dark line) and the range from 10-fold cross validation.

(C) Same as (B), but using the first 100 coding amino acids (aa) for training. Due to the much larger possible space of aa-variation, we used 3-mers (= 22^3 k-mers) instead of 6-mers for training.

(D) Similar model using 6-mer RNA sequences was used to classify transcripts as ribosome-dependent or RNA-dependent.

(E) Using the polyA SVM package, which predicts polyadenylation site scores, we find the RNA-dependent transcripts have low polyadenylation scores.

(F) Using a polyA tail-length dataset(Subtelny et al., 2014), we found RNA-dependent transcripts have shorter polyA-tail length relative to ribosome-dependent transcripts. P-values from Mann-Whitney U test.

(G) Correlation of fold change upon 30-minute NOC treatment (where effect saturates) and the corresponding change upon PUR treatment. Changes are measured relative to basal conditions.

(H) Schematic diagram of the time-course APEX-seq protocol.

(I) Number of transcripts enriched by OMM-APEX-seq.

(J) Progressive depletion of basal OMM transcripts upon NOC treatment.

(K) Heatmap of genes enriched by APEX-seq in any of the time points. We clustered on the first 4 time points.

(L) Enrichment change as a function of NOC treatment time for the three major clusters. Data are median fold change ± 1 sigma.

(M) Half-lives for transcripts in Cluster 2.
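The 6-mer featurization mentioned in panel (B) of the Figure 7 legend can be illustrated with a short sketch. It enumerates all 4^6 = 4096 possible RNA 6-mers and counts their occurrences in a transcript using a sliding window. Whether the study used raw counts, frequencies, or presence/absence as classifier inputs is not stated here, so raw counts are an assumption, and the function names and example sequence are hypothetical.

```python
from itertools import product
from typing import Dict, List

ALPHABET = "ACGU"  # RNA alphabet; DNA-style input is converted below


def all_kmers(k: int) -> List[str]:
    """Enumerate every possible k-mer over the RNA alphabet (4**k entries)."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def kmer_counts(sequence: str, k: int = 6) -> Dict[str, int]:
    """Count overlapping k-mers in a sequence, one feature per possible k-mer."""
    counts = {kmer: 0 for kmer in all_kmers(k)}
    seq = sequence.upper().replace("T", "U")
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
    return counts


# Hypothetical 12-nt sequence, used only to exercise the function.
features = kmer_counts("AUGGCUAUGGCU")
```

A 12-nt sequence yields seven overlapping 6-mers, and the resulting 4096-element count vector is the kind of fixed-length input a random-forest classifier can consume.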

Two other RNAs of note are TUG1 and NORAD, lncRNAs localized by APEX-seq to both the ERM (validated by smFISH imaging in Figure 2KL) and the nucleus. While the majority (97%) of ERM APEX-seq-enriched species are mRNAs, our dataset highlights 31 noncoding RNAs, which are impossible to detect by ribosome profiling or ER fractionation-seq, because they are not translated.

APEX-seq Reveals Differential Localization for Transcript Isoforms

Because APEX-seq is a sequencing-based methodology providing not only gene identity but full sequence details for each enriched RNA, we use it to support the hypothesis that different transcript isoforms of the same gene may localize to different regions of the cell(Mayr, 2017). For example, FUS (fused in sarcoma) mRNA, encoding a nuclear protein implicated in amyotrophic lateral sclerosis (ALS) and phase separation(Patel et al., 2015), shows intron retention within the nuclear locations, but not cytosolic ones (Figure 4A). DEAD-box helicases 5 (DDX5) and 17 (DDX17) are additional examples of RNAs with retained introns (Figure 4BC). The nuclear enrichment of retained introns was also observed in our fractionation-seq data (r = 0.78), although without the sub-nuclear resolution that APEX-seq provides. Interestingly, we observed that APEX at the nuclear pore enriched fewer transcripts with retained introns than APEX at other nuclear locations (Figure 4D, S4A-C), consistent with the role of the pore as a “gene gate” for RNA quality control.

Figure 4: APEX-seq reveals principles related to RNA isoforms and introns.

(A) (B)(C) The genome tracks of (A) FUS mRNA. (B) and (C) show the genome tracks of two other transcripts, DDX5 and DDX17, with retained introns.

(D) Fractionation-seq (green) and nucleus APEX-seq (red) identify roughly the same genes with retained introns. The nuclear-pore APEX-seq transcriptome has fewer retained introns relative to the nucleus.

(E) Using APEX-seq, we identify transcripts that are highly abundant in both cytosol and nucleus at the gene level, but switch isoforms at the transcript level. TPM, transcript per million.

(F) (G) (H) Browser tracks showing examples of isoform switching across nuclear and cytosolic locations for (F) KAT2A (lysine histone acetyltransferase 2A) in a putative coding sequence (CDS), (G) NCBP3 (nuclear cap-binding protein subunit 3) in the 3′ UTR and (H) HNRNPU (heterogeneous nuclear ribonucleoprotein U) in the 5′ UTR respectively. Arrows indicate direction of transcription.

(I) Number of m6A present per transcript enriched by APEX-seq. High-confidence m6A sites were obtained from the literature(Meyer et al., 2012). P-values are from a Fisher’s exact test.

(J) Cumulative distribution of intron length for genes enriched by APEX-seq in the nuclear locations.

(K) Barplots of average length of nuclear pore and nucleus enriched transcripts by mature transcript length, 5′ UTR, CDS (coding sequence) and 3′ UTR. P-values are from a one-sided Mann-Whitney U test. Errors are standard error of mean.

In addition to retained introns, APEX-seq revealed a group of RNAs that show no gene-level subcellular localization differences but exhibit substantial spatial heterogeneity at the transcript-isoform level (“isoform switching”; Figure 4E, S4AE). Two such examples are the mRNAs for the oncogene AKT2 and the circadian rhythm gene CSNK1D, which show isoform switching between the nucleus and cytosol. In some cases, isoform switching extends to the 5’ UTR, 3’ UTR and coding regions of transcripts (Figure 4FH). Overall we find hundreds of genes with alternative 5’ and 3’ splice sites (Figure S4FG). These results naturally nominate specific exons associated with each isoform for localization to specific subcellular locations, which in turn could affect downstream functions(Berkovits and Mayr, 2015).

Nuclear Pore as a Staging Area for RNA Export

RNA transcripts must pass through the nuclear pore to go from their production sites in the nucleus into the cytoplasm. Previous studies have suggested that the nuclear pore may act as a staging area for cytoplasm-destined transcripts(Wickramasinghe and Laskey, 2015). Our APEX-seq data reveal a striking similarity between RNAs enriched at the nuclear face of the nuclear pore (where APEX is expressed as a fusion to the pore-basket-binder SENP2 (Sentrin-specific protease 2)(Walther et al., 2001)) and RNAs in the cytoplasm (Figure 3D), in contrast to RNAs from other nuclear locations (Figure 3A).

Our results support the prevailing view that the nucleoplasmic milieu(Blobel, 1985; Brown and Silver, 2007; Kim et al., 2018) of the pore has a critical role in mRNA surveillance, allowing only properly spliced and sorted transcripts ready for export to cytoplasm to congregate (while retaining partially-spliced transcripts in the nucleus) (Figure 4AC).

m6A Modification and RNA Length in Nuclear Pore Localization

While RNA processing for nuclear export is complex and highly-regulated, the rate-limiting step for mRNA transport is believed to be access to and release from the nuclear pore complex (NPC)(Grunwald and Singer, 2010; Ma et al., 2013). N6-methyladenosine (m6A) modification of pre-messenger RNAs has been reported as a “fast track” signal for nuclear export(Roundtree et al., 2017), while RNA length has been hypothesized as a feature influencing RNA export, with long RNAs taking more time to remodel and exit.

When we intersected nuclear-pore APEX-seq data with m6A modification sites(Meyer et al., 2012), we found a significant depletion of m6A in transcripts enriched near the pore, compared to nuclear lamina or the cytosol (Figure 4I). Our data support the hypothesis that m6A-modified transcripts transit quickly through the NPC, leading to low biotinylation by APEX-seq. However, although transcripts at the pore had fewer m6A sites than those at other nuclear locations, the transcript density of m6A was not significantly different across these locations. Nonetheless, transcripts at both the pore and other nuclear locations had lower m6A density (i.e. sites per kilobase) than those in the cytosol.
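The distinction drawn above between the number of m6A sites per transcript and the m6A density (sites per kilobase) can be illustrated with a minimal sketch. The function name and the two example transcripts are hypothetical and are not taken from the dataset; they only show how a longer transcript can carry more sites yet have a lower density.

```python
def m6a_density_per_kb(n_sites: int, transcript_length_nt: int) -> float:
    """Return the number of m6A sites per kilobase of transcript."""
    if transcript_length_nt <= 0:
        raise ValueError("transcript length must be positive")
    return n_sites / (transcript_length_nt / 1000.0)


# Hypothetical transcripts (illustrative values only): (m6A sites, length in nt)
short_transcript = (2, 1000)
long_transcript = (4, 4000)

for label, (sites, length) in [("short", short_transcript), ("long", long_transcript)]:
    density = m6a_density_per_kb(sites, length)
    print(f"{label}: {sites} sites, {length} nt, {density:.2f} sites/kb")
```

Because pore-enriched transcripts also tend to be shorter, comparing both the raw count and the length-normalized density helps separate a length effect from a modification effect.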

We also examined RNA length in our nuclear-pore APEX-seq data. We found that transcripts enriched at the pore tend to be shorter than transcripts at other nuclear locations. This inverse relationship between RNA length and nuclear pore APEX enrichment is significant both in the mature transcript and the introns only (Figure 4JK, S4H, S4K). For protein-coding transcripts, the 3’-UTR length is most predictive of nuclear pore APEX-seq enrichment (Figure 4K). A possible interpretation of our data is that longer RNAs pass more quickly through the pore, leading to lower APEX-seq enrichment, which could be the case if shorter RNAs assemble with fewer RNA-binding proteins (RBPs), including those necessary for recognition and passage through the pore.

Although different processes exist to export intronless mRNAs(Delaleau and Borden, 2015), we did not observe a significant difference in the proportion of intronless transcripts at the pore relative to other locations (Figure S4I).

RNA Repeats and Genomic Position Influence Sub-Nuclear RNA Localization

Repeat sequences make up a majority of the human genome(de Koning et al., 2011), with interspersed nuclear elements SINE (short) and LINE (long) containing retrotransposable (transposable via RNA intermediates) elements that can be deleterious when active and randomly moving to new genomic sites(Ichiyanagi, 2013). We observed enrichment of SINEs and LINEs within the nuclear locations (Figure 5A, S5AD), with the highest enrichment of these elements in the nuclear lamina. The cytosolic locations and the nuclear pore showed no enrichment (Figure S5E). Given the known accumulation of transcription-repression machinery at the lamina(van Steensel and Belmont, 2017), our observations may help to explain the recent findings that LINE/L1 elements are epigenetically silenced(Padeken et al., 2015). Likewise, transcripts enriched at the nuclear lamina had lower expression level than other nuclear locations, consistent with the idea of heterochromatin deposition and gene silencing at lamina-associated domains (LADs) (Figure 5C, S5GI).

Figure 5: The underlying features of nuclear RNA localization.

(A) Examination of retrotransposable elements in transcripts uniquely localizing to different locations show an enrichment of these elements in the nuclear-lamina transcriptome.

(B) Heatmap of z-score showing that transcripts localizing to the nucleolus are enriched in rRNA repeat motifs, relative to the nucleus.

(C) Within the nuclear locations, the nuclear-lamina-enriched transcripts have a lower abundance relative to both the nucleus and the nucleolus. P-value is from a Mann-Whitney U test.

(D) Examination of the genes found in DNA lamina-associated domains (LADs) and nucleolus-associated domains (NADs) confirms that the corresponding transcriptomes are enriched for those genes. Here we restrict analysis to transcripts uniquely-enriched in the respective locations. P-values are from Fisher’s exact tests.

Second, the location of the DNA locus from which an RNA originates is believed to strongly dictate nuclear RNA location (Dekker et al., 2017), and our data support this view. For example, previous work has shown that the nucleolus is enriched for DNA coding for ribosomal RNAs (rRNAs)(van Koningsbruggen et al., 2010), while our APEX-seq atlas shows that rRNA repeat motifs(Wheeler et al., 2013) are highly enriched in the nucleolus, but far less so in the nuclear lamina or cytosol (Figure 5B). We also find that mRNAs of genes residing in DNA nucleolus-associated domains (NADs)(Dillinger et al., 2017; van Koningsbruggen et al., 2010) are highly enriched in the nucleolus (odds ratio = 4.4; 95% confidence interval (CI) = 1.7–14) (Figure 5D, S5J). For DNA loci in LADs(Guelen et al., 2008), the corresponding RNAs were enriched in the lamina APEX-seq (odds ratio = 11; 95% CI = 3.8–43) (Figure S5JM).
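As an illustration of the kind of 2×2 enrichment test reported above, the following minimal sketch computes a sample odds ratio and a one-sided Fisher's exact p-value using only the Python standard library. The table layout, function names and counts are assumptions made for illustration; the sketch does not reproduce the reported odds ratios or confidence intervals, and the sample odds ratio it returns may differ from estimates produced by statistical packages that use a conditional maximum-likelihood estimate.

```python
from math import comb


def sample_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = enriched / not enriched at the location,
    columns = gene inside / outside the associated DNA domain.
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value, P(X >= a), with all margins fixed."""
    row1 = a + b          # enriched transcripts
    col1 = a + c          # genes inside the domain
    n = a + b + c + d
    denominator = comb(n, col1)
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / denominator


# Hypothetical counts (illustrative values only, not from the paper)
table = (3, 1, 1, 3)
print(sample_odds_ratio(*table), fisher_exact_greater(*table))
```

For real analyses, an established implementation such as `scipy.stats.fisher_exact` would normally be preferred over a hand-written test.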

Distinct Mechanisms of mRNA Localization to the Outer Mitochondrial Membrane

The human mitochondrion contains >1100 protein species(Calvo et al., 2015), only 13 of which are encoded by mtDNA and translated within the organelle. The remainder are encoded by the nuclear genome and must be delivered to the mitochondrion after translation in the cytosol(Mercer et al., 2011). The identification of ribosomes at the OMM(Kellems et al., 1974, 1975) led to the hypothesis that some mRNAs encoding mitochondrial proteins may be locally translated at the OMM and co-translationally or post-translationally imported into the mitochondrion(Gold et al., 2017) (Figure 6A). However, at present little is known about the landscape of RNAs at the mammalian mitochondrial membrane, despite its importance for understanding mitochondrial biogenesis.

We mined our APEX-seq atlas for insights about mitochondria-proximal RNAs in human cells, and found that the OMM compartment was enriched in mRNAs encoding mitochondrial proteins (Figure 3J). When plotted by OMM APEX-seq enrichment, we observed a significant enrichment increase of nuclear-encoded mitochondrial genes over non-mitochondrial/non-secretory genes (Figure 6B; Table S4). By contrast, no enrichment increase of mitochondrial genes was observed when RNAs were plotted by ERM APEX-seq enrichment (Figure 6C). These results support the notion that mitochondrial transcripts accumulate at the OMM, possibly for the purpose of local protein translation. Examination of our OMM-enriched mRNAs did not reveal any pattern in terms of protein functional class or sub-mitochondrial localization of the encoded proteins. In an effort to further tease apart possible mRNA subpopulations that may be targeted to the OMM by different mechanisms, we repeated APEX-seq labeling under different perturbation conditions.

Taking advantage of the rapidity of APEX-seq tagging, we treated cells expressing OMM-APEX2 with cycloheximide (CHX), puromycin (PUR), or carbonyl cyanide m-chlorophenyl hydrazone (CCCP), prior to labeling (Figure S6A). CHX and PUR are both protein translation inhibitors but they work by different mechanisms. CHX stalls translation but preserves the mRNA-ribosome-nascent protein chain complex, while PUR dissociates mRNAs from ribosomes. CCCP abolishes the mitochondrial membrane potential and thereby stops membrane potential-dependent processes including TOM (translocase of outer membrane)/TIM-mediated import of mitochondrial proteins(Chacinska et al., 2009).

After treatment of cells with CHX, we observed a dramatic increase in the number of mitochondrial genes and their extent of OMM enrichment (Figure 6B, 6D), consistent with a model in which mRNA localization to the OMM can be regulated by the encoded protein’s mitochondria-targeting sequence. As it emerges from the ribosome, the nascent peptide is localized to the OMM together with the still-translating mRNA. Indeed, we found that the most-CHX-enriched mitochondrial genes have higher TargetP scores on average (Figure 6EG); TargetP is a measure of mitochondrial targeting potential(Emanuelsson et al., 2007). Hence, OMM APEX-seq following CHX appears to highlight a subpopulation of mRNAs that may localize to the OMM in a ribosome-dependent fashion (Figure 6H). Figure 6I, S6C show the genome tracks of example mRNAs, HSPA9 (mitochondria heat shock protein A9) and MUT (methylmalonyl-coA mutase) respectively, that display increased OMM localization upon CHX treatment.

Treatment of cells with PUR produced a pattern of enrichment distinct from CHX treatment. The vast majority of CHX-enriched mRNAs were no longer observed at the OMM, consistent with the hypothesis that the localization of these transcripts depends on an intact ribosome complex (Figure 6B, S6E). Nonetheless, a subpopulation of mRNAs remained clearly associated with the OMM after PUR; the top OMM-localized genes were not higher in TargetP, in contrast to CHX-enriched genes (Figure 6F). Functional class analysis revealed that PUR-enriched genes have a higher likelihood of encoding mitochondrial ribosome and oxidative phosphorylation (OXPHOS) components (Figure 6G, 6JK, S6F), which are the two complexes that require coordinated assembly from the nuclear and mitochondrial genomes (Couvillion et al., 2016). Figure 6L, S6D show genome tracks of a representative mitochondrial ribosomal-protein gene, MRPS18B (28S ribosomal protein S18b), and OXPHOS gene, NDUFB9 (NADH:ubiquinone oxidoreductase subunit B9) respectively. The PUR data thus suggest that a subpopulation of mRNAs associates with the OMM in a ribosome- and nascent-chain-independent fashion, perhaps by binding directly to an OMM-localized RNA-binding protein (Figure 6H).

Upon treatment with the mitochondrial uncoupler CCCP, the genes enriched at the OMM are similar to PUR-enriched genes (Figure 6FG, 6J). CCCP-enriched genes must not depend on the mitochondrial membrane potential or mitochondrial protein import for their OMM localization. Perhaps by causing a reduction in interactions between the ribosome-mRNA-nascent chain complexes and TOM/TIM at the OMM, the association of ribosome-independent mRNAs with the OMM under CCCP becomes more readily apparent.

The availability of basal along with three “drug perturbation” OMM APEX-seq datasets enabled us to perform higher-order clustering analysis. Figure 6M shows transcripts that were enriched at the OMM in at least one condition. We find that RNAs cluster into groups based on their enrichment in CHX versus PUR, with some clusters strongly predictive of genes coding for mitochondrial proteins (Figure S6H). In particular, in clusters 1, 4, and 6 that included transcripts strongly enriched upon CHX treatment and depleted upon PUR treatment, >90% of RNAs (N = 128/140) code for mitochondrial proteins. 7 of the remaining 12 transcripts were pseudogenes, with at least 3 of the 5 mRNAs likely to be mitochondrial (Figure S6IJ) based on other studies(Mou et al., 2009; Pandey et al., 2017; Thul et al., 2017). Thus, OMM APEX-seq data could be used to predict whether certain genes will code for mitochondrial proteins.

Analysis of Motifs that Predict RNA localization to the Mitochondrion

By using PUR and CHX treatments, we disentangled RNA populations that localize to the OMM via ribosome-dependent versus ribosome-independent mechanisms (Figure 6H). We next investigated two hypotheses: (1) that PUR-enriched mRNAs (“ribosome-independent”) possess specific RNA sequences that predict their OMM localization, and (2) that CHX-enriched mRNAs (“ribosome-dependent”) possess specific amino-acid features that predict OMM localization. To test these hypotheses, we first classified OMM-enriched transcripts as either ribosome-dependent or RNA-dependent (if they localized to OMM under PUR) (Figure 7A). We trained a random-forest classification algorithm to predict localization of these two categories of transcripts to the OMM versus the ERM (which we used as “background”), using 6-mers as RNA features (Methods). The resulting classifier was much better at predicting localization of RNA-dependent transcripts relative to ribosome-dependent ones (Figure 7B). The converse result was obtained when using the corresponding N-terminal 100 amino acid peptide for training (Figure 7CD), suggesting that the peptide sequence is more predictive for ribosome-dependent transcripts.
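To make the idea of "6-mers as RNA features" concrete, the sketch below converts a transcript sequence into a fixed-length vector of 6-mer frequencies. It is a minimal illustration under stated assumptions: the function names are hypothetical, frequencies (rather than raw counts) are an assumed normalization, and the exact featurization used for the classifier is the one described in the Methods, not this code.

```python
from collections import Counter
from itertools import product

RNA_ALPHABET = "ACGU"


def kmer_counts(sequence: str, k: int = 6) -> Counter:
    """Count overlapping k-mers, treating DNA 'T' as RNA 'U'.

    Windows containing characters outside A/C/G/U (e.g. 'N') are skipped.
    """
    seq = sequence.upper().replace("T", "U")
    counts = Counter()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if set(kmer) <= set(RNA_ALPHABET):
            counts[kmer] += 1
    return counts


def kmer_feature_vector(sequence: str, k: int = 6) -> list:
    """Return k-mer frequencies in a fixed lexicographic order (4**k values)."""
    vocabulary = ["".join(p) for p in product(RNA_ALPHABET, repeat=k)]
    counts = kmer_counts(sequence, k)
    total = sum(counts.values()) or 1  # avoid division by zero for short inputs
    return [counts[kmer] / total for kmer in vocabulary]


# Hypothetical toy sequence containing the polyA signal AAUAAA
features = kmer_feature_vector("GGCAAUAAAGUUUGU")
print(len(features), sum(features))
```

Vectors of this form, one per transcript, could then be supplied together with OMM/ERM labels to a random-forest implementation such as scikit-learn's `RandomForestClassifier`.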

We looked further into the RNA features that may be predictive of OMM localization (Table S5), and found that the 5’ UTR was least important, and the 3’ UTR most informative (Figure S7AB). The most-important 6-mer sequences were G/U rich, with one of the other top hits being the polyA-signal sequence AAUAAA (Figure S7C). In support of our findings, the predicted polyA SVM score (a measure of polyA-site prediction) (Cheng et al., 2006) of RNA-dependent transcripts is substantially different from that of ribosome-dependent transcripts (Figure 7E). We also found that RNA-dependent OMM transcripts have significantly shorter polyA-tail lengths than ribosome-dependent transcripts, as well as shorter 3’ UTRs (Figure 7F, S7D). Altogether, our findings support the two hypotheses above and reveal specific RNA and protein features that are predictive of OMM localization.

Kinetics of RNA Transport to the Mitochondrion

Previous studies have suggested that RNA may arrive at the OMM via active microtubule-based transport(Buxbaum et al., 2015). To investigate this hypothesis, we repeated the OMM APEX-seq labeling after treating cells for various lengths of time with the microtubule-polymerization inhibitor nocodazole (NOC), which is known to inhibit transport (Reck-Peterson et al., 2018; Shen et al., 2018). We confirmed by imaging that NOC treatment does not perturb the localization of the OMM-APEX2 construct (Figure S7E). Figure 7HI shows that 30 minutes of NOC led to a depletion of mRNAs at the OMM. The RNAs remaining at the OMM were more similar to those observed under PUR (r = 0.72) compared to those under CHX (r = 0.32) (Figure 7G, S7H). The selective disappearance of ribosome-dependent mRNAs from the OMM suggests that these mRNAs may utilize the cytoskeletal network to reach the OMM (Figure 6A).

Analysis of NOC time-course data (Figure 7HI, S7FG, Table S6) showed that the majority of RNAs disappear rapidly from the OMM following NOC treatment. This decrease is observed for both mRNAs that encode mitochondrial proteins, and other RNAs (Figure 7J). Further analysis resolved at least three patterns of responses to NOC (Figure 7KL). The largest cluster shows rapid loss from the OMM with half-life dissociation data that could be fit by a log-normal distribution (Figure 7M), suggesting that many rate-limiting events could be involved. While further studies are needed to characterize these responses (as perturbing the cytoskeleton can have wide-ranging effects), our observations do showcase the power of rapid APEX-seq labeling to resolve dynamic transcriptome-wide RNA localization events.
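One simple way to summarize a per-transcript loss from the OMM as a half-life is sketched below. It assumes a single-exponential decay of enrichment with NOC treatment time and fits it by least squares on the log scale; this model, the function name and the time points are illustrative assumptions and are not necessarily the procedure used to obtain the half-lives in Figure 7M.

```python
import math


def half_life_from_timecourse(times_min, enrichment):
    """Estimate a half-life (minutes) assuming E(t) = E0 * exp(-k * t).

    Fits ln(E) against t by ordinary least squares and returns ln(2) / k.
    All enrichment values must be positive.
    """
    if len(times_min) != len(enrichment) or len(times_min) < 2:
        raise ValueError("need at least two matched time points")
    log_values = [math.log(value) for value in enrichment]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(log_values) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, log_values))
    decay_rate = -sxy / sxx
    if decay_rate <= 0:
        raise ValueError("enrichment does not decrease over time")
    return math.log(2) / decay_rate


# Hypothetical time course (illustrative values only)
times = [0, 10, 20, 30]
values = [8.0, 4.0, 2.0, 1.0]
print(half_life_from_timecourse(times, values))
```

The distribution of such per-transcript half-lives across a cluster is what would then be compared with a candidate distribution such as the log-normal.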

DISCUSSION

With quantitative enrichment scores and detailed transcript profiles for over 25,000 distinct human RNA species across nine subcellular compartments, our study reveals patterns of RNA localization that give rise to a variety of biological hypotheses. APEX-seq yields RNA sequence information down to single-nucleotide resolution, thereby filling a critical gap in the landscape of RNA technologies. Our APEX-seq-derived atlas of transcriptome localization provides a comprehensive and precise delineation of RNA spatial organization in the living cell.

APEX-seq adds to the arsenal of RNA localization methods, and offers unique advantages compared to existing techniques. The first strength of APEX-seq is that labeling is performed in living cells, while membranes and macromolecular complexes are still intact. Second, APEX-seq can be used to analyze “unpurifiable” structures such as the nuclear lamina and OMM that are impossible to access via fractionation-based approaches. The third strength of APEX-seq is that it provides full sequence information for diverse classes of RNA transcripts, allowing transcript isoforms with distinct localization to be distinguished (Figure 4FH). Fourth, while ribosome profiling captures actively translating mRNA on polysomes, APEX-seq additionally detects lncRNAs, antisense RNAs (Figure 3EF) and untranslated mRNAs not bound by ribosomes. Finally, the high spatiotemporal resolution sets APEX-seq apart from APEX-RIP, which loses spatial specificity entirely in non-membrane enclosed regions (Figure 1D).

A disadvantage of APEX-seq is that it requires an APEX-fusion construct to be recombinantly expressed in the cell of interest, which limits applicability to human tissue, for example. Also, APEX-seq does not provide single-cell information like imaging-based methods. Finally, because labeling is performed in live cells, APEX-seq coverage will be fundamentally limited by the steric accessibility of RNAs in their native environment; RNAs that are buried within macromolecular complexes may not be tagged. These limitations suggest directions for future improvement.

We expect that APEX-seq will be broadly applicable to many organisms and cell types, just as APEX proteomics has been extended to flies(Chen et al., 2015a), worms(Reinke et al., 2017), yeast(Hwang and Espenshade, 2016), and neurons(Loh et al., 2016). APEX-seq could be fruitfully applied to polarized cells, neurons, or dynamic developmental systems. Future use of APEX-seq in conjunction with RNA-structure-mapping methods(Chin and Lecuyer, 2017; Spitale et al., 2015; Sun et al., 2019), RBP-occupancy atlases(Van Nostrand et al., 2016), and massively-parallel reporter gene assays(Lubelsky and Ulitsky, 2018; Shukla et al., 2018) could shed light on the molecular basis of the spatial organization of RNA within cells.

METHODS

CONTACT FOR REAGENT AND RESOURCE SHARING

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Alice Y. Ting (ayting@stanford.edu).

EXPERIMENTAL MODEL AND SUBJECT DETAILS

Mammalian cell culture

HEK-293T cells from the ATCC (passages < 25) were cultured in a 1:1 DMEM/MEM mixture (Cellgro) supplemented with 10% fetal bovine serum, 100 units/mL penicillin, and 100 μg/mL streptomycin at 37°C under 5% CO2(Hung et al., 2016). Mycoplasma testing was not performed before experiments. For fluorescence microscopy imaging experiments, cells were grown on 7×7-mm glass coverslips in 48-well plates. For qPCR and RNA-seq experiments, cells were grown on 10-cm glass-bottomed Petri dishes (Corning). To improve the adherence of HEK-293T cells, we pretreated glass slides with 50 μg/mL fibronectin (Millipore) for 20 minutes at 37°C before cell plating and washing three times with Dulbecco’s PBS (DPBS) (pH 7.4).

Generation of HEK-293T cells stably expressing different APEX2 constructs

APEX2 fusion constructs in Figure S2A were cloned into the pLX304 vector via Gibson assembly. For preparation of lentiviruses, HEK-293T cells in 6-well plates were transfected at ~60%–70% confluency with the lentiviral vector pLX304 containing the gene of interest (1,000 ng), the lentiviral packaging plasmids dR8.91 (900 ng) and pVSV-G (100 ng), and 8 µL of Lipofectamine 2000 for 4 hours(Hung et al., 2015). About 48 hours after transfection the cell medium containing lentivirus was harvested and filtered through a 0.45-μm filter. HEK-293T cells were then infected at ~50% confluency, followed by selection with 8 μg/mL blasticidin in growth medium for 7 days before further analysis.

METHOD DETAILS

APEX labeling in living cells

18-24 hours after plating HEK-293T cells stably expressing the corresponding APEX2 fusion construct, APEX labeling was initiated by changing the medium to fresh medium containing 500 μM biotin-phenol (Iris Biotech GmbH). Cells were incubated at 37°C under 5% CO2 for 30 minutes. H2O2 (Sigma Aldrich) was then added to each well to a final concentration of 1 mM, and the plate gently agitated for 1 minute(Hung et al., 2016). The reaction was quenched by replacing the medium with an equal volume of 5 mM Trolox, 10 mM sodium ascorbate and 10 mM sodium azide in Dulbecco’s phosphate-buffered saline (DPBS). Cells were washed with DPBS containing 5 mM Trolox and 10 mM sodium ascorbate three times before proceeding to imaging, RT-qPCR or RNA-seq experiments. The unlabeled controls were processed identically, except that the H2O2 addition step was omitted.

Immunofluorescence staining and fluorescence microscopy

Cells were fixed with 4% paraformaldehyde in PBS at room temperature for 15 minutes. Cells were then washed with PBS three times and permeabilized with cold methanol at −20°C for 5-10 minutes. Cells were washed again three times with PBS and blocked for 1 hour with 3% BSA in PBS (“blocking buffer”) at room temperature. Cells were then incubated with primary antibodies (Mouse anti-V5 antibody, Life Technologies, 1:1000 dilution; Mouse anti-FLAG antibody, Agilent, 1:1000 dilution; Rabbit anti-TOM20 antibody, Santa Cruz Biotechnology, 1:800 dilution; Rabbit anti-Calnexin antibody, Life Technologies, 1:1000 dilution) in blocking buffer for 1 hour at room temperature. After washing three times with PBS, cells were incubated with secondary antibodies (AlexaFluor488, Life Technologies 1:1000 dilution; AlexaFluor568, Life Technologies 1:1000 dilution; neutravidin-AlexaFluor647, Life Technologies, 1:1000 dilution) in blocking buffer for 30 minutes. Cells were then washed three times with PBS and imaged. Fluorescence confocal microscopy was performed with a Zeiss AxioObserver microscope with 60X oil immersion objectives, outfitted with a Yokogawa spinning disk confocal head, Cascade II:512 camera, a Quad-band notch dichroic mirror (405/488/568/647), and 405 (diode), 491 (DPSS), 561 (DPSS) and 640 nm (diode) lasers (all 50 mW). CFP (405 laser excitation, 445/40 emission), Venus/Alexa Fluor488 (491 laser excitation, 528/38 emission), AlexaFluor568 (561 laser excitation, 617/73 emission), and AlexaFluor647 (640 laser excitation, 700/75 emission) and differential interference contrast (DIC) images were acquired through a 60x oil-immersion lens. Acquisition times ranged from 100 to 2,000 milliseconds (ms).

RNA extraction for RT-qPCR or RNA-seq

To labeled and unlabeled (control) HEK-293T cells in 10-cm plates, we added ~1 mL DPBS containing 5 mM Trolox and 10 mM sodium ascorbate, as well as ~4 μL Ribolock RNase inhibitor (Thermo Fisher). The cells were then scraped off the 10-cm plates using cell lifters (Corning), transferred to 2-mL Eppendorf tubes, and spun at ~300 × g for 4 minutes to pellet cells. The supernatant was removed, and the RNA was extracted from cells using the RNeasy plus mini kit (Qiagen) following the manufacturer’s protocol, including adding β-mercaptoethanol to the lysis buffer. The lysate was passed through the genomic DNA (gDNA) eliminator column supplied with the kit. A modification to the protocol was replacing the RW1 buffer with RWT buffer (Qiagen) for washing. The extracted RNA was eluted into RNase-free water, and RNA integrity was checked on the Agilent bioanalyzer 2100 using the RNA pico assay. Only RNA with a RIN (RNA integrity number) > 8.5 was used for subsequent experiments. RNAs shorter than 100 nt were not efficiently recovered. RNA concentrations were determined using the Nanodrop (Thermo Fisher).

APEX labeling streptavidin dot blot experiment

RNA from labeled NES-APEX2 (cytosol) HEK-293T cells was treated with Turbo DNase (Thermo Fisher) at 37°C for 30 minutes, followed by purification using the RNA clean and concentrator −5 kit (Zymo Research). ~500 ng of purified RNA was blotted on the Amersham Protran 0.45 nitrocellulose (NC) membrane, and the membrane was left for at least 15 minutes to allow the liquid to dry. The RNA was crosslinked to the membrane using 2500 μM energy (254 nm wavelength, UV Stratalinker 2400)(Spitale et al., 2015). The membrane was then wet with ~5 mL PBST (PBS-TWEEN 20), followed by incubation with 15 mL PBST containing 1 μL LI-COR Streptavidin IRDye 800CW (green). The membrane was washed thrice with PBS and imaged on the LI-COR Odyssey CLX. For the RNase digestion, we treated the RNA with RNase cocktail enzyme mix (Ambion) for 30 minutes at room temperature (RT), followed by purification using the RNA clean and concentrator kit.

All RNA experiments were carried out using standard protocols to minimize and eliminate RNase contamination. These included using a dedicated work area for RNA, using filtered pipette tips, wiping all surfaces with RNase Zap (Invitrogen), using certified RNase-free buffers and reagents, and testing buffers for RNase contamination using RNase Alert (Ambion). When appropriate, ~1-2 μL of Ribolock RNase inhibitor (Thermo Fisher) was added per 100-200 μL of buffer/solution.

Horseradish peroxidase (HRP) in vitro labeling

For in vitro labeling, 100 μg of yeast tRNA extract (Thermo Fisher) was incubated with 500 μM BP, 1 mM H2O2, and 2.25 μM HRP (Thermo Fisher) in PBS for 1 minute. The reaction was quenched by adding a PBS solution with final concentrations of 10 mM sodium azide, 10 mM sodium ascorbate, and 5 mM Trolox. The reaction was cleaned up using the RNA clean and concentrator −5 kit (Zymo Research). For RNA digestion, 15 μg of labeled RNA was incubated with 2.5 μg RNase A (Thermo Fisher) in a total volume of 25 μL in water. After 1 hour at room temperature, the reaction was cleaned up using the RNA clean and concentrator kit. For proteinase K digestion, 15 μg of labeled RNA was incubated with 50 μg of Proteinase K (Ambion) in a total volume of 25 μL in PBS. After 1 hour at 37°C, the reaction was cleaned up using the RNA clean and concentrator kit. 1 μg of RNA was spotted for each condition and then dot-blotted as described above.

Enrichment of biotinylated RNA

To enrich biotinylated RNAs we used Pierce streptavidin magnetic beads (Thermo Fisher), using 10 μL beads per 25 μg of RNA. In general, RNA from ½ of a 10-cm plate (~30-50 μg) was sufficient for generating high-quality polyA+ RNA-seq libraries. The beads were washed 3 times in B&W buffer (5 mM Tris-HCl, pH = 7.5, 0.5 mM EDTA, 1 M NaCl, 0.1% TWEEN 20 (Sigma Aldrich)), followed by 2 times in Solution A (0.1 M NaOH and 0.05 M NaCl), and 1 time in Solution B (0.1 M NaCl). The beads were then suspended in ~100-150 μL 0.1 M NaCl and incubated with ~100-125 μL RNA (diluted in water) on a rotator for 2 hours at 4°C. The beads were then placed on a magnet and the supernatant discarded. Beads were washed 3 times in B&W buffer and resuspended in 54 μL water. A 3X proteinase digestion buffer was made (1.1 mL buffer contained 330 μL 10X PBS pH = 7.4 (Ambion), 330 μL 20% N-Lauryl sarcosine sodium solution (Sigma Aldrich), 66 μL 0.5 M EDTA, 16.5 μL 1 M dithiothreitol (DTT, Thermo Fisher) and 357.5 μL water). 33 μL of this 3X proteinase buffer was added to the beads along with 10 μL Proteinase K (20 mg/mL, Ambion) and 3 μL Ribolock RNase inhibitor. The beads were then incubated at 42°C for 1 hour, followed by 55°C for 1 hour on a shaker. The RNA was then purified using the RNA clean and concentrator −5 kit (Zymo Research). The RNA was typically not bioanalyzed but used as is for downstream applications.
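The recipe for the 3X proteinase digestion buffer can be checked with simple dilution arithmetic (C1·V1 = C2·V2). The sketch below uses the volumes and stock concentrations listed above; the function and variable names are illustrative, and the bead volume is ignored when computing the final reaction volume, which is an assumption.

```python
def diluted_concentration(stock_conc: float, stock_vol_ul: float, total_vol_ul: float) -> float:
    """Concentration after diluting stock_vol_ul of stock into total_vol_ul."""
    return stock_conc * stock_vol_ul / total_vol_ul


# Volumes (in microliters) of each component of the 3X buffer, as listed in the text
buffer_volumes_ul = {
    "10X PBS": 330.0,
    "20% N-lauryl sarcosine": 330.0,
    "0.5 M EDTA": 66.0,
    "1 M DTT": 16.5,
    "water": 357.5,
}
buffer_total_ul = sum(buffer_volumes_ul.values())  # should equal 1100 uL (1.1 mL)

# Concentrations in the 3X buffer
pbs_x = diluted_concentration(10.0, 330.0, buffer_total_ul)         # in "X" units
sarcosine_pct = diluted_concentration(20.0, 330.0, buffer_total_ul)  # percent
edta_mm = diluted_concentration(500.0, 66.0, buffer_total_ul)        # mM
dtt_mm = diluted_concentration(1000.0, 16.5, buffer_total_ul)        # mM

# Final digestion reaction: 54 uL water + 33 uL 3X buffer + 10 uL Proteinase K
# + 3 uL RNase inhibitor (bead volume ignored; an assumption of this sketch)
reaction_total_ul = 54.0 + 33.0 + 10.0 + 3.0
final_pbs_x = diluted_concentration(pbs_x, 33.0, reaction_total_ul)
final_proteinase_k_mg_ml = diluted_concentration(20.0, 10.0, reaction_total_ul)

print(buffer_total_ul, pbs_x, sarcosine_pct, edta_mm, dtt_mm)
print(reaction_total_ul, final_pbs_x, final_proteinase_k_mg_ml)
```

Under these assumptions the listed volumes sum to 1.1 mL, the buffer is 3X PBS with 6% N-lauryl sarcosine, 30 mM EDTA and 15 mM DTT, and the 100 μL digestion is at approximately 1X PBS with 2 mg/mL Proteinase K.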

APEX-seq library preparation

RNA-seq libraries were prepared from enriched RNA (corresponding to ~30-50 μg of pre-enriched RNA) using the Illumina TruSeq stranded mRNA preparation kit, which included polyA+ selection. The prepared libraries were stranded and were quality-controlled by sequencing on the Illumina MiSeq. Good libraries (> 80% unique reads) were sequenced on the Illumina HiSeq 4000 at ~40 million paired (2 × 75) reads per library. The polyA+ libraries (41 ± 2 million paired reads, mean ± S.E.M) had high mapping in both targets (90 ± 1% uniquely mapped reads) and controls (86 ± 1% uniquely mapped reads) (Figure S2C). The correlation between biological replicates was high (between 0.96 and 1). For the MITO APEX-seq we also generated total RNA samples (by omitting the polyA+ selection step in the TruSeq protocol), as well as a polyA+ selected technical replicate for 1 of the labeled samples.

Alternative enrichment strategies tested

We tested alternative strategies to enrich the biotinylated RNAs, and the best one (maximizing enrichment while minimizing material loss) is described above. In general, we found that using harsh reagents such as formamide or urea increased yield variability across replicates while reducing yield. We varied temperature (RT versus 4°C), buffers used to wash beads, and amount and type of beads used (Pierce streptavidin beads vs Dyna MyOne Streptavidin C1 beads (Thermo Fisher)). Specifically, the protocols tested were as follows:

  1. Published APEX-RIP protocol/urea wash/high salt wash(Kaewsapsak et al., 2017) – 2 hour 4°C incubation – 10 uL Pierce beads

  2. APEX-RIP protocol (excluding urea) – 2 hour 4°C incubation – 10 uL Pierce beads

  3. APEX-RIP protocol – 10-15 minutes RT incubation – 10 uL Pierce beads

  4. APEX-RIP protocol – 2 hour 4°C incubation – 10 uL Pierce beads – +2 additional washes of 50% formamide for 15 minutes at 37°C

  5. High salt wash (described above, B&W buffer) – 15 minutes at RT – 150 uL Dyna beads

  6. High salt wash – 15 minutes at RT – 10 uL Pierce beads

  7. APEX-RIP protocol – 2 hour 4°C incubation – 50 uL Pierce beads

  8. APEX-RIP protocol – 2 hour 4°C incubation – 10 uL Pierce beads – +2 washes of 20% formamide for 15 minutes at 37°C

  9. High salt wash – 2 hour 4°C incubation – 10 uL Pierce beads (finalized protocol)

  10. High salt wash – with 2 hour 4°C incubation – 10 uL Pierce beads – + 2 M urea wash

  11. No enrichment controls

Formamide was in 1X SSC buffer (Promega). For APEX-RIP protocol, we used RIPA buffer, 1 M KCl and 2M Urea buffers, following the APEX-RIP protocol(Kaewsapsak et al., 2017).

APEX RT-qPCR experiments (MITO)

To test for APEX RNA enrichment, we designed primers against positives (MTND1 and MTCO2) and negatives (GAPDH, SSR2, XIST, FAU). The sequences of the primers (purchased from Elim Biopharmaceuticals) are listed in Table S1.

For the RT-qPCR experiments, the enriched MITO-APEX2 RNA was first reverse transcribed following the Superscript III reverse transcriptase (Thermo Fisher) protocol using random hexamers as primers(Kaewsapsak et al., 2017). The resulting cDNA was then tested by qPCR using the primers above in 2X SYBR Green PCR Master Mix (Thermo Fisher), with data generated on a Lightcycler 480 (Roche). For each RNA we calculated the ratio of RNA recovered in the labeled target relative to unlabeled controls. We then calculated enrichment as recovery of positives relative to negatives, correcting for primer efficiency (> 85% for all primers).

APEX RT-qPCR experiments (ERM)

To confirm enrichment of known secretory RNAs by ERM APEX-seq, we designed primers against known secretory (SSR2, TMX1 and SFT2D2) and non-secretory genes (FAU, SUB1, MTCO2). The sequences of the primers (purchased from Sigma Aldrich) are listed in Table S1.

APEX-RIP RT-qPCR experiments with ERM-APEX2 stable cells were performed as described previously(Kaewsapsak et al., 2017). Briefly HEK 293T cells stably expressing ERM-APEX2 were incubated with BP for 30 minutes prior to 1-minute H2O2 labeling. 0.1% (v/v, in PBS) formaldehyde with 10 mM ascorbate and 5 mM Trolox was then added for 10 minutes to quench the reaction and crosslink the RNAs to proteins. The crosslinking reaction was terminated by the addition of 125 mM of glycine for 5 minutes. Following cell lysis in RIPA buffer, streptavidin beads were used to enrich the biotinylated material for 2 hours at 4°C. The crosslinked RNAs and proteins were then reverse crosslinked before protein digestion with proteinase K. The subsequently purified RNA was analyzed by RT-qPCR.

RT-PCR of in vitro 5S RNA to map labeling positions

5S RNA was in vitro transcribed as follows. First, a gBlock Gene Fragments (Integrated DNA Technologies) was purchased with the human 5S RNA sequence adjacent to an overhang region (underlined): ATATGCAAGCAACCCAAGTGGTCTACGGCCATACCACCCTGAACGCGCCCGATCTCGTCTGATCTCGGAAGCTAAGCAGGGTCGGGCCTGGTTAGTACTTGGATGGGAGACCGCCTGGGAATACCGGGTGCTGTAGGCTTT

The DNA was amplified by PCR using Phusion High-fidelity DNA polymerase (NEB) and cleaned up using the QIAquick PCR purification kit (Qiagen). The primers used for amplification are listed in Table S1. The DNA template was used to synthesize RNA using the MEGAscript Transcription T7 Kit (Thermo Fisher), and the transcribed RNA was purified using the MEGAclear kit (Ambion). The integrity and size of the transcribed RNA were checked by running the products on a 6% native agarose gel, stained with SYBR Gold (Thermo Fisher), and imaged using Molecular Imaging System (Biorad). All experiments were carried out using replicates. The labeled RNA was enriched using streptavidin-biotin pulldown as described above, and relative to unlabeled RNA the yield after enrichment and cleanup was 0.11 ± 0.02 (N = 2), as determined by the Nanodrop (Thermo Fisher).

We prepared labeled reverse primer following the USB Optikinase protocol (Affymetrix) using gamma-32P ATP (Perkin Elmer). We added 32P end-labeled reverse primer to ~100 ng RNA (labeled and controls), and the reaction mixture was heated at 95°C for 2 minutes followed by slow cooling to 4°C at 10°C/min, to facilitate annealing of the primer (1 μL) to the RNA. The primer extension reaction was then initiated with reaction mix, as previously described(Lee et al., 2017). Briefly, the reaction mix (20 μL total; 4 μL 5X first strand buffer (FS), 1 μL 100mM DTT, 1 μL Ribolock inhibitor, 10 μL RNA, and 2 μL water) was added and the mixture preincubated at 52°C for 1 minute before adding Superscript III (1 μL; 200 units, Invitrogen). Separately, similar reactions were carried out spiking in dATP with ddATP (dideoxyadenosine), and dCTP with ddCTP (dideoxycytidine), as described(Lee et al., 2017). Primer extension was carried out at 52°C for 30 minutes, after which the reaction was stopped by heating to 95°C for 5 minutes, followed by cooling to 4°C. The RNA was then hydrolyzed using NaOH (4M; 1 μL) and heating to 95°C for 5 minutes.

To the cDNA, Gel Loading Buffer II (10 μL; Invitrogen) was added, and the products were run on a 35-cm long denaturing 8% polyacrylamide gel with 7M Urea (Thermo Fisher)(Lee et al., 2017). The resulting gel was dried (Labconco Gel Dryer) after placing it on Whatman paper (Sigma Aldrich) and exposed to a storage phosphor screen (Molecular Dynamics) for ~24 hours, and then visualized by phosphorimaging (STORM, Molecular Dynamics). The lanes containing ddATP (corresponding transcribed RNA nucleotide U) and ddCTP (corresponding transcribed RNA nucleotide G) were used to determine the position of all RT stops. RNA secondary structures were predicted using mFOLD software(Zuker, 2003). The gel analysis was carried out using ImageJ(Schneider et al., 2012).

Liquid-chromatography (LC)-mass-spectrometry (MS) characterization of in vitro reaction products

2 mM dG (Sigma Aldrich) was incubated with 100 μM pentachlorophenol (PCP, Sigma Aldrich) or BP in PBS at 37°C for 1 hour in the presence of 2.25 μM HRP or APEX2 and 100 μM H2O2. The reaction was diluted with water to a final volume of 100 μL and injected into an LC-MS with a Zorbax Poroshell 120 SB-C18, 2.1 x 50 mm, 2.7 μm column and a Poroshell 120 SB-C18, 2.1 x 5 mm, 2.7 μm guard column. The gradient for LC is shown in the table below. Mass was determined by single quadrupole mass spectrometry with positive and negative atmospheric pressure chemical ionization (APCI) and electrospray ionization (ESI) modes for m/z = 20-2000.

Time (minute) Flow (mL/minute) % solvent A % solvent B
0 0.3 98 2
2 0.3 98 2
6 0.3 5 95
8 0.3 5 95
8.5 0.3 98 2
9.5 0.3 5 95
10.5 0.3 5 95
11 0.3 98 2
15 0.3 98 2

Solvent A = 0.1% formic acid in water. Solvent B = 0.1% formic acid in acetonitrile.

Mapping and visualizing APEX-seq data

The RNA-seq libraries generated were mapped to the genome rather than to the annotated transcriptome, so that we could investigate intron retention. The RNA-seq reads were initially subjected to barcode removal and primer trimming using a published script(Flynn et al., 2016) based on Trimmomatic(Bolger et al., 2014) (https://github.com/qczhang/icSHAPE/blob/master/scripts/trimming.pl):

perl trimming.pl -1 $fastq-file1 -2 $fastq-file2 -p $trimmed-file1 -q $trimmed-file2 -l 0 -t 0 -c phred33 -a adapter.fa -m 36

The reads were then mapped using STAR(Dobin et al., 2013; Dobin and Gingeras, 2015) to the GRCh38 Ensembl genome, with Homo_sapiens.GRCh38.84.gtf annotations (http://uswest.ensembl.org/index.html), using the following command:

STAR --genomeDir ~/genome/human/star --runThreadN 8 --readFilesIn $trimmed-file1 $trimmed-file2 --outFileNamePrefix $output-samfile

The mapped reads were then counted using HTSEQ(Anders et al., 2014):

python -m HTSeq.scripts.count -m intersection-nonempty -s reverse -i gene_id $output-samfile ~/genome/human/Homo_sapiens.GRCh38.84.gtf > $txt

The mapped data was visualized using the UCSC browser track(Kent et al., 2002). To generate genome tracks we used samtools(Li et al., 2009) to generate stranded BAM files for each library from the SAM file. The BAM file was then used to generate a bedGraph following the command:

genomeCoverageBed -bg -split -ibam $bam -g ~/genome/human/star/chrNameLength.txt > $bed_file

BedGraph files from multiple replicates were aggregated using bedtools(Quinlan and Hall, 2010) unionbedg and each track was normalized to the same sequencing depth (30 million reads each). The averaged bedGraph files were converted to BigWig files using the command bedGraphToBigWig(Kent et al., 2010) for visualization in the UCSC browser(Karolchik et al., 2004). Statistics from the mapped data were aggregated using MultiQC(Ewels et al., 2016). To calculate FPKM (fragments per kilobase per million reads), we obtained transcript lengths from Biomart Ensembl(Durinck et al., 2005) (Ensembl Genes 92, GRCh38), using the longest stable isoform for a gene as its length.
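The FPKM calculation described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name, the toy gene names and the choice of library size (here the sum of counted fragments, which the text does not specify) are assumptions.

```python
def fpkm(counts, lengths_bp):
    """Convert raw per-gene fragment counts to FPKM.

    counts     : dict of gene -> fragments counted for that gene
    lengths_bp : dict of gene -> transcript length in bases
                 (the text uses the longest stable isoform per gene)
    """
    # Assumption: library size is the sum of counted fragments.
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no counted fragments")
    # Fragments per kilobase of transcript per million fragments.
    return {
        gene: n * 1e9 / (lengths_bp[gene] * total)
        for gene, n in counts.items()
    }


# Toy library with hypothetical gene names and numbers.
toy_counts = {"geneA": 500, "geneB": 1500}
toy_lengths = {"geneA": 1000, "geneB": 3000}
toy_fpkm = fpkm(toy_counts, toy_lengths)
```

In this toy library geneB has three times the counts of geneA but is also three times longer, so both genes receive the same FPKM.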

Transcript-level quantification:

Kallisto(Bray et al., 2016) (v 0.43.1) was used to quantify transcript-level abundances of the APEX-seq libraries. A fasta file corresponding to Homo_sapiens.GRCh38.89.gtf and hg38 was downloaded from the Ensembl website and a kallisto index was generated using the kallisto index command with default arguments. To quantify each pair of fastq files, the kallisto quant command with the -b 30 argument was used.

Data analysis using DESeq2

Differentially-expressed genes were determined using DESeq2(Love et al., 2014), using FDR < 0.05 and 18 controls. We tested the effect of imposing other FDR (false discovery rate) cutoffs, and found no appreciable increase in precision when decreasing the FDR further to 0.01. However, increasing the FDR beyond 0.05 dramatically decreased precision. All P-values from DESeq2 are FDR adjusted for multiple-hypothesis testing.

For the OMM perturbation experiments, we used the same 18-control strategy, but replacing the unperturbed OMM APEX-seq and cytosol APEX-seq values with the corresponding drug-perturbed values. Using a strategy where we only had 4 controls per perturbation experiments (2 OMM unlabeled and 2 cytosol unlabeled) did not appreciably change the conclusions. For the nocodazole time course experiment we used controls generated at 3 minutes treatment to calculate enrichment values for the 3 minutes and 6 minutes target libraries, and controls generated at 30 minutes treatment for the 9 minute, 30 minute and 2 hour target time points.

There were two exceptions to the 18-control approach in all our analysis. For the LMA gene in nuclear lamina, and the SENP2 gene in the nuclear pore construct, the unlabeled control had high counts for these genes in the RNA-seq data, as these genes were used to target APEX2 to the corresponding location. Therefore, to compensate, we replaced the LMA 18-control DESeq2 log2fold-change with the 2-control log2fold-change (using LMA controls) in the LMA data; we did the same for SENP2 in the nuclear pore dataset. Otherwise, all other data was as is. As the default DESeq2 approach(Love et al., 2014) replaces outliers when there are > 7 replicates, with our 18-control experiment we didn’t need to change the corresponding DESeq2 values of sub-compartments other than nuclear pore and nuclear lamina.

Generating Orphan Lists

The orphan lists in Figure 2F include candidates from 7 locations – ERM, OMM, nucleus, nuclear lamina, nuclear pore, nucleolus and cytosol. We did not find any significantly-enriched transcripts in the KDEL, so no RNAs from this location were included in the orphan list. To generate the ERM and OMM orphan list we started with the 1077 and 1027 enriched transcripts in ERM and OMM respectively, and excluded all secretory genes. To generate the nucleus and cytosol orphan lists we started with the enriched genes from the atlas analysis (Figure 3D), and excluded all genes that were known to be enriched in these locations based on published fractionation-seq data from HEK293 (rRNA-depleted)(Sultan et al., 2014). For the fractionation-seq data the corresponding files were downloaded from the European Nucleotide Archive (accession number PRJEB4197), and processed identically to the APEX-seq fastq files. To validate nucleus and cytosol orphans, we carried out our own nuclear/cytosol fractionation, but using polyA+ selection and making RNA-seq libraries identically to the APEX-seq libraries.

To generate the nucleolus, nuclear pore and nuclear lamina orphan lists, we started with the genes from the atlas analysis and included genes that were highly-enriched in the location (log2foldchange > 0.75) relative to nucleus APEX-seq.

FPKM data sources

To obtain FPKM (fragments per kilobase per million reads) of genes, we used two sources. First, we used published polyA+ data(Sultan et al., 2014) by averaging data from two protocols (Qiaquick and Trizol). The corresponding fastq files were downloaded from the European Nucleotide Archive (accession number PRJEB4197), and processed identically to the APEX-seq fastq files. Secondly, we estimated FPKM from the raw counts from our 18-control samples. Briefly, we calculated the FPKM for each control sample and averaged the FPKM values from all 18 controls. As the RNA-seq reads from unlabeled samples were obtained after treatment with streptavidin beads, these FPKM values might suffer from sequence- or length-dependent biases. However, genome-wide we found the data to be highly-correlated (r = 0.95) with the published data(Sultan et al., 2014). For the correlation analysis, we considered FPKM > 1. For all analyses that used FPKM values, we also only considered genes with FPKM > 1.

ERM APEX-seq extended analysis

We extensively tested the ERM APEX-seq dataset against the published ER fractionation(Reid and Nicchitta, 2012) and ER-proximal ribosome profiling(Jan et al., 2014) datasets. Briefly, the ERM APEX-seq log2fold-changes were cytosol normalized by subtracting the corresponding cytosol (NES) log2fold-change. This ratiometric normalization increased specificity for challenging sub-compartments (Figure S2I), and is routinely used for APEX proteomics analysis(Hung et al., 2014). To test the specificity and sensitivity of ERM APEX-seq, we used receiver-operator curves (ROC)(Linden, 2006). For the true positive list to test against, we considered the ~640 ER-enriched transcripts by ribosome profiling (log2fold-change > 0.904, as determined by authors), and the false positives were non-secretory RNAs – these were genes not in Phobius(Kall et al., 2004), SignalP(Petersen et al., 2011) and TMHMM(Krogh et al., 2001). In conjunction with our data, we also examined ER ribosome profiling data using ROCs, including transcripts with total RPKM > 10, as recommended by authors. From the ROC analysis, the cutoff was the value that maximized the difference between the true positive rate (TPR) and false positive rate (FPR). For ERM APEX-seq we explored different analysis approaches, including using 2-controls (ERM APEX-seq controls), 4-controls (ERM and cytosol APEX-seq controls) and 18-control samples. We also explored including or excluding ratiometric normalization (i.e. cytosol normalization to reduce background). We finally settled on the 18-control condition based on high specificity and reasonably-high coverage (> 1000 enriched genes). For this 18-control experiment, DESeq2 log2fold-change values from cytosol APEX-seq were subtracted from ERM APEX-seq to obtain ratiometric normalization (ERM/cytosol), and a final cutoff of log2fold-change (ERM/cytosol) > 0.8725 was obtained from the ROC. Unlike the published studies(Jan et al., 2014; Reid and Nicchitta, 2012), we didn’t need to use CHX to stabilize transcripts.
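The ratiometric normalization and the ROC-based cutoff described above can be illustrated with a short sketch. It is not the analysis code from this study: the function names, the toy transcripts and their fold-change values are hypothetical, and the candidate cutoffs are simply taken to be the observed scores.

```python
def ratiometric(erm_lfc, cytosol_lfc):
    """Subtract cytosol log2 fold-changes from ERM log2 fold-changes."""
    return {t: erm_lfc[t] - cytosol_lfc[t] for t in erm_lfc if t in cytosol_lfc}


def roc_cutoff(scores, labels):
    """Return (cutoff, TPR, FPR) for the cutoff that maximizes TPR - FPR.

    scores : normalized log2 fold-changes, one per transcript
    labels : True for reference positives, False for reference negatives
    A transcript is called enriched when its score exceeds the cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative reference transcripts")
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, is_pos in zip(scores, labels) if is_pos and s > cutoff)
        fp = sum(1 for s, is_pos in zip(scores, labels) if not is_pos and s > cutoff)
        tpr, fpr = tp / n_pos, fp / n_neg
        if best is None or tpr - fpr > best[1] - best[2]:
            best = (cutoff, tpr, fpr)
    return best


# Toy data: transcripts "a" and "b" stand in for known secretory mRNAs.
toy_erm = {"a": 3.0, "b": 2.5, "c": 1.0, "d": 0.5, "e": 0.2}
toy_cyto = {"a": 0.5, "b": 0.3, "c": 0.8, "d": 0.4, "e": 0.6}
toy_ratio = ratiometric(toy_erm, toy_cyto)
names = sorted(toy_ratio)
toy_cutoff, toy_tpr, toy_fpr = roc_cutoff(
    [toy_ratio[n] for n in names], [n in {"a", "b"} for n in names]
)
```

In the toy data, transcript "c" has a raw ERM fold-change of 1.0 but is nearly as enriched in the cytosol; after subtraction its score falls to about 0.2, which becomes the selected cutoff.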

Our analysis suggested improved performance (i.e. specificity) when combining replicates, especially when dealing with challenging/open sub-compartments. As few as 4 controls can improve performance. However, combining controls from multiple constructs/experiments is typically unnecessary when dealing with closed sub-compartments such as the nucleus, cytoplasm or mitochondrial matrix.

To compare the abundance of transcripts recovered by ERM APEX-seq, ER fractionation and ER ribosome profiling, we used a published HEK293 polyA+ RNA-seq dataset(Sultan et al., 2014). To estimate the coverage of our ER datasets, we chose a reference gene list comprising 71 mRNAs that encode ER resident proteins, as previously described(Kaewsapsak et al., 2017). For the comparison of coverage with ERM proteomics, we used the published dataset containing gene names(Hung et al., 2017).

Nucleus (NLS) APEX-seq extended analysis

To validate the NLS APEX-seq orphans we carried out nuclear fractionation of HEK-293T cells by following an established protocol(Gagnon et al., 2014). The protocol uses the detergent nonidet P40 (NP-40) to separate ER contaminants from the nucleus, and we confirmed this strategy was effective by imaging isolated nuclei, staining for ER using ER tracker Red (BODIPY TR Glibenclamide – Thermo Fisher), and visualizing on a fluorescence microscope (Zeiss Observer Z1). We modified the protocol so that the extracted RNA was not purified by trizol extraction, but rather by using the RNeasy plus mini kit (Qiagen). RNA-seq was carried out using the same protocol as that used for APEX-seq, thereby generating polyA+ libraries for the nuclear and cytoplasmic fractions using the Illumina TruSeq kit. All experiments were carried out in biological replicates.

To estimate precision and accuracy of NLS APEX-seq, we used our polyA+ fractionation-seq data to generate true-positive and false-positive lists. We did so by first obtaining transcripts differentially expressed in the nuclear relative to cytoplasmic fraction (FDR-adjusted P < 0.05), and did the same for nucleus (NLS) APEX-seq vs cytosol (NES) APEX-seq. We categorized transcripts into the following categories (Figure 2G): (1) log2fold-change NLS > 0 and log2fold-change (nuclear/cytosolic fractionation) > 0 (true positive; TP), (2) log2fold-change NLS > 0 and log2fold-change (nuclear/cytosolic fractionation) < 0 (false positive; FP), (3) log2fold-change NLS < 0 and log2fold-change (nuclear/cytosolic fractionation) > 0 (false negative; FN), (4) log2fold-change NLS < 0 and log2fold-change (nuclear/cytosolic fractionation) < 0 (true negative; TN). Precision was defined as TP/(TP+FP), accuracy as (TP+TN)/(TP+TN+FP+FN), sensitivity as TP/(TP+FN), and specificity as TN/(TN+FP). Transcripts shorter than 100 nt were excluded from analysis. We calculated precision and accuracy for all transcripts, as well as for ER-transcripts (those enriched by ERM APEX-seq) and non-ER transcripts. Other analysis approaches (using other log2foldchange cutoffs, making receiver-operator-curves etc.) did not change the main conclusions.
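The four categories and the derived metrics can be written out as a small sketch. The function names and the toy fold-change pairs are hypothetical. Transcripts with a fold-change of exactly zero are skipped here because the definitions above do not cover that case.

```python
def categorize(nls_lfc, frac_lfc):
    """Assign TP/FP/FN/TN from the signs of two log2 fold-changes.

    nls_lfc  : NLS APEX-seq log2 fold-change (nucleus vs cytosol)
    frac_lfc : fractionation-seq log2 fold-change (nuclear vs cytosolic)
    """
    if nls_lfc == 0 or frac_lfc == 0:
        return None  # not covered by the definitions in the text
    if nls_lfc > 0:
        return "TP" if frac_lfc > 0 else "FP"
    return "FN" if frac_lfc > 0 else "TN"


def summarize(pairs):
    """Compute precision, accuracy, sensitivity and specificity.

    Assumes every category combination in a denominator is non-empty.
    """
    n = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for nls_lfc, frac_lfc in pairs:
        label = categorize(nls_lfc, frac_lfc)
        if label is not None:
            n[label] += 1
    return {
        "precision": n["TP"] / (n["TP"] + n["FP"]),
        "accuracy": (n["TP"] + n["TN"]) / sum(n.values()),
        "sensitivity": n["TP"] / (n["TP"] + n["FN"]),
        "specificity": n["TN"] / (n["TN"] + n["FP"]),
    }


# Toy (NLS, fractionation) pairs: 2 TP, 1 FP, 1 FN, 2 TN.
toy_pairs = [(1.2, 0.8), (0.9, 1.1), (0.4, -0.3),
             (-0.7, 0.5), (-1.0, -0.6), (-0.2, -0.9)]
toy_metrics = summarize(toy_pairs)
```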

OMM APEX-seq extended analysis

For OMM APEX-seq data in Figures 6 and 7, a similar ratiometric normalization to ERM with cytosol was carried out. Based on extensive testing with ERM APEX-seq and nucleus APEX-seq using the corresponding known gene-lists, we found that in general a default log2fold-change cutoff of 0.75 was suitable for dealing with APEX-seq when prior knowledge was unavailable, or un-assumed. We therefore used log2fold-change (OMM/cytosol) = 0.75 as a cutoff to identify OMM-enriched transcripts.

For labeling secretory RNAs (Figure S3D), we identified and displayed secretory RNAs in this order: (1) all Phobius, (2) TMHMM but not Phobius, (3) SignalP but not Phobius or TMHMM, (4) Gene ontology cellular component (GOCC) but not Phobius, TMHMM or SignalP. MitoCarta 2.0 genes were excluded from the secretory RNAs and shown as their own category.
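The labeling order amounts to a first-match rule over the annotation sources, which the following sketch illustrates. The gene identifiers and set contents are placeholders, and checking MitoCarta membership before the secretory sources is an assumption based on its being treated as a separate category.

```python
def secretory_label(gene, phobius, tmhmm, signalp, gocc, mitocarta):
    """Return the first annotation source that contains the gene, or None."""
    if gene in mitocarta:
        return "MitoCarta2.0"  # own category, excluded from secretory RNAs
    ordered_sources = (
        ("Phobius", phobius),
        ("TMHMM", tmhmm),
        ("SignalP", signalp),
        ("GOCC", gocc),
    )
    for name, members in ordered_sources:
        if gene in members:
            return name
    return None


# Placeholder annotation sets (hypothetical gene identifiers).
PHOBIUS = {"G1"}
TMHMM = {"G1", "G2"}
SIGNALP = {"G2", "G3"}
GOCC = {"G3", "G4"}
MITOCARTA = {"G5"}

toy_labels = {
    g: secretory_label(g, PHOBIUS, TMHMM, SIGNALP, GOCC, MITOCARTA)
    for g in ("G1", "G2", "G3", "G4", "G5", "G6")
}
```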

Mitochondria drug perturbation

For cycloheximide treatment, APEX labeling in OMM-APEX2 stable cells was initiated by changing the medium to fresh medium containing 500 μM biotin-phenol. This was incubated at 37°C under 5% CO2 for 15 minutes. Then cycloheximide (Sigma Aldrich) was added to the medium to a final concentration of 0.1 mg/mL and the cells were further incubated at 37°C under 5% CO2 for another 15 minutes. H2O2 was then added to each sample to a final concentration of 1 mM, and the plate gently agitated for 1 minute. Then the samples were quenched and processed the same way as other APEX-seq samples. For the puromycin or CCCP experiments, APEX labeling in OMM-APEX2 stable cells was initiated by changing the medium to fresh medium containing 200 μM puromycin (VWR) (or 40 μM CCCP (Sigma Aldrich)) and 500 μM biotin-phenol. This was incubated at 37°C under 5% CO2 for 30 minutes. H2O2 was then added to each sample to a final concentration of 1 mM, and the plate gently agitated for 1 minute. Then the samples were quenched and processed the same way as other APEX-seq samples.

For the nocodazole experiments, we added 10 μM nocodazole (Sigma Aldrich) to fresh media and incubated cells for 3, 6, 9, 30 and 120 minutes at 37°C under 5% CO2, followed by 1 minute labeling by adding H2O2 at room temperature to a final concentration of 1 mM. Biotin-phenol was added to the media 30 minutes prior to labeling. Samples were quenched and processed the same way as other APEX-seq samples.

OMM perturbation data analysis

For gene classification, mitochondrial genes were annotated according to MitoCarta 2.0(Calvo et al., 2015); secretory genes were annotated as in Kaewsapsak et al.(Kaewsapsak et al., 2017) according to Phobius, SignalP, TMHMM, GOCC and ER proximity ribosome profiling. For mitochondrial genes, we adapted the original TargetP algorithm prediction score (0 = no targeting sequence, 1-5 from strongest to weakest) to a different scale (0 = no targeting sequence, 1-5 from weakest to strongest) as a metric of the strength of N-terminus mitochondrial targeting sequence. The mitochondrial gene functional classes were annotated according to Gene Ontology and are listed in Table S4. For the gene density plots in Figure 6BC, all detected transcripts in each condition were plotted by their log2fold-change normalized against their respective cytosol control using a bin size of 0.2. For the functional class analysis in Figure 6FG, the top 100 enriched mitochondrial genes in each experiment were selected based on MitoCarta 2.0 for mitochondrial annotation and log2fold-change of OMM values normalized against their respective cytosol control.
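The rescaling of the TargetP score can be illustrated as below. The text does not give the exact mapping; this sketch assumes the simplest reversal that keeps 0 fixed and maps the original classes 1-5 to 5-1. The function name is hypothetical.

```python
def rescale_targetp(score):
    """Map the original TargetP class to a strength scale.

    Input : 0 = no targeting sequence, 1 (strongest) to 5 (weakest)
    Output: 0 = no targeting sequence, 1 (weakest) to 5 (strongest)
    """
    if score == 0:
        return 0
    if score not in (1, 2, 3, 4, 5):
        raise ValueError("expected an integer score from 0 to 5")
    # Assumed mapping: reverse the 1-5 range, leaving 0 unchanged.
    return 6 - score


rescaled = [rescale_targetp(s) for s in range(6)]
```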

To calculate nocodazole half-lives the non-linear regression function nls in R was used. We excluded transcripts with half-lives shorter than 0.5 minutes, and longer than 60 minutes as these were not reliable. We obtained half-lives for 461 of the 768 transcripts in Cluster 2 (Figure 7K).

Empirical classification of OMM-localized transcripts as ribosome- or RNA-dependent

OMM-enriched transcripts from the heatmap (N = 1902, Figure 6M) whose abundance increased with cycloheximide treatment and decreased with puromycin treatment were considered to be ribosome-dependent. In contrast, transcripts whose enrichment at the OMM was largely unchanged following puromycin or cycloheximide treatment were considered to localize in an RNA-dependent manner. The separation of these two populations of RNAs is shown in Figure 7A.

Predicting localization to the OMM versus ERM

If RNA-dependent transcripts localize based on their RNA sequence, and ribosome-dependent transcripts localize based on their protein sequence, a prediction that follows is that the RNA sequences of RNA-dependent transcripts are more distinguishable from non-OMM-localized transcripts than those of ribosome-dependent transcripts. In other words, if cellular machinery recognizes the sequence of RNA-dependent transcripts and their RNA sequences are sufficient for RNA localization, then the localization to the OMM, as opposed to another cellular destination, should be more predictable based on RNA sequence for these transcripts relative to ribosome-dependent transcripts. We therefore hypothesized that a machine learning classifier would be better able to classify transcripts as being OMM-localized or ERM-localized when comparing ERM-enriched transcripts to RNA-dependent transcripts, than when comparing ERM-enriched transcripts to ribosome-dependent (protein-dependent) transcripts. The ERM was chosen as a dataset to compare with as the ERM and OMM are physically proximate in cells, and both are locations where translation occurs.

To test this hypothesis, we used a random forest model to classify transcripts as being OMM- or ERM-localized. Broadly, training inputs were a list of transcripts and their empirical classification (OMM or ERM), and test inputs were a list of withheld transcripts whose predicted classification (OMM or ERM) was compared with their empirical classification. Model performance was compared when using OMM RNA-dependent transcripts and ERM transcripts, versus OMM ribosome-dependent transcripts and ERM transcripts.

OMM RNA-dependent and ribosome-dependent gene lists were identified as described above. ERM-localized genes were identified based on: log2FC enrichment > 0.75, adjusted p-value < 0.05, and log2FC enrichment in ERM > log2FC enrichment in OMM. Any genes in the ERM list that were also present in the OMM lists were excluded (from both lists), such that only uniquely localized genes were included for classification. The most abundant transcript isoform in the control samples was used as the primary transcript whose sequence was used for downstream analysis. Transcript sequences were converted into kmer counts by 1) generating a list of all possible kmers of a given k, 2) counting the number of times that kmer was present in a given transcript sequence, and 3) normalizing the kmer counts for a given transcript by the length of that transcript (i.e., this results in the relative per-transcript abundance of all possible kmers). For comparisons based on RNA-localization (Figure 7B), a k of 6 was used, resulting in 4096 (4^6) possible kmers. When comparing the sequences of only UTRs or CDS, only transcripts containing a 5’UTR/CDS/3’UTR sequence of at least length 10 were used.
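The kmer featurization described above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the function name is hypothetical, occurrences are counted with overlapping windows, and windows containing characters outside the alphabet (for example N) are ignored.

```python
from itertools import product


def kmer_frequencies(seq, k=6, alphabet="ACGT"):
    """Length-normalized counts of every possible kmer in one transcript."""
    seq = seq.upper().replace("U", "T")
    if not seq:
        raise ValueError("empty sequence")
    # 1) List all possible kmers of the given k (4**k for nucleotides).
    counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
    # 2) Count occurrences, sliding one base at a time (assumed overlapping).
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    # 3) Normalize by transcript length.
    return {kmer: n / len(seq) for kmer, n in counts.items()}


# Toy sequence; k=2 keeps the example easy to check by hand.
toy_2mers = kmer_frequencies("ACGTACGTAC", k=2)
toy_6mers = kmer_frequencies("ACGTACGTAC")
```

With k = 6 each transcript becomes a vector of 4096 features, matching the feature count stated above. In the toy 10-nt sequence, AC occurs three times and so has a normalized value of 0.3.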

The ensemble.RandomForestClassifier in the Python scikit-learn package (v 0.20.0) was used with default settings, with the exception of: n_estimators=100, max_features=4096, min_samples_split=15, min_samples_leaf=15. 10-fold cross-validation was used. ROC curves were generated using the ensemble.RandomForestClassifier.predict_proba() and metrics.roc_curve() functions in scikit-learn. Mean ROC curves are shown, with shaded areas indicating one standard deviation.

To test the hypothesis that OMM ribosome-dependent transcripts should be more predictive based on their protein sequence than OMM RNA-dependent transcripts, the above procedure was also performed using the protein sequences of ERM-localized, OMM P, and OMM R transcripts. Only protein-coding transcripts were used. As there are a greater number of possible amino acid kmers (22^k) than nucleic acid kmers (4^k), a k of 3 was used, and only kmers found at least twice across all protein sequences (in any one of the three lists) were included in downstream analyses. The use of a smaller k and/or the greater number of possible amino acid sequences, in addition to potential biochemical similarities between certain sets of amino acids (e.g. hydrophobic amino acids) which may be biologically similar but are not explicitly included in our model, may contribute to lower performance relative to classification based on RNA sequence. As protein-dependent localization would be predicted to be primarily dependent on the N-terminal amino acids, as these are the amino acids displayed during nascent peptide synthesis, only the first 100 N-terminal amino acids were used. Proteins whose sequences were shorter than 40 amino acids were excluded. The restriction to the 100 N-terminal amino acids is consistent with and based on the methods used by other signal peptide prediction programs (i.e., TargetP).

Random forest classification of OMM transcripts as RNA-dependent or ribosome-dependent

To validate the random forest model classification of OMM-localized transcripts as being RNA-dependent or ribosome-dependent (P transcripts), the ensemble.RandomForestClassifier was used as described previously (with the same settings) and 10-fold cross-validation (Figure 7D). Subsequently, to determine relative kmer importances, the entire dataset of RNA-dependent and Ribosome-dependent transcripts (with no transcripts withheld) was used to train the model (using per-transcripts length-normalized 6mer counts of all 4096 6mers). Feature importances were normalized to the maximum feature importance. These 6mer importances were then projected onto transcript sequences for all three gene lists (with overlapping genes not withheld), to identify the relative importance of transcript 6mers as a function of the position along the length of the transcript (i.e., to determine whether there exists a positional bias, such as 5’ or 3’ bias, in the part of the transcript most important for predicting RNA localization). To generate the relative importances, 1) the per-base importance of each transcript was initialized to 0, 2) transcripts were broken into consecutive 6mers, 3) the feature importance of each 6mer was added to the 6 corresponding bases of the RNA transcript, to result in a per-base importance for each transcript. This was then normalized for each transcript to the maximum per-base importance, such that the values for each transcript range from 0 to 1. A sliding average window of 20-bases was used and the resulting importances were then normalized based on transcript length to create the metaplot shown in Figure S7B, such that the position importances for each transcript ranged from 0 (representing the 5’ end of a transcript) to 1 (the 3’ end of a transcript).
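The projection of 6mer importances onto transcript positions can be sketched as below. Several details are assumptions not stated in the text: consecutive 6mers are taken as overlapping windows offset by one base, the sliding average is truncated at transcript ends, and relative positions are pooled into a fixed number of bins. All function names are hypothetical.

```python
def per_base_importance(seq, importance, k=6):
    """Add each kmer's feature importance to the k bases it covers."""
    scores = [0.0] * len(seq)          # 1) initialize per-base importance to 0
    for i in range(len(seq) - k + 1):  # 2) walk consecutive kmers
        weight = importance.get(seq[i:i + k], 0.0)
        for j in range(i, i + k):      # 3) add to the k corresponding bases
            scores[j] += weight
    peak = max(scores, default=0.0)
    # Normalize to the per-transcript maximum so values range from 0 to 1.
    return [s / peak for s in scores] if peak > 0 else scores


def sliding_average(values, window=20):
    """Moving average over `window` bases, truncated at the ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        start = i - half
        chunk = values[max(0, start):start + window]
        out.append(sum(chunk) / len(chunk))
    return out


def relative_position_bins(values, n_bins=100):
    """Average values into bins from 0 (5' end) to 1 (3' end)."""
    sums, hits = [0.0] * n_bins, [0] * n_bins
    for i, v in enumerate(values):
        b = min(int(i / len(values) * n_bins), n_bins - 1)
        sums[b] += v
        hits[b] += 1
    return [s / h if h else None for s, h in zip(sums, hits)]


# Toy transcript in which only one 6mer carries any importance.
toy_profile = per_base_importance("AAAAAACCCCCC", {"AAAAAA": 1.0})
toy_smoothed = sliding_average(toy_profile, window=2)
toy_binned = relative_position_bins(toy_profile, n_bins=2)
```

Averaging the binned profiles across transcripts would then give a metaplot over relative position.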

PolyA score prediction and polyA-tail length

The polyA_SVM package (v 2.2)(Cheng et al., 2006) was used to compute predicted polyadenylation site scores for all transcripts using default settings. The maximum predicted score for each transcript was used. If no predicted score was returned (i.e., the polyA_SVM package did not predict the presence of a polyA site), then a score of 0 was used. The sequences of the three respective gene lists (ERM-localized, OMM RNA-dependent transcripts, and OMM ribosome-dependent transcripts) generated as described previously were used, and overlapping transcripts (found in both the ERM and OMM lists) were not excluded. The polyA tail length data was obtained from GSE52809(Subtelny et al., 2014).

Correlation and T-distributed stochastic neighbor embedding (t-SNE) analysis

For the 9-location correlation and t-SNE(van der Maaten and Hinton, 2008) analysis we only included genes with average counts per sample greater than 100 across all 36 samples (2 labeled targets and 2 unlabeled controls per location). We excluded genes with transcript length less than 100 nt. The raw counts from HTSEQ were rlog-transformed using DESeq2-normalized counts. Pearson correlation on the transformed counts was carried out using the package corrplot in R, using clustering method “centroid” and order “hclust”. t-SNE analysis was also performed in R. For the targets-only t-SNE we included genes with average counts per sample greater than 1000 across all 18 samples (2 labeled targets per location).

For the OMM drug-treatment experiments, we only included genes with average counts greater than 100 per sample across 32 samples (2 labeled targets and 2 unlabeled controls for each of the following: OMM_basal, OMM_cycloheximide, OMM_puromycin, OMM_cccp, cytosol_basal, cytosol_cycloheximide, cytosol_puromycin, cytosol_cccp). All other analyses were carried out identically to the 9-location analysis.

Heatmap and gene-ontology (GO)-term analysis

For the integrated analysis of locations, we excluded MITO APEX-seq as the labeled targets perturbed the entire analysis; we believe this is due to the large enrichment of ~13 mitochondrial mRNAs and 2 rRNAs, which constitute over 50% reads in targets; combined with the relative spatial isolation of the matrix.

To generate reliable gene data for the integrated analysis, we took our APEX-seq enrichment data and imposed the following filtering criteria: (1) excluding transcripts shorter than 100 nt, which were not recovered efficiently; (2) considering only transcripts with common gene names (typically HUGO Gene Nomenclature Committee (HGNC)-approved names(Maglott et al., 2011; Pruitt et al., 2007)); (3) only including genes that were enriched in at least 1 location (pFDR-adjusted < 0.05 and log2fold-change > 0.75 for at least 1 location); and (4) only including genes that had log2fold-change data estimated from DESeq2 for all locations (i.e. excluding any genes with NA values in any location). This last filtering step typically excluded low-abundance transcripts that might occasionally show up as enriched in a location but did not have sufficient counts in other locations. Such low-abundance transcript data were typically less reliable. We did not impose any FPKM or counts cutoff in our analysis. Our analysis yielded 3262 genes, shown in Table S3. As the fold changes in this study were calculated relative to unlabeled controls, enrichment by APEX-seq is a proxy for transcript concentration. Thus, the cytosol, which constitutes most of the cell, recovers fewer transcripts than expected, as it is not possible to highly concentrate transcripts there.
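
Filtering criteria (3) and (4) above can be illustrated with a short sketch. It assumes a hypothetical in-memory table of DESeq2-style results (log2 fold change and adjusted p-value per gene and location, with None standing in for NA); the function name, data layout and toy values are illustrative only.

```python
def select_genes(results, lfc_cut=0.75, padj_cut=0.05):
    """Return genes passing criteria (3) and (4).

    results: {gene: {location: (log2fc, padj)}}; None marks an NA value.
    """
    selected = []
    for gene, by_loc in results.items():
        # Criterion (4): a log2 fold change must exist for every location.
        if any(lfc is None for lfc, _ in by_loc.values()):
            continue
        # Criterion (3): significantly enriched in at least one location.
        if any(
            padj is not None and padj < padj_cut and lfc > lfc_cut
            for lfc, padj in by_loc.values()
        ):
            selected.append(gene)
    return sorted(selected)

# Toy results for two locations (values are made up for illustration).
toy_results = {
    "GENE_A": {"nucleus": (1.2, 0.01), "cytosol": (-0.4, 0.30)},   # kept
    "GENE_B": {"nucleus": (0.5, 0.01), "cytosol": (0.2, 0.02)},    # below fold-change cutoff
    "GENE_C": {"nucleus": (2.0, 0.001), "cytosol": (None, None)},  # NA in one location
    "GENE_D": {"nucleus": (0.9, 0.20), "cytosol": (0.1, 0.50)},    # not significant
}
selected = select_genes(toy_results)
```

Only GENE_A survives: GENE_C is strongly enriched in one location but is dropped because it lacks an estimate in the other, which is the behavior described for low-abundance transcripts.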

Heatmaps of this data were generated using pheatmap2 in R, with default settings. From the heatmap, clusters were estimated using hierarchical clustering. The cluster number was checked using a number of approaches including gap statistics(Tibshirani et al., 2001), and the combined cluster range was then explored. The genes belonging to each cluster were then compared to all enriched genes to estimate cellular GO-terms. GO-term analysis was carried out using PANTHER(Mi et al., 2013; Thomas et al., 2003) (http://pantherdb.org/about.jsp), using Fisher’s exact test with FDR multiple test correction. We only considered GO-terms with FDR < 0.005. To construct the heatmaps of most variable mRNAs (Figure 3E), we considered genes with average counts > 1000. For the lncRNA heatmap (Figure 3F), we excluded all genes that were not lncRNAs, processed transcripts or pseudogenes.
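
The statistics behind a cluster-versus-background GO-term comparison can be sketched as below. This is not PANTHER's implementation: it assumes a one-sided over-representation test (the hypergeometric tail, equivalent to a one-sided Fisher's exact test) followed by Benjamini-Hochberg correction, and all names and numbers are hypothetical.

```python
from math import comb

def over_representation_p(k, n, K, N):
    """P(X >= k) for a cluster of n genes drawn from N background genes,
    of which K carry the GO term and k of those fall in the cluster."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted

# Toy example: background of 3000 enriched genes, cluster of 200 genes.
# Each tuple is (term genes in cluster, term genes in background).
terms = {"term_1": (40, 150), "term_2": (12, 160), "term_3": (3, 40)}
pvals = [over_representation_p(k, 200, K, 3000) for k, K in terms.values()]
fdr = dict(zip(terms, benjamini_hochberg(pvals)))
significant = [t for t, q in fdr.items() if q < 0.005]
```

Here the background is the set of all enriched genes rather than the whole genome, mirroring the comparison described above.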

A similar approach was used to make heatmaps for the OMM analysis with the drug perturbations (puromycin, CCCP, cycloheximide). However, for clustering we did not use the CCCP data. For the subsequent GO-term analysis we used the Reactome pathway from PANTHER, with the control set comprising all genes enriched in at least one of the 4 OMM APEX-seq conditions (~1900 genes).

Nuclear-locations m6A modification and length analysis

To examine transcript-length differences across the nuclear locations (nucleus, nucleolus, nuclear pore, nuclear lamina), we filtered genes as described above for the heatmaps, but we excluded the filtering step by common gene names. Our analysis yielded 3288 genes. We obtained transcript lengths from two sources: (1) the longest stable isoform, as obtained from Biomart, Ensembl(Durinck et al., 2005), and (2) the most-abundant isoform across all compartments in our APEX-seq data, as determined by rMATS(Shen et al., 2014). Using the transcript length from either of these sources yielded similar trends and conclusions.

To determine the contribution of 5’ UTR, CDS (coding sequence) and 3’ UTR to the overall transcript-length difference between nucleus- and nuclear-pore enriched transcripts, we considered the most-abundant isoform, but excluded non-coding transcripts (i.e. transcripts with a CDS length = 0).

To calculate the number of m6A sites per transcript, we used a published dataset to obtain high-confidence m6A sites in HEK293(Meyer et al., 2012).

Network analysis

RNA interactions for the 3262 genes were investigated in multiple ways. These include (1) tabulating the overlapping genes using UpSet(Lex et al., 2014) in R; and (2) using circlize(Gu et al., 2014) to investigate the data by comparing which transcripts in each cellular sub-compartment most often had residency in other locations. In all instances, the fold change for every gene in every location was binarized to 1 (log2fold-change > 0.75) or 0 otherwise.
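
The binarization step, and the kind of pairwise overlap count that UpSet or circlize plots summarize, can be sketched as follows. The function names and toy fold changes are hypothetical; this only illustrates the thresholding at log2fold-change > 0.75.

```python
from itertools import combinations

def binarize(lfc, cutoff=0.75):
    """Map each log2 fold change to 1 (above the cutoff) or 0 (otherwise)."""
    return {
        gene: {loc: int(value > cutoff) for loc, value in by_loc.items()}
        for gene, by_loc in lfc.items()
    }

def pairwise_overlap(binary):
    """Count genes flagged as enriched in both members of each location pair."""
    locations = sorted({loc for by_loc in binary.values() for loc in by_loc})
    return {
        (a, b): sum(1 for by_loc in binary.values() if by_loc.get(a) and by_loc.get(b))
        for a, b in combinations(locations, 2)
    }

# Toy log2 fold changes for three genes in three locations.
toy_lfc = {
    "G1": {"nucleus": 1.0, "lamina": 0.9, "ERM": 0.1},
    "G2": {"nucleus": 0.8, "lamina": 0.2, "ERM": 0.0},
    "G3": {"nucleus": 0.1, "lamina": 0.1, "ERM": 1.5},
}
binary = binarize(toy_lfc)
overlap = pairwise_overlap(binary)
```

In the toy data only G1 is shared between the lamina and the nucleus, so that pair has an overlap of 1 and the pairs involving the ERM have an overlap of 0.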

Lamin-associated domains (LADs) and nucleolus-associated domains analysis

For the LADs and NADs analysis, we aggregated data from the following sources to obtain the relevant associated genomic regions: Guelen et al. (Guelen et al., 2008), Dillinger et al. (Dillinger et al., 2017), and Nemeth et al. (Nemeth et al., 2010). We used library(TxDb.Hsapiens.UCSC.hg18.knownGene) in R, obtained from Bioconductor(Durinck et al., 2005; Gentleman et al., 2004), to get the genes contained within these regions.

Quantification of intron retention and intron switching

rMATS(Shen et al., 2014) was used to quantify intron retention in each location. rMATS was run by comparing the APEX-seq BAM files against all controls using the arguments -t paired, and a GTF file downloaded from Ensembl (GENCODE v26, Ensembl 88). Only retained intron events with FDR <= 0.05 were considered using the “RI.MATS.JCEC.txt” output file (using both junction and exon counts).

rMATS (v 4.0.1) was used to quantify the number of isoform-switching genes (Figures S5F-G) in the APEX-seq labeled samples relative to all unlabeled controls. The number of significant differential splicing events (FDR < 0.05) for each compartment was read from the rMATS JCEC files. To remove noise from low-abundance transcripts, the KDEL labeled sample was used as a filter. Any differential splicing events identified in the KDEL labeled samples were ignored, as previous analysis using DESeq2 found no significantly-enriched genes at that location. The number of genes containing at least one differential splicing event between a labeled compartment and unlabeled controls is reported for the respective alternative splicing event.

Isoform analysis, including isoform switching

Sleuth(Pimentel et al., 2017) was used to perform differential transcript expression analysis between locations, which were compared against all control samples. To identify isoform switching, gene-level abundances were generated using the Bioconductor(Gentleman et al., 2004) tximport package, which imported kallisto(Bray et al., 2016) abundances and aggregated them to the gene level. DESeq2(Love et al., 2014) was subsequently used to perform differential gene expression analysis. Genes that displayed isoform switching were identified as follows: First, using the differential gene expression output from DESeq2, genes displaying no significant differential expression between the nucleus (NLS) and cytosol (NES) samples were identified. For each of these genes, we then determined if there were any transcripts that were significantly enriched in either the nucleus or cytosol samples (as determined by Sleuth(Pimentel et al., 2017)). Genes displaying no differential expression between the nucleus and cytosol, but with at least one transcript enriched in the nucleus and a different transcript enriched in the cytosol, were called as displaying isoform switching. To select genes to display in Figure 5E, an expression cutoff (at the gene level) of log2counts >= 12 and an isoform difference metric >= 10 was set. The isoform difference metric was computed by taking the sum of the absolute values of the log2fold-change enrichments for the most cytosol-biased and most nuclear-biased transcript.
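
The isoform-switching call and the isoform difference metric can be sketched as follows. This is a simplified illustration under stated assumptions: each transcript is summarized by a single signed log2 fold change (positive for nuclear bias, negative for cytosolic bias) with a q-value, the 0.05 significance cutoff is assumed, and all names and values are hypothetical.

```python
def is_isoform_switching(gene_is_de, transcripts, q_cut=0.05):
    """Call isoform switching for one gene.

    gene_is_de: True if the gene itself is differentially expressed
                between nucleus and cytosol.
    transcripts: {transcript_id: (log2fc, qvalue)}, where log2fc > 0 is
                 nuclear-biased and log2fc < 0 is cytosol-biased (assumption).
    """
    if gene_is_de:
        return False
    nuclear = [t for t, (lfc, q) in transcripts.items() if q < q_cut and lfc > 0]
    cytosolic = [t for t, (lfc, q) in transcripts.items() if q < q_cut and lfc < 0]
    # The sign split guarantees the two transcripts are different.
    return bool(nuclear) and bool(cytosolic)

def isoform_difference(transcripts):
    """Sum of |log2fc| for the most nuclear- and most cytosol-biased transcripts."""
    values = [lfc for lfc, _ in transcripts.values()]
    return abs(max(values)) + abs(min(values))

# Toy gene with three transcripts (identifiers and values are made up).
toy_gene = {
    "tx_1": (3.0, 0.01),   # nuclear-biased
    "tx_2": (-2.5, 0.02),  # cytosol-biased
    "tx_3": (0.2, 0.80),   # not significant
}
switching = is_isoform_switching(False, toy_gene)
metric = isoform_difference(toy_gene)
```

For the toy gene the call is positive and the metric is 3.0 + 2.5 = 5.5, which would fall below the display cutoff of 10 used for Figure 5E.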

Repeat analysis

A list of all annotated repeat elements was downloaded from UCSC Table Browser(Karolchik et al., 2004). To determine the relative enrichment of repeat elements in the genes enriched in each location, a set of enriched genes in each location was determined as described previously (log2fold-change >= 0.75, pFDR-adjusted < 0.05). A unique set of genes for each location was determined by removing genes that were enriched in more than one APEX-seq location.

For each set of genes, a corresponding list of genomic coordinates comprising only exonic sequences was generated. The most abundant isoform of each gene was used for determining the coordinates of the exons. This list was then intersected with the repeat annotation using bedtools(Quinlan and Hall, 2010) intersect with the -F .51 option to require that at least half of the repeat annotation was present in an exonic sequence. This was then aggregated by gene to generate a “repeat count” by gene table (with all repeat families as rows, and all genes uniquely enriched in a given location as columns). This table was then binarized to result in a table reporting the presence or absence of a repeat element in each gene. The proportion of genes in each location that contained a given repeat family was then determined. To perform FDR calculations, the gene-location pairings were randomly permuted 1000 times, and the number of permutations in which the resulting enrichment value was at least as great as the observed enrichment value was divided by the total number of permutations.
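
The permutation procedure can be sketched as below for a single repeat family and a single location. As an assumption, the proportion of genes carrying the repeat is used as the "enrichment value"; the function names, the fixed random seed and the toy data are hypothetical.

```python
import random

def proportion_with_repeat(genes, has_repeat):
    """Fraction of genes in which the repeat family is present."""
    return sum(has_repeat[g] for g in genes) / len(genes)

def permutation_fdr(gene_location, has_repeat, location, n_perm=1000, seed=0):
    """Shuffle gene-location pairings and count permutations whose
    proportion is at least as great as the observed proportion."""
    rng = random.Random(seed)
    genes = list(gene_location)
    labels = [gene_location[g] for g in genes]
    observed = proportion_with_repeat(
        [g for g in genes if gene_location[g] == location], has_repeat
    )
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        permuted = [g for g, loc in zip(genes, shuffled) if loc == location]
        if proportion_with_repeat(permuted, has_repeat) >= observed:
            hits += 1
    return observed, hits / n_perm

# Toy data: 5 lamina genes that all carry the repeat, 15 nucleolus genes that do not.
gene_location = {f"L{i}": "lamina" for i in range(5)}
gene_location.update({f"N{i}": "nucleolus" for i in range(15)})
has_repeat = {g: int(loc == "lamina") for g, loc in gene_location.items()}

lamina_obs, lamina_fdr = permutation_fdr(gene_location, has_repeat, "lamina")
nucleolus_obs, nucleolus_fdr = permutation_fdr(gene_location, has_repeat, "nucleolus")
```

Shuffling the labels keeps the number of genes per location fixed, so each permuted proportion is directly comparable to the observed one.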

To quantify the abundance of rRNA repeat elements, the same list of all annotated repeat elements from UCSC(Karolchik et al., 2004) was used (typical size 10^2 – 10^3 bp). The number of reads mapping to any rRNA element for each location was determined using bedtools intersect, essentially using the rRNA repeat annotations as a “gene” or “feature” to quantify overall abundance. These values were depth normalized to the total number of aligned reads in each BAM file, and averaged across replicates.

Sequential fluorescence in situ hybridization (FISH) design and analysis

A sequential oligo library for 56 selected genes, which include a combination of known and previously-unknown-location genes, was designed according to Moffitt et al.(Moffitt and Zhuang, 2016) and synthesized by CustomArray. The library was then PCR amplified using Phusion Hot-start Master Mix (NEB, M0536S) and then cleaned up using DNA Clean & Concentrator-5 columns (DCC-5, Zymo Research, D4013). The library was then in-vitro transcribed with T7 polymerase using the HiScribe kit (NEB, E2050S) at 37°C overnight. The resulting RNA was then reverse transcribed with Maxima H Minus RT enzyme (ThermoFisher, EP0751) at 50°C for 1 hour and the remaining RNA digested with 0.25 M EDTA and 0.5 M NaOH at 95°C for 10 minutes. The correct size of the reverse transcribed RNA was confirmed by running the product on a 15% urea TBE gel. The probes were further cleaned up using DCC-25 columns (Zymo Research, D4005).

To hybridize FISH probes onto cells, HEK 293T cells were fixed with 4% paraformaldehyde (PFA) in PBS for 10 minutes before being permeabilized with 0.5% Triton-X in PBS for 10 minutes. The sample was then incubated in 50% formamide and 0.1% TWEEN 20 in 2X SSC solution for 35 minutes. 500-800 ng/µL of the synthesized probe were added onto the cells using a coverslip and the slides were denatured at 90°C for 10 minutes. Probes were hybridized overnight at 42°C in a humidified chamber. The cells were then washed twice with prewarmed 42°C 2X SSC solution for 10 minutes the next day before imaging on a confocal microscope. A total of 14 fields of view, each with > 20 cells, were imaged for all 56 genes and then the data were processed using MATLAB. Using the FISH images generated for each demultiplexed transcript, we subsequently excluded transcripts that (1) could not be decoded based on the barcode, or (2) did not show any localization (typically low-abundance ones) based on the images being hazy and lacking punctate spots. To carry out this exclusion in a relatively unbiased manner, we had 3 people (F.M.F., S.H., K.R.P.) independently examine the images for all genes, and rate transcript-localization information as (1) high confidence, (2) medium confidence, or (3) low/no confidence. We then tallied all these ratings and subsequently excluded transcripts that were assigned a “no confidence” value by any of the 3 people. In general, the 3 people strongly agreed on the confidence ratings. We separately also confirmed that for transcripts with known localization, the discarded genes did not correlate well with known localizations. For imaging quantitation and analysis of each field of view, we generated a mask for each individual gene of interest using a uniform threshold cutoff of 0.5 – 0.998 after removing all the non-cell pixels.
The colocalization with MTND3 was calculated by intersecting the mask of a particular gene of interest (for example, the “XIST” mask) with the MTND3 mask and then dividing the sum intensity of the intersected mask by the sum intensity of the mask of the gene of interest. Colocalization with the ERM marker SCD was calculated using the same approach. The colocalization results for all 14 fields of view were then combined to obtain the average and standard deviation.
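
The mask-intersection colocalization score can be sketched as follows. The original analysis was carried out in MATLAB; this Python illustration assumes that intensities are taken from the image of the gene of interest and that masks are already thresholded to 0/1 values. All names and pixel values are hypothetical.

```python
from statistics import mean, stdev

def colocalization(gene_image, gene_mask, marker_mask):
    """Intensity inside (gene mask AND marker mask) divided by
    intensity inside the gene mask, for one field of view."""
    total = 0.0
    shared = 0.0
    for img_row, gene_row, marker_row in zip(gene_image, gene_mask, marker_mask):
        for value, in_gene, in_marker in zip(img_row, gene_row, marker_row):
            if in_gene:
                total += value
                if in_marker:
                    shared += value
    return shared / total if total else 0.0

# Two toy 2 x 3 fields of view sharing the same masks.
gene_mask = [[0, 1, 1], [0, 1, 1]]
marker_mask = [[0, 0, 1], [0, 0, 1]]
field_1 = [[0, 2, 4], [0, 6, 8]]
field_2 = [[0, 5, 5], [0, 5, 5]]

scores = [colocalization(f, gene_mask, marker_mask) for f in (field_1, field_2)]
average, spread = mean(scores), stdev(scores)
```

For field_1 the shared intensity is 4 + 8 = 12 out of a total of 20, giving a score of 0.6; scores from all fields of view are then summarized by their mean and standard deviation.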

MITO APEX-seq extended analysis

For the MITO APEX-seq, we obtained strong enrichment (log2fold-change > 2.9) of the 13 MT mRNAs and 2 MT rRNAs in the targets (i.e. labeled libraries) relative to unlabeled controls. These 15 genes made up > 50% of reads in the MITO APEX-seq labeled samples.

In addition to the 15 expected mitochondrial RNAs, we also recovered ~400 transcripts that were moderately enriched (log2fold-change > 0.75), some of which are known mitochondrial pseudogenes. To rule out that this labeling was caused by the biotin-phenoxy radical, generated during the labeling experiment, escaping from the mitochondrial matrix, we confirmed that OMM APEX-seq enriched transcripts (> 1000 enriched transcripts) showed no enrichment (average log2fold-change ~0) in the MITO APEX-seq samples. While we do not believe these transcripts to be present in the mitochondrial matrix(Mercer et al., 2011), attempts to confirm the localization by FISH were not successful. We hypothesize two explanations for the observations: (1) due to the large perturbation introduced by APEX-seq labeling, the DESeq2 analysis does not perform properly; or (2) there is some small background labeling by Cox4-APEX (i.e. MITO-APEX) as the protein makes its way from the cytosol, where it is translated, to the mitochondrial matrix.

NES APEX-seq extended analysis

We did not find many transcripts highly enriched (log2fold-change > 0.75) by cytosol (NES) APEX-seq. We believe fewer transcripts are recovered because APEX-seq enrichment is a proxy for RNA concentration rather than RNA amount. Thus, the cytosol, which contains a majority of transcripts, recovers fewer highly-enriched transcripts relative to the unlabeled (i.e. whole-cell) controls. Nonetheless, we compared the transcripts enriched by cytosol APEX-seq (log2fold-change > 0, pFDR-adjusted < 0.05) and found them to have higher enrichment in the cytosolic fractionation data relative to transcripts with cytosol APEX-seq log2fold-change < 0.

KDEL APEX-seq extended analysis

For the KDEL construct, DESeq2 did not show any significantly enriched transcripts relative to unlabeled controls after FDR adjustment. If there are any significantly-enriched transcripts longer than 100 nt, they are few in number. We therefore used the KDEL APEX-seq data as a “negative control” for our 8-location integrated analysis by rejecting analysis strategies that yield a large number of enriched KDEL transcripts, as such approaches likely have large numbers of false positives. We found that using a log2fold-change cutoff between 0.6 and 0.9 was a sufficient compromise to obtain highly-specific gene lists without further loss of coverage (Figure S2M).

Proteomic analysis

For the proteomic analysis, we used subcellular protein localization data from Thul et al.(Thul et al., 2017), using the main location for genes that were imaged using validated and supported antibodies. We filtered the data to exclude duplicate protein localization entries from multiple cell-lines. The majority of the proteins in that data set are nucleoplasmic or cytosolic. We calculated an “enrichment” score for each protein type in each location by carrying out a two-step normalization: (1) obtaining the enrichment of that protein type relative to all proteins, and (2) enrichment of that protein in each location relative to all locations.

Other analysis and data availability

Where appropriate, the following tests were employed: (1) Student’s t-test, (2) Mann-Whitney U test (Wilcoxon rank-sum test), (3) Kolmogorov-Smirnov (KS) test, (4) Fisher’s exact test, and (5) hypergeometric distribution test. All tests were carried out in R. All analysis was carried out using R (most plots using ggplot2(Wickham, 2009)), Python and Microsoft Excel. All custom code used in this work is available upon request. All sequencing data are available through the Gene Expression Omnibus (GEO) under accession GSE116008. Browser tracks can be found at: http://bit.ly/apex_reviewers_stranded (stranded RNA-seq) and http://bit.ly/apex_reviewers_unstranded (unstranded RNA-seq).

Supplementary Material

Table S1. Related to Figure 1:

List of primers used.

Table S6. Related to Figure 7:

Analysis of transcripts enriched at OMM using nocodazole (NOC) time-course experiments.

Figure S1. Related to Figure 1:

(A) Streptavidin-biotin dot blot analysis of tRNA labeling by horseradish peroxidase (HRP) in vitro. Left: Yeast tRNA extract was incubated for 1 minute with HRP, biotin-phenol and H2O2, after which the tRNA was purified and the resulting product was treated with either proteinase K or RNase A. Right: The products were spotted, and biotinylated species detected via staining with streptavidin.

(B) LC-MS detection of deoxyguanosine (dG)-pentachlorophenol (PCP) adduct resulting from HRP labeling reaction in vitro. dG was incubated for 1 minute with PCP, HRP, and H2O2. The chemical composition of the resulting mixture was analyzed by LC-MS in negative ion detection mode. The left column shows the experiment and the right column shows the negative control in which H2O2 was omitted. Row 1 is the UV trace chromatogram. Row 2 is an enlarged UV trace chromatogram (asterisk denotes the dG-PCP product). Row 3 is the total ion count chromatogram. Rows 4-6 show mass chromatograms corresponding to the mass of dG-PCP (product), PCP (starting material), and dG (starting material), respectively.

(C) APEX2 catalyzes formation of G-PCP adduct. UV trace chromatograms of guanosine (G) or adenosine (A) reacting with PCP and APEX2 in vitro. G (top row) or A (bottom row) was incubated for 1 minute with PCP, APEX2, and H2O2. The resulting mixture was analyzed by HPLC with UV detection of chemical species. Absorption peaks corresponding to G, PCP and G-PCP adduct are labeled in row 1. Rows 2-5 show negative controls with H2O2, PCP, G or APEX2 omitted, respectively. Row 6 shows the same experiment with A in place of G.

(D) In vitro transcribed 5S ribosomal RNA was treated with HRP in duplicate in the presence of biotin-phenol followed by 1-minute treatment with H2O2 and enriched by streptavidin-biotin pulldown. The enriched RNA was reverse transcribed into cDNA, and the resulting products were run on a denaturing PAGE gel. Modification of 5S RNA at GGG sequences results in excess truncated DNA products (black arrows) relative to controls (no H2O2 added) carried out in duplicate. Based on streptavidin pulldown and quantification of in vitro labeled RNA, we estimate a labeling efficiency of 0.11 ± 0.02 (N = 2).

(E) For 5S RNA, 3 of the 4 GGG sequences interrogated yielded gel bands, presumably due to the RT-enzyme halting at the corresponding biotinylated nucleotides.

(F) Optimization of the washing step following binding of biotinylated RNA to the streptavidin beads. We used buffers containing high salt (1M NaCl) as well as buffers previously used for APEX Proteomics washes with or without urea or formamide. We varied both the amount and type of streptavidin beads used, and the duration and temperature at which incubations of labeled RNA with the beads were carried out. A simple high salt wash provided high enrichment of mitochondrial RNAs (blue bar graph), as determined by enrichment by RT-qPCR of two mitochondrial RNAs - MTND1 and MTCO1 - relative to 4 non-mitochondrial RNAs - XIST, FAU, GAPDH and SSR2. The high-salt wash also maintained higher and more reproducible recovery relative to other conditions. Sources of error arise from both biological replicates and technical replicates.

(G) Bioanalyzer traces of RNA extracted from APEX2-NES HEK cells confirming that the RNA is not degraded (RNA integrity number (R.I.N.) = 9-10) upon treatment with biotin phenol and/or hydrogen peroxide, based on the ribosomal RNAs 18S and 28S remaining intact.

Figure S2. Related to Figure 2:

(A) APEX2 fusion constructs employed in this study. FLAG-APEX2-NES uses a nuclear export signal (NES) to localize APEX2 throughout the cytoplasm. Mito-V5-APEX2 employs a 24-amino-acid mitochondrial targeting sequence (MTS) from COX4 to localize APEX2 throughout the mitochondrial matrix. FLAG-OMM-APEX2 employs the C-terminal 31 amino acids of mitochondrial antiviral-signaling protein (MAVS) to target APEX2 to the outer mitochondrial membrane (OMM). ERM-APEX2-V5 employs the transmembrane segment of the endoplasmic reticulum (ER)-resident protein P450 oxidase 2C1 to target APEX2 to the ER membrane (ERM). HRP-V5-KDEL employs a KDEL sequence to target the horseradish peroxidase (HRP) to the ER lumen. V5-APEX2-NLS employs a nuclear localization sequence (NLS) to target APEX2 throughout the entire nucleus. GFP-APEX2-NIK3x employs three tandem nucleolar targeting sequences from NF-κB-inducing kinase (NIK) to localize APEX2 to the nucleolus. V5-APEX2-LMNA is targeted to the nuclear lamina by fusing APEX2 to the N terminus of prelamin-A/C (LMNA). V5-APEX2-SENP2 is targeted to the nuclear pore complex by fusing APEX2 to the N terminus of Sentrin-specific protease 2 (SENP2). V5 and FLAG are epitope tags.

(B) Co-localization of APEX expression with neutravidin staining. Left: Fluorescence imaging of APEX2 localization and biotinylation activity. Live-cell biotinylation was performed for 1 minute with BP and H2O2 in HEK cells stably expressing the indicated APEX2 fusion protein. APEX2 expression was visualized by anti-V5/FLAG staining (green). Biotinylated species were visualized by staining with neutravidin-AlexaFluor 647 (red). Scale bars, 10 µm. Right: pixel intensity plot of the dashed line shown in images on the left.

(C) Mapping statistics of APEX-seq libraries generated. Figures show the percentage of mapped reads, as well as the total number of reads. Most polyA-selected libraries showed a high proportion (> 80%) of uniquely-mapped reads.

(D) Correlation plot of biological replicates, showing that the unlabeled controls for the different constructs are quite similar to each other, and to the nuclear pore and ER lumen target constructs. The MITO APEX-seq libraries are most different from the other libraries.

(E) RT-qPCR analysis showing specific enrichment of on-target (blue) over off-target (grey) RNAs with APEX-seq, but not APEX-RIP, in different “open” compartments. HEK cells stably expressing the corresponding APEX2 were labeled for 1 minute with BP and H2O2. For APEX-seq, biotinylated RNAs were enriched with streptavidin beads following total RNA extraction and then analyzed by RT-qPCR. For APEX-RIP, RNAs were crosslinked to proteins for 10 minutes before streptavidin beads enrichment. Data are the mean of 3 replicates ± 1 S.D. The data was normalized such that the mean enrichment of off-target RNAs was 1 for both APEX-seq and APEX-RIP.

(F) The proportion of all reads in the MITO APEX-seq libraries mapping to the 15 mRNAs and rRNAs. Over 10% of all reads map to MTCO1.

(G) Browser tracks of mitochondrial (MT) genome showing robust enrichment of mitochondrial RNAs in the mitochondrial-matrix (MITO) APEX-seq library, but not from the libraries generated from constructs targeting other subcellular locations.

(H) Scatter plot of enrichment for the 15 MT rRNAs and mRNAs between polyA-selected RNA and total RNA, showing good agreement between the two.

(I) ROC curve showing the performance of APEX-seq for different analysis protocols. These include no ratiometric normalization (blue), as well as 2-controls (red), 4-controls (yellow) and 18-controls conditions (green). For challenging open locations, combining controls from other APEX-seq constructs improves performance. For the entire paper, unless otherwise mentioned, 18-controls data is shown. For comparison the ER polysome profiling RNA-seq is shown (purple). Here the true positive is the Jan et al. list(Jan et al., 2014), and false positive is predicted non-secretory proteins(Kaewsapsak et al., 2017).

(J) ERM APEX-seq shows clear separation of true positives (determined by proximity-based ribosome profiling), and negatives (predicted to be non-secretory based on Phobius, SignalP or TMHMM).

(K) Using a list of 71 true-positive ERM transcripts, the coverage of APEX-seq was compared to ribosome profiling and ER fractionation-seq. All three methods yield similar coverage.

(L) Comparing ERM APEX-seq to ERM proteomics, APEX-seq shows higher coverage for these 71 genes.

(M) The proportion of transcripts retained as enriched, as the fold-change cutoff based on APEX-seq comparison of labeled targets versus unlabeled controls is varied. Unless otherwise mentioned, a log2fold-change of 0.75 was used in Figures 2-5.

Figure S3. Related to Figure 3:

(A) Bar plots showing the protein localization of the transcripts enriched by APEX-seq. These numbers are based on the ~3250 genes examined in Figure 3 that have reliable protein localization data in the Protein Cell Atlas database.

(B) Cellular component GO-terms associated with the clusters determined from heatmap in Figure 3D, confirming that the nuclear locations are enriched for nuclear-associated GO terms, and the ER and OMM for membrane GO terms. Larger bubbles denote more significant enrichment/depletion.

(C) Scatter plot comparing the OMM and the ERM APEX-seq log2fold-changes in HEK cells. Genes are categorized as in Figure 6B. Gene names are shown for proteins known to be dual-localized to ER and mitochondria.

(D) Of the mRNAs enriched by both OMM and ERM APEX-seq, more than 90% have secretory annotations.

(E) Histogram showing the length distribution of transcripts recovered from APEX-seq versus fractionation-seq. Both methods yield comparable distributions for the length of transcripts recovered.

(F) Violin plot showing the striking difference in fold changes of ER transcripts between fractionation-seq and APEX-seq. P-value is from a Mann-Whitney U test.

(G) Cumulative distribution of transcripts in the nuclear/cytosolic fractionation-seq data split into two groups based on cytosolic APEX-seq data. Genes enriched by cytosolic APEX-seq (log2fold-change > 0, pFDR-adjusted < 0.05) had much higher enrichment (p < 10^-100, KS test) in the cytosolic fractionation data relative to genes depleted by cytosolic APEX-seq (log2fold-change < 0, pFDR-adjusted < 0.05). P-values from a KS test.

(H) Mitochondrial APEX-seq shows robust enrichment of the MT rRNAs and mRNAs, and no enrichment of OMM-enriched RNAs. There are ~400 transcripts that have large positive fold changes (log2foldchange > 0.75, pFDR-adjusted < 0.05).

Figure S4. Related to Figure 4:

(A) (B) (C) Number of intron-retention events across APEX-seq enriched transcripts in the nucleus and nuclear pore, as well as fractionation-seq. An intron-retention score was calculated based on how much the retained-intron transcript was obtained relative to the corresponding cytosol control, and was computed by taking the sum of the absolute values of the log2fold-change enrichments for the most cytosol-biased and most nuclear-biased transcript.

(D) Scatter plot showing high correlation between intron skipping or retention in nucleus APEX-seq (relative to cytosol APEX-seq) versus nuclear fractionation-seq (relative to cytosol). Genes displaying no differential expression between the nucleus and cytosol, but with at least one transcript enriched in the nucleus and a different transcript enriched in the cytosol, were called as displaying isoform switching.

(E) The genes shown in Figure 4E were identified by selecting for transcripts that are highly abundant and showed high-isoform switching scores.

(F) (G) Bar plots showing the number of genes with alternative splice sites at (F) 5’ UTRs and (G) 3’ UTRs in the APEX-seq samples, relative to unlabeled controls (FDR < 0.05).

(H) (I) Cumulative distributions of the exon length, and number of isoforms for genes enriched by APEX-seq in the nuclear pore relative to other locations. We observe shorter transcripts at the nuclear pore relative to other locations. We see no significant difference in distribution across the locations. Here the transcript length was calculated by considering the most-abundant transcript isoform for each gene across all locations in the APEX-seq data.

(J) (K) (L) Cumulative distributions of the intron lengths, exon lengths and number of introns for genes enriched by APEX-seq in the nuclear locations. Here the transcript length was calculated by considering the longest-stable transcript isoform for each gene.

Figure S5. Related to Figure 5:

(A) (B) (C) (D) Using all the APEX-seq enriched genes as a background, the estimated FDR of finding the nuclear-lamina repeat motifs.

(E) Heatmap showing the number of repeat motifs in exons of transcripts enriched by APEX-seq. Unlike in Figure 5A, this analysis considers all enriched genes, not just enriched genes unique to that location. We continue to see strong enrichment of these motifs in the nuclear lamina, but also in other nuclear locations relative to the cytosolic locations.

(F) Scatter plot showing good correlation between post-enrichment APEX-seq control data and published polyA-selected RNA-seq data from HEK. APEX-seq control data was averaged from 18 controls generated from APEX2 constructs targeting 9 locations.

(G) RNA-seq abundance of all genes (not just unique genes) enriched by APEX-seq. FPKM = fragments per kilobase per million reads. P-values from Mann-Whitney U test.

(H) (I) Using post-enrichment APEX-seq control data we also obtain decreased abundance of nuclear-lamina enriched genes, both for all genes and more strikingly for unique genes. FPKM = fragments per kilobase per million reads. P-values from Mann-Whitney U test.

(J) (K) (L) Bar plots showing the proportion of genes found in lamina-associated domains or nucleolus associated domains or both. P-values are from Fisher’s exact tests.

(M) Control test examining the localization of mitochondrial genes, confirming no similar enrichment of genes in the nucleolus or nuclear-lamina transcriptomes. P-values are from Fisher’s exact tests.

Figure S6. Related to Figure 6:

(A) Cluster map of the OMM perturbation experiments, along with the corresponding cytosolic background. All controls cluster together, while the cytosolic locations vary less across the different perturbation experiments relative to their OMM counterparts. OMM cycloheximide is the most different among these labeled libraries.

(B) Molecular function GO-terms based on clusters in Figure 6M. The clusters enriched in mitochondrial genes (1+4+6) show differences relative to clusters 2+3.

(C) Browser tracks of a mitochondrial gene (MUT) show increased enrichment by OMM-APEX upon CHX treatment.

(D) Browser tracks of an OXPHOS gene (NDUFB6) that show increased enrichment by OMM-APEX upon PUR/CCCP treatment.

(E) Scatter plots of OMM APEX-seq log2 fold change comparing the basal and PUR (y axis, left)/CCCP (y axis, right) conditions.

(F) Gene density distribution of OMM APEX-seq log2fold-change under Puro or CCCP condition. Genes are functionally classified according to Gene Ontology.

(G) Plot showing the overlapping number of enriched genes in the different OMM perturbation experiments. All genes that were enriched in at least 1 of the four conditions are included.

(H) The proportion of known mitochondrial genes in the different clusters. Clusters 1+4+6 are highly enriched in mitochondrial genes, while clusters 5+7 are significantly depleted. P-values from Fisher’s exact test.

(I) Examining the transcripts not annotated as mitochondrial in clusters 1+4+6 yields 12 transcripts, of which 7 are pseudogenes and 5 are mRNAs. Of these 5 mRNAs, further literature examination shows evidence for 1 coding for a protein localizing to the mitochondria (TTLL4(Thul et al., 2017)) and 2 localizing to the OMM (ARMCX3(Mou et al., 2009) and EXD2(Hensen et al., 2018)).

(J) Genome tracks of EXD2 from (I).
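
The overlap plot in panel (G) counts enriched genes by the combination of conditions in which they appear. A minimal sketch of that bookkeeping follows, assuming each condition is represented as a set of gene identifiers. The condition labels, gene names, and function name are hypothetical placeholders and are not results from the paper.

```python
def exclusive_overlaps(enriched):
    """Count genes by the exact set of conditions in which they are enriched.

    enriched: dict mapping condition name -> set of gene identifiers.
    Returns a dict mapping a sorted tuple of condition names -> gene count.
    Only genes enriched in at least one condition are counted.
    """
    all_genes = set().union(*enriched.values())
    counts = {}
    for gene in all_genes:
        key = tuple(sorted(c for c, genes in enriched.items() if gene in genes))
        counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical toy input: four conditions, placeholder gene names.
toy = {
    "basal": {"g1", "g2", "g3"},
    "CHX": {"g2", "g3", "g4"},
    "PUR": {"g3"},
    "CCCP": {"g3", "g5"},
}
overlaps = exclusive_overlaps(toy)
```

Each key in `overlaps` is one bar of an UpSet-style plot, so the counts sum to the number of genes enriched in at least one condition.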

Figure S7. Related to Figure 7:

(A) Using the classifier in Figure 7D, we asked which regions of the transcripts were most important for classifier performance. The 5’ UTR was less important than the 3’ UTR or coding sequence for performance.

(B) Same as (A), but calculating importance along the entire transcript (i.e. normalizing for length) reveals that the 3’ UTR is most important for protein- vs RNA-dependent classification.

(C) The 6-mers most important for the classifier performance. Note the polyadenylation signal sequence AAUAAA is one of the most important sequences for prediction.

(D) 3’ UTR length across ERM-dependent, protein-dependent and RNA-dependent transcripts.

(E) OMM-APEX2 remains on mitochondria in the presence of CCCP or NOC drug perturbation. HEK cells stably expressing OMM-APEX2 were treated with CCCP or NOC for 30 minutes as in the APEX-seq experiments. APEX2 expression was visualized by anti-FLAG staining (green). An antibody against endogenous TOM20 was used as a marker for the mitochondria. Scale bars, 10 µm.

(F) Mapping statistics of APEX-seq libraries generated under NOC, PUR, CHX and CCCP perturbation conditions.

(G) Pearson correlation of APEX-seq OMM and cytosol perturbation libraries. The target replicates agree well.

(H) Correlation of fold change upon 30-minute nocodazole treatment (where effect saturates) and the corresponding change upon CHX treatment. Changes are measured relative to basal conditions. These are the 1329 transcripts enriched in any of the time points (0, 3, 6, 9 or 30 minutes).

(I) Molecular function GO-term of clusters. Here the background is the set of genes enriched in any of the time points.
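
Panel (C) ranks 6-mers by their importance to the classifier. As a rough sketch of how 6-mer features could be derived from a transcript sequence, assuming simple overlapping k-mer counts on the RNA alphabet, consider the following. The legend does not specify how the authors encoded sequence features, so the function names and the normalization are assumptions.

```python
from collections import Counter


def kmer_counts(seq, k=6):
    """Count overlapping k-mers in a sequence, reported in RNA letters (T -> U)."""
    seq = seq.upper().replace("T", "U")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_frequencies(seq, k=6):
    """Normalize k-mer counts by the total number of k-mers in the sequence."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}
```

Frequencies of this kind could then be supplied as feature columns to a classifier such as a random forest; a sequence shorter than k yields no features.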

Table S2. Related to Figure 2:

Mapping statistics of all APEX-seq and polyA+ fractionation-seq libraries.

Table S3. Related to Figure 2:

List of all localized transcripts and all orphans.

Table S4. Related to Figure 6:

Analysis of transcripts enriched at OMM with puromycin (PUR), cycloheximide (CHX) and CCCP treatments.

Table S5. Related to Figure 7:

List of all ribosome-dependent and RNA-dependent transcripts used to build the random-forest algorithm.

KEY RESOURCES TABLE

REAGENT or RESOURCE | SOURCE | IDENTIFIER
Antibodies
Anti V5 | Life Technologies | Cat# R96025
Anti FLAG | Agilent | Cat# 200472
Anti TOM20 | Santa Cruz Biotechnology | Cat# SC-11415
AlexaFluor488 | Life Technologies | Cat# A11029
AlexaFluor568 | Life Technologies | Cat# A11036
AlexaFluor647 | Invitrogen | Cat# A20006
Neutravidin biotin-binding protein | Invitrogen | Cat# A2666
Calnexin | Thermo Fisher | Cat# PA534754
Streptavidin IRDye 800CW (green) | LI-COR | Cat# 926-32230
Chemicals, Peptides, and Recombinant Proteins
Sodium ascorbate | Sigma-Aldrich | Cat# A7631-25G
Trolox | Sigma-Aldrich | Cat# 238813-1G
Biotin tyramide | Iris Biotech GmbH | Cat# LS3500
Agencourt AMPure XP | Beckman Coulter | Cat# A63881
Pierce streptavidin magnetic beads | Thermo Fisher Scientific | Cat# 88816
Lipofectamine 2000 | Thermo Fisher Scientific | Cat# 11668019
TURBO DNase | Thermo Fisher Scientific | Cat# AM2238
Fibronectin | Millipore | Cat# FC010
Superscript III first-strand synthesis system | Thermo Fisher Scientific | Cat# 18080051
Proteinase K solution (20 mg/ml) | Life Technologies | Cat# AM2548
RiboLock RNase inhibitor | Thermo Scientific | Cat# EO0384
Buffer RWT | Qiagen | Cat# 1067933
DTT (dithiothreitol) | Thermo Fisher Scientific | Cat# R0861
N-Lauroylsarcosine sodium salt solution | Sigma-Aldrich | Cat# L7414-10ML
Hydrogen peroxide solution, 30% (w/w) | Sigma-Aldrich | Cat# H1009-100ML
Nocodazole | Sigma-Aldrich | Cat# M1404-10MG
Cycloheximide | Sigma-Aldrich | Cat# 01810-1G
Puromycin | VWR | Cat# 80054-138
CCCP | Sigma-Aldrich | Cat# C2759-100MG
Low Range ssRNA Ladder | NEB | Cat# N0364S
P32 ATP 6000units 250uCi | PerkinElmer | NEG502Z250UC
Amersham Protran 0.45 nitrocellulose | GE Healthcare | Cat# 10600033
Critical Commercial Assays
TruSeq RNA sample preparation kit v2 | Illumina | RS-122-2001
RNeasy Plus mini kit | Qiagen | Cat# 74136
RNA Clean and concentrator-5 | Zymo Research | Cat# R1016
MEGAscript T7 transcription kit | Ambion | Cat# AM1334
MEGAclear transcription clean-up kit | Thermo Fisher Scientific | Cat# AM1908
MiSeq reagents kit v3 | Illumina | Cat# MS-102-3001
Deposited Data
Raw and analyzed data | This paper | GEO: GSE116008
HEK293 m6A sites (MeRIP-seq) | (Meyer et al., 2012) | PMID: 22608085
HEK293 ER proximity-specific ribosome profiling RNA-seq | (Jan et al., 2014) | PMID: 25378630
HEK293 ER fractionation RNA-seq | (Reid and Nicchitta, 2015) | PMID: 22199352
HEK293 RNA-seq | (Sultan et al., 2014) | ERA: PRJEB4197
HEK293 APEX-RIP data | (Kaewsapsak et al., 2017) | PMID: 29239719
HEK293 polyA-tail length | (Subtelny et al., 2014) | GEO: GSE52809
Protein localization data | (Thul et al., 2017) | PMID: 28495876
OMM APEX proteomics data | (Hung et al., 2017) | PMID: 28441135
Dfam database | (Wheeler et al., 2013) | https://dfam.org/
Human LADs | (Guelen et al., 2008) | PMID: 18463634
Human NADs | (Nemeth et al., 2010) | PMID: 20361057
Human NADs | (Dillinger et al., 2017) | PMID: 28575119
GO enrichment analysis | (Thomas et al., 2003) | http://www.pantherdb.org/
Human mitochondrial genes (Mitocarta 2.0) | (Calvo et al., 2015) | PMID: 26450961
Signal peptide annotations (SignalP 4.0) | (Petersen et al., 2011) | PMID: 21959131
Signal peptide annotations (Phobius) | (Kall et al., 2004) | PMID: 15111065
Transmembrane protein annotations (TMHMM) | (Krogh et al., 2001) | http://www.cbs.dtu.dk/services/TMHMM/
Experimental Models: Cell Lines
HEK293T | ATCC | Cat# CRL-3216
Oligonucleotides
Primers used | This study | Table S1
5S RNA gDNA block | This study | Table S1
seqFISH oligos | (Chen et al., 2015b) | https://github.com/ZhuangLab/MERFISHanalysis
Recombinant DNA
ERM-APEX2 | Addgene | Plasmid #79055
APEX2-OMM | Addgene | Plasmid #79056
APEX2-NES | Addgene | Plasmid #92158
Mito-APEX2 | Addgene | Plasmid #72480
APEX2-NLS | (Kaewsapsak et al., 2017) | NotI-V5-APEX2-EcoRI-3xNLS-NheI; CMV promoter; NLS: DPKKKRKV
HRP-KDEL | (Kaewsapsak et al., 2017) | NotI-IgK-HRP-V5-KDEL-IRES-puromycin-XbaI; CMV promoter; IgK is an N-terminal signaling sequence that brings the protein to the ER (METDTLLLWVLLLWVPGSTGD); KDEL is an ER-retaining sequence
APEX2-LMNA | This study | BstBI-V5-APEX2-LMNA-NheI; CMV promoter; LMNA: prelamin-A/C
APEX2-SENP2 | This study | BstBI-V5-APEX2-SENP2-NheI; CMV promoter; SENP2: Sentrin-specific Protease 2
APEX2-NIK3x | This study | BstBI-EGFP-APEX2-3xNIK-NheI; CMV promoter; NIK: nucleolar targeting sequence from NIK
Software and Algorithms
STAR | (Dobin et al., 2013) | RRID:SCR_015899
t-SNE | (van der Maaten and Hinton, 2008) | RRID:SCR_016900
HTSeq | (Anders et al., 2014) | RRID:SCR_005514
Bioconductor | (Gentleman et al., 2004) | RRID:SCR_006442
DESeq2 | (Love et al., 2014) | RRID:SCR_015687
ggplot2 | (Wickham, 2009) | RRID:SCR_014601
Barcode trimming | (Flynn et al., 2016) | PMID: 26766114
ImageJ | (Schneider et al., 2012) | https://imagej.nih.gov/ij/
SAMtools | (Li et al., 2009) | PMID: 19505943
Bedtools | (Quinlan and Hall, 2010) | RRID:SCR_006646
Kallisto | (Bray et al., 2016) | RRID:SCR_016582
scikit-learn package | (Pedregosa et al., 2011) | RRID:SCR_002577
PolyA_SVM package | (Cheng et al., 2006) | PMID: 16870936
MultiQC package | (Ewels et al., 2016) | RRID:SCR_014982
rMATS | (Shen et al., 2014) | RRID:SCR_013049
Sleuth | (Pimentel et al., 2017) | RRID:SCR_016883
DEXSeq | (Anders et al., 2012) | RRID:SCR_012823
circlize | (Gu et al., 2014) | RRID:SCR_002141
Other
Detailed APEX-seq library protocol | This paper | Protocol Exchange https://protocolexchange.researchsquare.com/

Highlights.

  1. A transcriptome-wide subcellular RNA atlas was generated by proximity labeling

  2. Isoform-level subcellular localization patterns for over 3200 genes identified

  3. RNA-transcript location correlates with genome architecture and protein localization

  4. Two modes of mRNA localization to the outer mitochondrial membrane uncovered

ACKNOWLEDGMENTS

We thank members of the Ting and Chang labs for analysis suggestions and critical reading of the manuscript; J. Coller and X. Ji for sequencing; and L. Mateo for assistance with microscopy. This work was supported by NIH (R01-CA186568-1 to A.Y.T.; R01-HG004361 and P50-HG007735 to H.Y.C.; S10OD018220 to Stanford Functional Genomics Facility), Burroughs Wellcome (CASI grant to A.N.B.) and the Chan Zuckerberg Biohub (to A.Y.T.). F.M.F. was supported by the NIH T32 SGTP and the Arnold O. Beckman Postdoctoral Fellowship. S.H. was supported by the Stanford Bio-X Bowes Fellowship. H.Y.C. is an Investigator of HHMI. A.Y.T. is a Chan Zuckerberg Biohub Investigator.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

DECLARATION OF INTERESTS

A.Y.T., P.K., H.Y.C. and F.M.F. have filed a patent application covering aspects of this work (Patent Application Number US 2017/0226561). H.Y.C. is a co-founder and advisor of Accent Therapeutics. H.Y.C. is an advisor of 10X Genomics and Spring Discovery.

Data and materials availability: All data presented are available in the main text and supplementary materials. Browser tracks can be found at: http://bit.ly/apex_reviewers_stranded (stranded RNA-seq) and http://bit.ly/apex_reviewers_unstranded (unstranded). The raw sequencing data were deposited into the Sequence Read Archive (SRA) public database, accessible via Gene Expression Omnibus (GEO) (accession number GSE116008).

REFERENCES

  1. Anders S, Pyl PT, and Huber W (2014). HTSeq - a Python framework to work with high-throughput sequencing data (Cold Spring Harbor Laboratory).
  2. Anders S, Reyes A, and Huber W (2012). Detecting differential usage of exons from RNA-seq data. Genome Res 22, 2008–2017.
  3. Bahar Halpern K, Caspi I, Lemze D, Levy M, Landen S, Elinav E, Ulitsky I, and Itzkovitz S (2015). Nuclear Retention of mRNA in Mammalian Tissues. Cell Reports 13, 2653–2662.
  4. Battich N, Stoeger T, and Pelkmans L (2015). Control of transcript variability in single mammalian cells. Cell 163, 1596–1610.
  5. Berkovits BD, and Mayr C (2015). Alternative 3' UTRs act as scaffolds to regulate membrane protein localization. Nature 522, 363–367.
  6. Bertrand E, Chartrand P, Schaefer M, Shenoy SM, Singer RH, and Long RM (1998). Localization of ASH1 mRNA particles in living yeast. Mol Cell 2, 437–445.
  7. Blobel G (1985). Gene gating: a hypothesis. Proc Natl Acad Sci USA 82, 8527–8529.
  8. Bolger AM, Lohse M, and Usadel B (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120.
  9. Bray NL, Pimentel H, Melsted P, and Pachter L (2016). Near-optimal probabilistic RNA-seq quantification. Nat Biotech 34, 525–527.
  10. Brown CR, and Silver PA (2007). Transcriptional regulation at the nuclear pore complex. Curr Opin Genet Dev 17, 100–106.
  11. Buxbaum AR, Haimovich G, and Singer RH (2015). In the right place at the right time: visualizing and understanding mRNA localization. Nat Rev Mol Cell Biol 16, 95–109.
  12. Cabili MN, Dunagin MC, McClanahan PD, Biaesch A, Padovan-Merhar O, Regev A, Rinn JL, and Raj A (2015). Localization and abundance analysis of human lncRNAs at single-cell and single-molecule resolution. Genome Biol 16, 20.
  13. Calvo SE, Clauser KR, and Mootha VK (2015). MitoCarta2.0: an updated inventory of mammalian mitochondrial proteins. Nuc Acids Res 44, D1251–D1257.
  14. Chen CK, Blanco M, Jackson C, Aznauryan E, Ollikainen N, Surka C, Chow A, Cerase A, McDonel P, and Guttman M (2016). Xist recruits the X chromosome to the nuclear lamina to enable chromosome-wide silencing. Science 354, 468–472.
  15. Chen CL, Hu YH, Udeshi ND, Lau TY, Wirtz-Peitz F, He L, Ting AY, Carr SA, and Perrimon N (2015a). Proteomic mapping in live Drosophila tissues using an engineered ascorbate peroxidase. Proc Natl Acad Sci USA 112, 12093–12098.
  16. Chen KH, Boettiger AN, Moffitt JR, Wang S, and Zhuang X (2015b). Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090–aaa6090.
  17. Cheng Y, Miura RM, and Tian B (2006). Prediction of mRNA polyadenylation sites by support vector machine. Bioinformatics 22, 2320–2325.
  18. Chin A, and Lecuyer E (2017). RNA localization: making its way to the center stage. Biochim Biophys Acta 1861, 2956–2970.
  19. Couvillion MT, Soto IC, Shipkovenska G, and Churchman LS (2016). Synchronized mitochondrial and cytosolic translation programs. Nature 533, 499–503.
  20. de Koning AP, Gu W, Castoe TA, Batzer MA, and Pollock DD (2011). Repetitive elements may comprise over two-thirds of the human genome. PLoS Genet 7, e1002384.
  21. Dekker J, Belmont AS, Guttman M, Leshyk VO, Lis JT, Lomvardas S, Mirny LA, O’Shea CC, Park PJ, Ren B, et al. (2017). The 4D nucleome project. Nature 549, 219–226.
  22. Delaleau M, and Borden KL (2015). Multiple export mechanisms for mRNAs. Cells 4, 452–473.
  23. Dillinger S, Straub T, and Nemeth A (2017). Nucleolus association of chromosomal domains is largely maintained in cellular senescence despite massive nuclear reorganisation. PLoS One 12, e0178821.
  24. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, and Gingeras TR (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21.
  25. Dobin A, and Gingeras TR (2015). Mapping RNA-seq reads with STAR. In Curr Prot in Bioinform (John Wiley & Sons, Inc.), pp. 11.14.11–11.14.19.
  26. Durinck S, Moreau Y, Kasprzyk A, Davis S, De Moor B, Brazma A, and Huber W (2005). BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics 21, 3439–3440.
  27. Emanuelsson O, Brunak S, von Heijne G, and Nielsen H (2007). Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc 2, 953–971.
  28. Ewels P, Magnusson M, Lundin S, and Kaller M (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048.
  29. Fasken MB, and Corbett AH (2009). Mechanisms of nuclear mRNA quality control. RNA Biol 6, 237–241.
  30. Femino AM, Fay FS, Fogarty K, and Singer RH (1998). Visualization of single RNA transcripts in situ. Science 280, 585–590.
  31. Flynn RA, Zhang QC, Spitale RC, Lee B, Mumbach MR, and Chang HY (2016). Transcriptome-wide interrogation of RNA secondary structure in living cells with icSHAPE. Nat Protoc 11, 273–290.
  32. Fox CH, Johnson FB, Whiting J, and Roller PP (1985). Formaldehyde fixation. J Histochem Cytochem 33, 845–853.
  33. Friedman JR, Lackner LL, West M, DiBenedetto JR, Nunnari J, and Voeltz GK (2011). ER tubules mark sites of mitochondrial division. Science 334, 358–362.
  34. Gagnon KT, Li L, Janowski BA, and Corey DR (2014). Analysis of nuclear RNA interference in human cells by subcellular fractionation and Argonaute loading. Nat Protoc 9, 2045–2060.
  35. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, et al. (2004). Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5, R80.
  36. Giacomello M, and Pellegrini L (2016). The coming of age of the mitochondria-ER contact: a matter of thickness. Cell Death Diff 23, 1417–1427.
  37. Gold VA, Chroscicki P, Bragoszewski P, and Chacinska A (2017). Visualization of cytosolic ribosomes on the surface of mitochondria by electron cryo-tomography. EMBO Rep 18, 1786–1800.
  38. Grunwald D, and Singer RH (2010). In vivo imaging of labelled endogenous beta-actin mRNA during nucleocytoplasmic transport. Nature 467, 604–607.
  39. Gu Z, Gu L, Eils R, Schlesner M, and Brors B (2014). circlize Implements and enhances circular visualization in R. Bioinformatics 30, 2811–2812.
  40. Guelen L, Pagie L, Brasset E, Meuleman W, Faza MB, Talhout W, Eussen BH, de Klein A, Wessels L, de Laat W, et al. (2008). Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 453, 948–951.
  41. Han S, Li J, and Ting AY (2018). Proximity labeling: spatially resolved proteomic mapping for neurobiology. Curr Opin Neurobiol 50, 17–23.
  42. Han S, Udeshi ND, Deerinck TJ, Svinkina T, Ellisman MH, Carr SA, and Ting AY (2017). Proximity biotinylation as a method for mapping proteins associated with mtDNA in living cells. Cell Chem Biol 24, 404–414.
  43. Hansen MMK, Desai RV, Simpson ML, and Weinberger LS (2018). Cytoplasmic Amplification of Transcriptional Noise Generates Substantial Cell-to-Cell Variability. Cell Systems 7, 384–397.e386.
  44. Hensen F, Moretton A, van Esveld S, Farge G, and Spelbrink JN (2018). The mitochondrial outer-membrane location of the EXD2 exonuclease contradicts its direct role in nuclear DNA repair. Sci Rep 8, 5368.
  45. Hung V, Lam SS, Udeshi ND, Svinkina T, Guzman G, Mootha VK, Carr SA, and Ting AY (2017). Proteomic mapping of cytosol-facing outer mitochondrial and ER membranes in living human cells by proximity biotinylation. eLife 6.
  46. Hung V, Udeshi ND, Lam SS, Loh KH, Cox KJ, Pedram K, Carr SA, and Ting AY (2016). Spatially resolved proteomic mapping in living cells with the engineered peroxidase APEX2. Nat Protoc 11, 456–475.
  47. Hung V, Zou P, Rhee HW, Udeshi ND, Cracan V, Svinkina T, Carr SA, Mootha VK, and Ting AY (2014). Proteomic mapping of the human mitochondrial intermembrane space in live cells via ratiometric APEX tagging. Mol Cell 55, 332–341.
  48. Hwang J, and Espenshade PJ (2016). Proximity-dependent biotin labelling in yeast using the engineered ascorbate peroxidase APEX2. Biochem J 473, 2463–2469.
  49. Ichiyanagi K (2013). Epigenetic regulation of transcription and possible functions of mammalian short interspersed elements, SINEs. Genes Genet Syst 88, 19–29.
  50. Ingolia NT, Ghaemmaghami S, Newman JR, and Weissman JS (2009). Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223.
  51. Jan CH, Williams CC, and Weissman JS (2014). Principles of ER cotranslational translocation revealed by proximity-specific ribosome profiling. Science 346, 1257521–1257521.
  52. Kaewsapsak P, Shechner DM, Mallard W, Rinn JL, and Ting AY (2017). Live-cell mapping of organelle-associated RNAs via proximity biotinylation combined with protein-RNA crosslinking. eLife 6.
  53. Kall L, Krogh A, and Sonnhammer EL (2004). A combined transmembrane topology and signal peptide prediction method. J Mol Biol 338, 1027–1036.
  54. Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, and Kent WJ (2004). The UCSC Table Browser data retrieval tool. Nuc Acids Res 32, D493–496.
  55. Kellems RE, Allison VF, and Butow RA (1974). Cytoplasmic type 80 S ribosomes associated with yeast mitochondria. II. Evidence for the association of cytoplasmic ribosomes with the outer mitochondrial membrane in situ. J Biol Chem 249, 3297–3303.
  56. Kellems RE, Allison VF, and Butow RA (1975). Cytoplasmic type 80S ribosomes associated with yeast mitochondria. IV. Attachment of ribosomes to the outer membrane of isolated mitochondria. J Cell Biol 65, 1–14.
  57. Kent WJ, Zweig AS, Barber G, Hinrichs AS, and Karolchik D (2010). BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics 26, 2204–2207.
  58. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, and Haussler D (2002). The human genome browser at UCSC. Genome Res 12, 996–1006.
  59. Kim SJ, Fernandez-Martinez J, Nudelman I, Shi Y, Zhang W, Raveh B, Herricks T, Slaughter BD, Hogan JA, Upla P, et al. (2018). Integrative structure and functional anatomy of a nuclear pore complex. Nature 555, 475–482.
  60. Krogh A, Larsson B, von Heijne G, and Sonnhammer EL (2001). Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 305, 567–580.
  61. Lam SS, Martell JD, Kamer KJ, Deerinck TJ, Ellisman MH, Mootha VK, and Ting AY (2014). Directed evolution of APEX2 for electron microscopy and proximity labeling. Nat Methods 12, 51–54.
  62. Lee B, Flynn RA, Kadina A, Guo JK, Kool ET, and Chang HY (2017). Comparison of SHAPE reagents for mapping RNA structures inside living cells. RNA 23, 169–174.
  63. Lex A, Gehlenborg N, Strobelt H, Vuillemot R, and Pfister H (2014). UpSet: visualization of intersecting sets. IEEE Trans Vis Comp Graph 20, 1983–1992.
  64. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, and 1000 Genome Project Data Processing Subgroup (2009). The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079.
  65. Linden A (2006). Measuring diagnostic and predictive accuracy in disease management: an introduction to receiver operating characteristic (ROC) analysis. J Eval Clin Pract 12, 132–139.
  66. Loh KH, Stawski PS, Draycott AS, Udeshi ND, Lehrman EK, Wilton DK, Svinkina T, Deerinck TJ, Ellisman MH, Stevens B, et al. (2016). Proteomic analysis of unbounded cellular compartments: synaptic clefts. Cell 166, 1295–1307.e1221.
  67. Love MI, Huber W, and Anders S (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15.
  68. Lubelsky Y, and Ulitsky I (2018). Sequences enriched in Alu repeats drive nuclear localization of long RNAs in human cells. Nature 555, 107–111.
  69. Ma J, Liu Z, Michelotti N, Pitchiaya S, Veerapaneni R, Androsavich JR, Walter NG, and Yang W (2013). High-resolution three-dimensional mapping of mRNA export through the nuclear pore. Nat Comm 4, 2414.
  70. Maglott D, Ostell J, Pruitt KD, and Tatusova T (2011). Entrez gene: gene-centered information at NCBI. Nuc Acids Res 39, D52–57.
  71. Mayr C (2017). Regulation by 3'-Untranslated Regions. Annu Rev Genet 51, 171–194.
  72. Mercer TR, Neph S, Dinger ME, Crawford J, Smith MA, Shearwood AMJ, Haugen E, Bracken CP, Rackham O, Stamatoyannopoulos JA, et al. (2011). The human mitochondrial transcriptome. Cell 146, 645–658.
  73. Meyer KD, Saletore Y, Zumbo P, Elemento O, Mason CE, and Jaffrey SR (2012). Comprehensive analysis of mRNA methylation reveals enrichment in 3’ UTRs and near stop codons. Cell 149, 1635–1646.
  74. Mi H, Muruganujan A, Casagrande JT, and Thomas PD (2013). Large-scale gene function analysis with the PANTHER classification system. Nat Protoc 8, 1551–1566.
  75. Moffitt JR, and Zhuang X (2016). RNA Imaging with Multiplexed Error-Robust Fluorescence In Situ Hybridization (MERFISH). Meth Enzy 572, 1–49.
  76. Mortensen A, and Skibsted LH (1997). Importance of carotenoid structure in radical-scavenging reactions. J Agr Food Chem 45, 2970–2977.
  77. Mou Z, Tapper AR, and Gardner PD (2009). The armadillo repeat-containing protein, ARMCX3, physically and functionally interacts with the developmental regulatory factor Sox10. J Biol Chem 284, 13629–13640.
  78. Nemeth A, Conesa A, Santoyo-Lopez J, Medina I, Montaner D, Peterfia B, Solovei I, Cremer T, Dopazo J, and Langst G (2010). Initial genomics of the human nucleolus. PLoS Genet 6, e1000889.
  79. Padeken J, Zeller P, and Gasser SM (2015). Repeat DNA in genome organization and stability. Curr Opin Genet Dev 31, 12–19.
  80. Pandey RR, Homolka D, Chen KM, Sachidanandam R, Fauvarque MO, and Pillai RS (2017). Recruitment of Armitage and Yb to a transcript triggers its phased processing into primary piRNAs in Drosophila ovaries. PLoS Genet 13, e1006956.
  81. Patel A, Lee HO, Jawerth L, Maharana S, Jahnel M, Hein MY, Stoynov S, Mahamid J, Saha S, Franzmann TM, et al. (2015). A liquid-to-solid phase transition of the ALS Protein FUS accelerated by disease mutation. Cell 162, 1066–1077.
  82. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, et al. (2011). Scikit-learn: Machine Learning in Python. J Mach Learn Res 12, 2825–2830.
  83. Penny GD, Kay GF, Sheardown SA, Rastan S, and Brockdorff N (1996). Requirement for Xist in X chromosome inactivation. Nature 379, 131–137.
  84. Petersen TN, Brunak S, von Heijne G, and Nielsen H (2011). SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods 8, 785–786.
  85. Pimentel H, Bray NL, Puente S, Melsted P, and Pachter L (2017). Differential analysis of RNA-seq incorporating quantification uncertainty. Nat Methods 14, 687–690.
  86. Pruitt KD, Tatusova T, and Maglott DR (2007). NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nuc Acids Res 35, D61–65.
  87. Quinlan AR, and Hall IM (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842.
  88. Reck-Peterson SL, Redwine WB, Vale RD, and Carter AP (2018). The cytoplasmic dynein transport machinery and its many cargoes. Nat Rev Mol Cell Biol 19, 382–398.
  89. Reid DW, and Nicchitta CV (2012). Primary role for endoplasmic reticulum-bound ribosomes in cellular translation identified by ribosome profiling. J Biol Chem 287, 5518–5527.
  90. Reid DW, and Nicchitta CV (2015). Diversity and selectivity in mRNA translation on the endoplasmic reticulum. Nat Rev Mol Cell Biol 16, 221–231.
  91. Reinke AW, Mak R, Troemel ER, and Bennett EJ (2017). In vivo mapping of tissue- and subcellular-specific proteomes in Caenorhabditis elegans. Sci Adv 3, e1602426.
  92. Rhee HW, Zou P, Udeshi ND, Martell JD, Mootha VK, Carr SA, and Ting AY (2013). Proteomic mapping of mitochondria in living cells via spatially restricted enzymatic tagging. Science 339, 1328–1331.
  93. Roundtree IA, Luo GZ, Zhang Z, Wang X, Zhou T, Cui Y, Sha J, Huang X, Guerrero L, Xie P, et al. (2017). YTHDC1 mediates nuclear export of N(6)-methyladenosine methylated mRNAs. eLife 6.
  94. Sadowski PG, Groen AJ, Dupree P, and Lilley KS (2008). Sub-cellular localization of membrane proteins. Proteomics 8, 3991–4011.
  95. Schneider CA, Rasband WS, and Eliceiri KW (2012). NIH Image to ImageJ: 25 years of image analysis. Nat Methods 9, 671–675.
  96. Schnell U, Dijk F, Sjollema KA, and Giepmans BNG (2012). Immunolabeling artifacts and the need for live-cell imaging. Nat Methods 9, 152–158.
  97. Shah S, Lubeck E, Zhou W, and Cai L (2016). In situ transcription profiling of single cells reveals spatial organization of cells in the mouse hippocampus. Neuron 92, 342–357.
  98. Shen J, Zhang JH, Xiao H, Wu JM, He KM, Lv ZZ, Li ZJ, Xu M, and Zhang YY (2018). Mitochondria are transported along microtubules in membrane nanotubes to rescue distressed cardiomyocytes from apoptosis. Cell Death Dis 9, 81.
  99. Shen S, Park JW, Lu ZX, Lin L, Henry MD, Wu YN, Zhou Q, and Xing Y (2014). rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-Seq data. Proc Natl Acad Sci USA 111, E5593–5601.
  100. Shukla CJ, McCorkindale AL, Gerhardinger C, Korthauer KD, Cabili MN, Shechner DM, Irizarry RA, Maass PG, and Rinn JL (2018). High-throughput identification of RNA nuclear enrichment sequences. EMBO J 37.
  101. Spitale RC, Flynn RA, Zhang QC, Crisalli P, Lee B, Jung JW, Kuchelmeister HY, Batista PJ, Torre EA, Kool ET, et al. (2015). Structural imprints in vivo decode RNA regulatory mechanisms. Nature 519, 486–490.
  102. Subtelny AO, Eichhorn SW, Chen GR, Sive H, and Bartel DP (2014). Poly(A)-tail profiling reveals an embryonic switch in translational control. Nature 508, 66–71.
  103. Sultan M, Amstislavskiy V, Risch T, Schuette M, Dokel S, Ralser M, Balzereit D, Lehrach H, and Yaspo ML (2014). Influence of RNA extraction methods and library selection schemes on RNA-seq data. BMC Genomics 15, 675.
  104. Sun L, Fazal FM, Li P, Broughton JP, Lee B, Tang L, Huang W, Kool ET, Chang HY, and Zhang QC (2019). RNA structure maps across mammalian cellular compartments. Nat Struct Mol Biol 26, 322–330.
  105. Thomas PD, Campbell MJ, Kejariwal A, Mi H, Karlak B, Daverman R, Diemer K, Muruganujan A, and Narechania A (2003). PANTHER: a library of protein families and subfamilies indexed by function. Genome Res 13, 2129–2141.
  106. Thul PJ, Akesson L, Wiking M, Mahdessian D, Geladaki A, Ait Blal H, Alm T, Asplund A, Bjork L, Breckels LM, et al. (2017). A subcellular map of the human proteome. Science 356.
  107. Tibshirani R, Walther G, and Hastie T (2001). Estimating the number of clusters in a data set via the gap statistic. J Royal Stat Soc: Series B (Stat Meth) 63, 411–423.
  108. Valm AM, Cohen S, Legant WR, Melunis J, Hershberg U, Wait E, Cohen AR, Davidson MW, Betzig E, and Lippincott-Schwartz J (2017). Applying systems-level spectral imaging and analysis to reveal the organelle interactome. Nature 546, 162–167.
  109. van der Maaten L, and Hinton G (2008). Visualizing data using t-SNE. J Machine Learn Res 9, 2579–2605.
  110. van Koningsbruggen S, Gierlinski M, Schofield P, Martin D, Barton GJ, Ariyurek Y, den Dunnen JT, and Lamond AI (2010). High-resolution whole-genome sequencing reveals that specific chromatin domains from most human chromosomes associate with nucleoli. Mol Biol Cell 21, 3735–3748. [DOI] [PMC free article] [PubMed] [Google Scholar]
  111. Van Nostrand EL, Pratt GA, Shishkin AA, Gelboin-Burkhart C, Fang MY, Sundararaman B, Blue SM, Nguyen TB, Surka C, Elkins K et al. (2016). Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). Nat Methods 13, 508–514. [DOI] [PMC free article] [PubMed] [Google Scholar]
  112. van Steensel B, and Belmont AS (2017). Lamina-associated domains: links with chromosome architecture, heterochromatin, and gene repression. Cell 169, 780–791. [DOI] [PMC free article] [PubMed] [Google Scholar]
  113. Walther TC, Fornerod M, Pickersgill H, Goldberg M, Allen TD, and Mattaj IW (2001). The nucleoporin Nup153 is required for nuclear pore basket formation, nuclear pore complex anchoring and import of a subset of nuclear proteins. EMBO J 20, 5703–5714. [DOI] [PMC free article] [PubMed] [Google Scholar]
  114. Weil TT, Parton RM, and Davis I (2010). Making the message clear: visualizing mRNA localization. Trends Cell Biol 20, 380–390. [DOI] [PMC free article] [PubMed] [Google Scholar]
  115. Wheeler TJ, Clements J, Eddy SR, Hubley R, Jones TA, Jurka J, Smit AF, and Finn RD (2013). Dfam: a database of repetitive DNA based on profile hidden Markov models. Nuc Acids Res 41, D70–82. [DOI] [PMC free article] [PubMed] [Google Scholar]
  116. Wickham H (2009). ggplot2: elegant graphics for data analysis. Springer, 1-+. [Google Scholar]
  117. Wickramasinghe VO, and Laskey RA (2015). Control of mammalian gene expression by selective mRNA export. Nat Rev Mol Cell Biol 16, 431–442. [DOI] [PubMed] [Google Scholar]
  118. Williams CC, Jan CH, and Weissman JS (2014). Targeting and plasticity of mitochondrial proteins revealed by proximity-specific ribosome profiling. Science 346, 748–751. [DOI] [PMC free article] [PubMed] [Google Scholar]
  119. Wishart JF, and Rao BSM (2010). Recent trends in radiation chemistry (World Scientific; ) [Google Scholar]
  120. Zuker M. (2003). Mfold web server for nucleic acid folding and hybridization prediction. Nuc Acids Res 31, 3406–3415. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data


Supplementary Materials

1

Table S1. Related to Figure 1:

List of primers used.

6

Table S6. Related to Figure 7:

Analysis of transcripts enriched at OMM using nocodazole (NOC) time-course experiments.

7

Figure S1. Related to Figure 1:

(A) Streptavidin-biotin dot blot analysis of tRNA labeling by horseradish peroxidase (HRP) in vitro. Left: Yeast tRNA extract was incubated for 1 minute with HRP, biotin-phenol and H2O2, after which the tRNA was purified and the resulting product was treated with either proteinase K or RNase A. Right: The products were spotted, and biotinylated species detected via staining with streptavidin.

(B) LC-MS detection of deoxyguanosine (dG)-pentachlorophenol (PCP) adduct resulting from HRP labeling reaction in vitro. dG was incubated for 1 minute with PCP, HRP, and H2O2. The chemical composition of the resulting mixture was analyzed by LC-MS in negative ion detection mode. The left column shows the experiment and the right column shows the negative control in which H2O2 was omitted. Row 1 is the UV trace chromatogram. Row 2 is an enlarged UV trace chromatogram (asterisk denotes the dG-PCP product). Row 3 is the total ion count chromatogram. Rows 4-6 show mass chromatograms corresponding to the mass of dG-PCP (product), PCP (starting material), and dG (starting material), respectively.

(C) APEX2 catalyzes formation of G-PCP adduct. UV trace chromatograms of guanosine (G) or adenosine (A) reacting with PCP and APEX2 in vitro. G (top row) or A (bottom row) was incubated for 1 minute with PCP, APEX2, and H2O2. The resulting mixture was analyzed by HPLC with UV detection of chemical species. Absorption peaks corresponding to G, PCP and G-PCP adduct are labeled in row 1. Rows 2-5 show negative controls with H2O2, PCP, G or APEX2 omitted, respectively. Row 6 shows the same experiment with A in place of G.

(D) In vitro transcribed 5S ribosomal RNA was treated with HRP in duplicate in the presence of biotin-phenol followed by 1-minute treatment with H2O2 and enriched by streptavidin-biotin pulldown. The enriched RNA was reverse transcribed into cDNA, and the resulting products were run on a denaturing PAGE gel. Modification of 5S RNA at GGG sequences results in excess truncated DNA products (black arrows) relative to controls (no H2O2 added) carried out in duplicate. Based on streptavidin pulldown and quantification of in vitro labeled RNA, we estimate a labeling efficiency of 0.11 ± 0.02 (N = 2).

(E) For 5S RNA, 3 of the 4 GGG sequences interrogated yielded gel bands, presumably due to the RT-enzyme halting at the corresponding biotinylated nucleotides.

(F) Optimization of the washing step following binding of biotinylated RNA to the streptavidin beads. We used buffers containing high salt (1M NaCl) as well as buffers previously used for APEX Proteomics washes with or without urea or formamide. We varied both the amount and type of streptavidin beads used, and the duration and temperature at which incubations of labeled RNA with the beads were carried out. A simple high salt wash provided high enrichment of mitochondrial RNAs (blue bar graph), as determined by enrichment by RT-qPCR of two mitochondrial RNAs - MTND1 and MTCO1 - relative to 4 non-mitochondrial RNAs - XIST, FAU, GAPDH and SSR2. The high-salt wash also maintained higher and more reproducible recovery relative to other conditions. Sources of error arise from both biological replicates and technical replicates.

(G) Bioanalyzer traces of RNA extracted from APEX2-NES HEK cells confirming that the RNA is not degraded (RNA integrity number (R.I.N.) = 9-10) upon treatment with biotin phenol and/or hydrogen peroxide, based on the ribosomal RNAs 18S and 28S remaining intact.

8

Figure S2. Related to Figure 2:

(A) APEX2 fusion constructs employed in this study. FLAG-APEX2-NES uses a nuclear export signal (NES) to localize APEX2 throughout the cytoplasm. Mito-V5-APEX2 employs a 24-amino-acid mitochondrial targeting sequence (MTS) from COX4 to localize APEX2 throughout the mitochondrial matrix. FLAG-OMM-APEX2 employs the C-terminal 31 amino acids of mitochondrial antiviral-signaling protein (MAVS) to target APEX2 to the outer mitochondrial membrane (OMM). ERM-APEX2-V5 employs the transmembrane segment of the endoplasmic reticulum (ER)-resident protein P450 oxidase 2C1 to target APEX2 to the ER membrane (ERM). HRP-V5-KDEL employs a KDEL sequence to target the horseradish peroxidase (HRP) to the ER lumen. V5-APEX2-NLS employs a nuclear localization sequence (NLS) to target APEX2 throughout the entire nucleus. GFP-APEX2-NIK3x employs three tandem nucleolar targeting sequences from NF-κB-inducing kinase (NIK) to localize APEX2 to the nucleolus. V5-APEX2-LMNA is targeted to the nuclear lamina by fusing APEX2 to the N terminus of prelamin-A/C (LMNA). V5-APEX2-SENP2 is targeted to the nuclear pore complex by fusing APEX2 to the N terminus of Sentrin-specific protease 2 (SENP2). V5 and FLAG are epitope tags.

(B) Co-localization of APEX expression with neutravidin staining. Left: Fluorescence imaging of APEX2 localization and biotinylation activity. Live-cell biotinylation was performed for 1 minute with BP and H2O2 in HEK cells stably expressing the indicated APEX2 fusion protein. APEX2 expression was visualized by anti-V5/FLAG staining (green). Biotinylated species were visualized by staining with neutravidin-AlexaFluor 647 (red). Scale bars, 10 µm. Right: pixel intensity plot of the dashed line shown in images on the left.

(C) Mapping statistics of APEX-seq libraries generated. Figures show the percentage of mapped reads, as well as the total number of reads. Most polyA-selected libraries showed a high proportion (> 80%) of uniquely mapped reads.

(D) Correlation plot of biological replicates, showing that the unlabeled controls for the different constructs are quite similar to each other, and to the nuclear pore and ER lumen target constructs. The MITO APEX-seq libraries are most different from the other libraries.

(E) RT-qPCR analysis showing specific enrichment of on-target (blue) over off-target (grey) RNAs with APEX-seq, but not APEX-RIP, in different “open” compartments. HEK cells stably expressing the corresponding APEX2 were labeled for 1 minute with BP and H2O2. For APEX-seq, biotinylated RNAs were enriched with streptavidin beads following total RNA extraction and then analyzed by RT-qPCR. For APEX-RIP, RNAs were crosslinked to proteins for 10 minutes before streptavidin bead enrichment. Data are the mean of 3 replicates ± 1 S.D. The data were normalized such that the mean enrichment of off-target RNAs was 1 for both APEX-seq and APEX-RIP.
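
The normalization described in (E) can be illustrated with a minimal Python sketch. The function name, the RNA labels and the numbers below are hypothetical and are not taken from the study; the sketch only shows the rescaling step, in which every enrichment value is divided by the mean enrichment of the off-target RNAs so that this mean becomes 1.

```python
from statistics import mean

def normalize_to_off_target(enrichment, off_target):
    """Rescale enrichment values so the mean over off-target RNAs equals 1."""
    baseline = mean(enrichment[name] for name in off_target)
    return {name: value / baseline for name, value in enrichment.items()}

# Hypothetical raw enrichment values (pulldown relative to input), illustration only.
raw = {"on_target_1": 12.0, "on_target_2": 8.0, "off_target_1": 1.0, "off_target_2": 3.0}
normalized = normalize_to_off_target(raw, off_target=["off_target_1", "off_target_2"])
```

After this step, on-target values can be read directly as fold enrichment over the off-target average.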

(F) The proportion of all reads in the MITO APEX-seq libraries mapping to the 15 mRNAs and rRNAs. Over 10% of all reads map to MTCO1.

(G) Browser tracks of mitochondrial (MT) genome showing robust enrichment of mitochondrial RNAs in the mitochondrial-matrix (MITO) APEX-seq library, but not from the libraries generated from constructs targeting other subcellular locations.

(H) Scatter plot of enrichment for the 15 MT rRNAs and mRNAs between polyA-selected RNA and total RNA, showing good agreement between the two.

(I) ROC curve showing the performance of APEX-seq for different analysis protocols. These include no ratiometric normalization (blue), as well as 2-controls (red), 4-controls (yellow) and 18-controls conditions (green). For challenging open locations, combining controls from other APEX-seq constructs improves performance. For the entire paper, unless otherwise mentioned, 18-controls data is shown. For comparison the ER polysome profiling RNA-seq is shown (purple). Here the true positive is the Jan et al. list(Jan et al., 2014), and false positive is predicted non-secretory proteins(Kaewsapsak et al., 2017).
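
As a rough illustration of how an ROC curve such as the one in (I) can be built, the Python sketch below ranks genes by an enrichment score and sweeps a cutoff from high to low. It is a generic construction under stated assumptions, not the authors' analysis code: the scores and labels are invented, and the function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping a cutoff from high to low score.

    scores: enrichment values (for example log2 fold changes), higher = more enriched.
    labels: True for true-positive genes, False for true-negative genes.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, is_positive) in enumerate(ranked):
        if is_positive:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last gene sharing this score (handles ties).
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical scores: three true positives and three true negatives.
scores = [2.1, 1.4, 0.9, 0.2, -0.3, -1.0]
labels = [True, True, False, True, False, False]
curve = roc_points(scores, labels)
area = auc(curve)
```

Comparing the area under the curve across normalization schemes (no controls, 2, 4 or 18 controls) is one simple way to summarize the differences the panel describes.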

(J) ERM APEX-seq shows clear separation of true positives (determined by proximity-based ribosome profiling), and negatives (predicted to be non-secretory based on Phobius, SignalP or TMHMM).

(K) Using a list of 71 true-positive ERM transcripts, the coverage of APEX-seq was compared to ribosome profiling and ER fractionation-seq. All three methods yield similar coverage.

(L) Comparing ERM APEX-seq to ERM proteomics, APEX-seq shows higher coverage for these 71 genes.

(M) The proportion of transcripts retained as enriched, as the fold-change cutoff based on APEX-seq comparison of labeled targets versus unlabeled controls is varied. Unless otherwise mentioned, a log2fold-change cutoff of 0.75 was used in Figures 2–5.
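
The dependence on the cutoff shown in (M) can be sketched in a few lines of Python. The values are hypothetical, and whether the comparison is strict (>) or inclusive (≥) is an assumption, since the legend does not say.

```python
def fraction_retained(log2_fold_changes, cutoffs):
    """For each cutoff, the fraction of transcripts with log2 fold change above it."""
    total = len(log2_fold_changes)
    return {c: sum(fc > c for fc in log2_fold_changes) / total for c in cutoffs}

# Hypothetical log2 fold changes (labeled target versus unlabeled control).
log2fc = [0.2, 0.6, 0.8, 1.1, 1.9, 2.5, 0.9, 0.4]
retained = fraction_retained(log2fc, cutoffs=[0.0, 0.5, 0.75, 1.0])
```

In this toy example, raising the cutoff from 0 to 0.75 keeps five of the eight transcripts.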

9

Figure S3. Related to Figure 3:

(A) Bar plots showing the protein localization of the transcripts enriched by APEX-seq. These numbers are based on the ~3250 genes examined in Figure 3 that have reliable protein localization data in the Protein Cell Atlas database.

(B) Cellular component GO-terms associated with the clusters determined from heatmap in Figure 3D, confirming that the nuclear locations are enriched for nuclear-associated GO terms, and the ER and OMM for membrane GO terms. Larger bubbles denote more significant enrichment/depletion.

(C) Scatter plot comparing the OMM and the ERM APEX-seq log2fold-changes in HEK cells. Genes are categorized as in Figure 6B. Gene names are shown for proteins known to be dual-localized to ER and mitochondria.

(D) Of the mRNAs enriched by both OMM and ERM APEX-seq, more than 90% have secretory annotations.

(E) Histogram showing the length distribution of transcripts recovered from APEX-seq versus fractionation-seq. Both methods yield comparable distributions for the length of transcripts recovered.

(F) Violin plot showing the striking difference in fold changes of ER transcripts between fractionation-seq and APEX-seq. P-value is from a Mann-Whitney U test.

(G) Cumulative distribution of transcripts in the nuclear/cytosolic fractionation-seq data split into two groups based on cytosolic APEX-seq data. Genes enriched by cytosolic APEX-seq (log2fold-change > 0, pFDR-adjusted < 0.05) had much higher enrichment in the cytosolic fractionation data (p < 10^−100, KS test) relative to genes depleted by cytosolic APEX-seq (log2fold-change < 0, pFDR-adjusted < 0.05).

(H) Mitochondrial APEX-seq shows robust enrichment of the MT rRNAs and mRNAs, and no enrichment of OMM-enriched RNAs. There are ~400 transcripts that have large positive fold changes (log2foldchange > 0.75, pFDR-adjusted < 0.05).

10

Figure S4. Related to Figure 4:

(A) (B) (C) Number of intron-retention events across APEX-seq enriched transcripts in the nucleus and nuclear pore, as well as fractionation-seq. An intron-retention score was calculated based on how much the retained-intron transcript was obtained relative to the corresponding cytosol control, and was computed by taking the sum of the absolute values of the log2fold-change enrichments for the most cytosol-biased and most nuclear-biased transcript.
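
One literal reading of the score described above can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the per-isoform values are invented, the function name is hypothetical, and the sign convention (positive = nuclear-biased, negative = cytosol-biased) is assumed.

```python
def isoform_bias_score(isoform_log2fc):
    """Sum of |log2FC| for the most nuclear-biased and most cytosol-biased isoform.

    isoform_log2fc maps isoform name -> log2 fold change (nucleus over cytosol).
    """
    most_nuclear = max(isoform_log2fc.values())
    most_cytosolic = min(isoform_log2fc.values())
    return abs(most_nuclear) + abs(most_cytosolic)

# Hypothetical gene with three isoforms.
gene = {"iso_A": 1.5, "iso_B": -0.5, "iso_C": 0.25}
score = isoform_bias_score(gene)
```

A gene whose isoforms diverge strongly in opposite directions receives a high score, while a gene whose isoforms all behave alike receives a low one.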

(D) Scatter plot showing high correlation between intron skipping or retention in nucleus APEX-seq (relative to cytosol APEX-seq) versus nuclear fractionation-seq (relative to cytosol). Genes displaying no differential expression between the nucleus and cytosol, but with at least one transcript enriched in the nucleus and a different transcript enriched in the cytosol, were called as displaying isoform switching.

(E) The genes shown in Figure 4E were identified by selecting for transcripts that are highly abundant and showed high-isoform switching scores.

(F) (G) Barplots showing the number of genes with alternative splice sites at (F) 5’ UTRs and (G) 3’ UTRs in the APEX-seq samples, relative to unlabeled controls (FDR < 0.05).

(H) (I) Cumulative distributions of the exon length, and number of isoforms for genes enriched by APEX-seq in the nuclear pore relative to other locations. We observe shorter transcripts at the nuclear pore relative to other locations. We see no significant difference in distribution across the locations. Here the transcript length was calculated by considering the most-abundant transcript isoform for each gene across all locations in the APEX-seq data.

(J) (K) (L) Cumulative distributions of the intron length, exon length and number of introns for genes enriched by APEX-seq in the nuclear locations. Here the transcript length was calculated by considering the longest-stable transcript isoform for each gene.

11

Figure S5. Related to Figure 5:

(A) (B) (C) (D) Using all the APEX-seq enriched genes as a background, the estimated FDR of finding the nuclear-lamina repeat motifs.

(E) Heatmap showing the number of repeat motifs in exons of transcripts enriched by APEX-seq. Unlike in Figure 5A, this analysis considers all enriched genes, not just enriched genes unique to that location. We continue to see strong enrichment of these motifs in the nuclear lamina, but also in other nuclear locations relative to the cytosolic locations.

(F) Scatter plot showing good correlation between post-enrichment APEX-seq control data and published polyA-selected RNA-seq data from HEK. APEX-seq control data was averaged from 18 controls generated from APEX2 constructs targeting 9 locations.

(G) RNA-seq abundance of all genes (not just unique genes) enriched by APEX-seq. FPKM = fragments per kilobase per million reads. P-values from Mann-Whitney U test.

(H) (I) Using post-enrichment APEX-seq control data we also obtain decreased abundance of nuclear-lamina enriched genes, both for all genes and more strikingly for unique genes. FPKM = fragments per kilobase per million reads. P-values from Mann-Whitney U test.

(J) (K) (L) Bar plots showing the proportion of genes found in lamina-associated domains or nucleolus-associated domains or both. P-values are from Fisher’s exact tests.

(M) Control test examining the localization of mitochondrial genes, confirming no similar enrichment of genes in the nucleolus or nuclear-lamina transcriptomes. P-values are from Fisher’s exact tests.

12

Figure S6. Related to Figure 6:

(A) Cluster map of the OMM perturbation experiments, along with the corresponding cytosolic background. All controls cluster together, while the cytosolic locations vary less across the different perturbation experiments relative to their OMM counterparts. OMM cycloheximide is the most different among these labeled libraries.

(B) Molecular function GO-terms based on clusters in Figure 6M. The clusters enriched in mitochondrial genes (1+4+6) show differences relative to clusters 2+3.

(C) Browser tracks of a mitochondrial gene (MUT) show increased enrichment by OMM-APEX upon CHX treatment.

(D) Browser tracks of an OXPHOS gene (NDUFB6) show increased enrichment by OMM-APEX upon PUR/CCCP treatment.

(E) Scatter plots of OMM APEX-seq log2 fold change comparing the basal and PUR (y axis, left)/CCCP (y axis, right) conditions.

(F) Gene density distribution of OMM APEX-seq log2fold-change under PUR or CCCP conditions. Genes are functionally classified according to Gene Ontology.

(G) Plot showing the overlapping number of enriched genes in the different OMM perturbation experiments. All genes that were enriched in at least 1 of the four conditions are included.

(H) The proportion of known mitochondrial genes in the different clusters. Clusters 1+4+6 are highly enriched in mitochondrial genes, while clusters 5+7 are significantly depleted. P-values from Fisher’s exact test.
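
For readers unfamiliar with the test used in (H), the Python sketch below computes a one-sided Fisher's exact p-value for enrichment from a 2x2 table using only the standard library. The one-sided form, the function name and the example counts are assumptions for illustration; the legend does not state which alternative was used, and the counts are not from the study.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for enrichment in a 2x2 table.

    Table layout:          mitochondrial   not mitochondrial
      in cluster                 a                 b
      not in cluster             c                 d
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    # Hypergeometric probability of seeing a or more mitochondrial genes in the cluster.
    favourable = sum(comb(row1, k) * comb(n - row1, col1 - k)
                     for k in range(a, upper + 1))
    return favourable / comb(n, col1)

# Hypothetical counts: 3 of 4 cluster genes are mitochondrial, 1 of 4 other genes is.
p_value = fisher_exact_greater(3, 1, 1, 3)
```

Depletion, as reported for clusters 5+7, would be tested with the opposite tail of the same distribution.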

(I) Examining the transcripts not annotated as mitochondrial in clusters 1+4+6 yields 12 transcripts, of which 7 are pseudogenes and 5 are mRNAs. Of these 5 mRNAs, further literature examination shows evidence for 1 coding for a protein localizing to the mitochondria (TTLL4(Thul et al., 2017)) and 2 localizing to the OMM (ARMCX3(Mou et al., 2009) and EXD2(Hensen et al., 2018)).

(J) Genome tracks of EXD2 from (I).

13

Figure S7. Related to Figure 7:

(A) Using the classifier in Figure 7D, we asked which regions of the transcripts were most important for classifier performance. The 5’ UTR was less important than the 3’ UTR or coding sequence for performance.

(B) Same as (A), but calculating importance along the entire transcript (i.e. normalizing for length) reveals that 3’ UTR is most important for protein- vs RNA-dependent classification.

(C) The 6-mers most important for the classifier performance. Note the polyadenylation signal sequence AAUAAA is one of the most important sequences for prediction.
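
To make the notion of a 6-mer concrete, the following Python sketch counts every overlapping 6-mer in an RNA sequence. How sequence features were actually encoded for the classifier is not described in this legend, so treating 6-mer counts as features is an assumption; the sequence and the function name are hypothetical.

```python
from collections import Counter

def kmer_counts(sequence, k=6):
    """Count every overlapping k-mer in an RNA sequence (DNA 'T' is read as 'U')."""
    seq = sequence.upper().replace("T", "U")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

# Hypothetical 3' UTR fragment containing the polyadenylation signal twice.
counts = kmer_counts("GGAAUAAAGCAAUAAAUU")
```

Here `counts["AAUAAA"]` reports how often the polyadenylation signal occurs in the fragment.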

(D) 3’ UTR length across ERM-dependent, protein-dependent and RNA-dependent transcripts.

(E) OMM-APEX2 remains on mitochondria in the presence of CCCP or NOC drug perturbation. HEK cells stably expressing OMM-APEX2 were treated with CCCP or NOC for 30 minutes as in the APEX-seq experiments. APEX2 expression was visualized by anti-FLAG staining (green). An antibody against endogenous TOM20 was used as a marker for the mitochondria. Scale bars, 10 µm.

(F) Mapping statistics of APEX-seq libraries generated under NOC, PUR, CHX and CCCP perturbation conditions.

(G) Pearson correlation of APEX-seq OMM and cytosol perturbation libraries. The target replicates agree well.

(H) Correlation of fold change upon 30-minute nocodazole treatment (where effect saturates) and the corresponding change upon CHX treatment. Changes are measured relative to basal conditions. These are the 1329 transcripts enriched in any of the time points (0, 3, 6, 9 or 30 minutes).

(I) Molecular function GO-term of clusters. Here the background is the set of genes enriched in any of the time points.

2

Table S2. Related to Figure 2:

Mapping statistics of all APEX-seq and polyA+ fractionation-seq libraries.

3

Table S3. Related to Figure 2:

List of all localized transcripts and all orphans.

4

Table S4. Related to Figure 6:

Analysis of transcripts enriched at OMM with puromycin (PUR), cycloheximide (CHX) and CCCP treatments.

5

Table S5. Related to Figure 7:

List of all ribosome-dependent and RNA-dependent transcripts used to build the random-forest algorithm.

RESOURCES