Abstract
Intercellular communication mediated by extracellular vesicles has proved to play an important role in normal and pathological scenarios. However, little information is available about the sorting mechanisms involved in loading the vesicles. Recently, our group has characterized the mRNA content of vesicles released by hepatic cellular systems, showing that a set of transcripts was particularly enriched in the vesicles in comparison with their intracellular abundance. In the current work, based on in silico bioinformatics tools, we have mapped a novel sequence of 12 nucleotides C[TA]G[GC][AGT]G[CT]C[AT]GG[GA], which is significantly enriched in the set of mRNAs that accumulate in extracellular vesicles. By including a 3′-UTR containing this sequence in a luciferase mRNA reporter, we have shown that in a hepatic cellular system this reporter mRNA was incorporated into extracellular vesicles. This study identifies a sorting signal in mRNAs that is involved in their enrichment in EVs, within a hepatic non-tumoral cellular model.
Keywords: sorting signal, mRNA, extracellular vesicles
Introduction
In recent years, intercellular transfer of active macromolecules, mediated by cell-released extracellular vesicles (EVs), has been recognized as a key regulatory mechanism in many biological processes.1-3 Intensive analysis of EV content has shown that these vesicles contain a large variety of molecules such as lipids, native and post-translationally modified proteins, and nucleic acids, including coding and non-coding RNA.7,8 Although various mechanisms of EV formation and protein loading have been revealed, very little is known about how the genetic material is targeted into EVs. Inside the cells, cis-acting regulatory sequences and trans-acting proteins are considered the main driving forces for intracellular mRNA localization. Such sequences, also known as zipcodes, are typically found in the 3′-untranslated regions (3′-UTRs) of mRNA transcripts. Several studies have shown that many mRNAs and miRNAs are significantly enriched in EVs, supporting the existence of controlled mechanisms for packaging RNAs into EVs. Recently, two studies in tumoral cellular models have shed light on cis-acting elements that are responsible for the RNA transport into EVs secreted by glioblastoma11 and melanoma cells.12 Batagov and coworkers12 applied an in silico approach and found a significant enrichment of three motifs (ACCAGCCU, CAGUGGAGC and UAAUCCCA) in 3′-UTRs of EVs-enriched RNAs. Bolukbasi et al.11 described a stem loop-forming 25-nt sequence capable of increasing the amount of reporter mRNAs in glioblastoma and melanoma EVs approximately two-fold. Interestingly, the activity of this cis-acting element seems to be controlled by the CTGCC core sequence and a miRNA-binding site for miR-1289.
Localization signals of proteins or RNAs can be searched for by experimental and/or computational approaches. Wet-lab experiments allow the behavior of the real system to be observed in response to an external stimulus or an alteration of the system. Unfortunately, biological approaches are often difficult, expensive and time consuming. Computational approaches usually do not suffer from such drawbacks and have proved useful for the analysis of large amounts of data. However, motif discovery using computational methods has also turned out to be difficult. To our knowledge, there are over a hundred different algorithms dedicated to motif finding13-18 but none of them can guarantee successful motif identification. A recent review of the 13 most popular motif finding algorithms has shown that the sensitivity and the predictive value were estimated at under 15% for most of them.19 Thus, combining in silico and in vitro approaches seems to be the most efficient way of identifying motifs.
In this work, we have combined the two approaches to investigate a hepatic cellular model and check whether any sequence within an mRNA may act as a zipcode that targets it into EVs. In our analysis, we used the transcriptomics data obtained from the study of the liver-derived cell line MLP29.20 During the in silico phase, the data were first analyzed in order to obtain detailed information about the sequences, and different data sets were prepared. Afterwards, the mRNA sorting motifs previously reported by other groups were analyzed.11,12 We have found that they do not play a significant role in the targeting of RNA into hepatic EVs. Next, we searched for novel motifs. Using MEME16 we were able to reveal 12 sequences, 7 to 15 nt long, which were significantly more abundant in EVs-enriched RNAs of hepatic origin. Then, we performed a secondary structure prediction along with a search for miRNA binding sites in these sequences. As a result we have identified a 12-nt sequence C[TA]G[GC][AGT]G[CT]C[AT]GG[GA] included in a stem loop-forming region that is observed in 39.6% of the 3′-UTRs of the EVs-enriched mRNAs. Incorporation of a 3′-UTR containing this motif into the mRNA of the luciferase gene resulted in a significant increase in the targeting of this reporter mRNA into EVs.
Results
Preliminary analysis of data and data sets preparation
In order to find sequence motifs that were significantly enriched in RNAs contained in hepatic EVs and, at the same time, poorly represented in the RNAs rarely found in EVs, two data sets were generated from the data reported by Royo et al.20: data set a (RNAs enriched in EVs) and data set b (RNAs underrepresented in EVs). First, we searched the NCBI database22 in order to obtain detailed information about the sequences. Data set a contained 92 exosomal records, where 12 sequences (i.e., 13.04% of the data set) were characterized as full insert sequences and 80 (86.96% of the data set) as mRNAs. In contrast to data set a, only 2 sequences (2.27% of the data set) from data set b were identified as full insert sequences, while the 86 remaining sequences were described as mRNAs. The length of the EVs-enriched sequences ranged from 294 nt to 6943 nt, while the length of sequences underrepresented in EVs ranged from 83 nt to 6453 nt (there were no miRNA sequences among the data).
Afterwards, data sets a and b were divided into different subsets in order to perform a deeper analysis of the data. First, we decided to examine UTR sequences because targeting motifs have mostly been found in the UTRs. However, the UTRs are not known for all of the analyzed sequences. Therefore, we analyzed full sequences, 3′-UTRs and 5′-UTRs (each separately). Second, we took into consideration the presence of repetitive elements. These elements could lead to a high number of false positives but, on the other hand, they could serve as binding sites for trans-acting factors involved in sorting of RNA into EVs. Taking these facts into account, two subgroups of data were generated: one containing the repetitive elements, which was labeled “unmasked,” and another created by removing these elements, which was labeled “masked” (Table 1).
Table 1. Data sets that were obtained in the data set preparation step and further used in the motif search phase.
| Data set id | Data set contents | Data set id | Data set contents |
|---|---|---|---|
| 1a | unmasked EVs-enriched RNA | 1b | unmasked EVs-underrepresented RNA |
| 2a | masked EVs-enriched RNA | 2b | masked EVs-underrepresented RNA |
| 3a | unmasked EVs-enriched 5′-UTR | 3b | unmasked EVs-underrepresented 5′-UTR |
| 4a | masked EVs-enriched 5′-UTR | 4b | masked EVs-underrepresented 5′-UTR |
| 5a | unmasked EVs-enriched 3′-UTR | 5b | unmasked EVs-underrepresented 3′-UTR |
| 6a | masked EVs-enriched 3′-UTR | 6b | masked EVs-underrepresented 3′-UTR |
Motif occurrence in mRNAs enriched in EVs of hepatic origin
We searched our data set [mRNA-contained in EVs released by MLP29]20 to find out whether any of the previously reported motifs11,12 were potentially involved in the recruitment of mRNAs to hepatic EVs. The results showed that there was no significant enrichment of any of these motifs in the data set (Table 2), indicating that the targeting mechanisms in hepatic EVs could require different motifs than in the scenarios used in the previous works.11,12
Table 2. The results for MLP29 transcriptomics data searched for the motifs described by Bolukbasi et al.11 and Batagov et al.12 Each cell gives the number of RNA sequences containing a given motif / the number of occurrences of that motif.
| Motif | 3′-UTRs enriched-EVs | 3′-UTRs underrepresented-EVs | 3′-UTRs MLP-EVs | 3′-UTRs MLP-cells |
|---|---|---|---|---|
| ACCAGCCT | 2 / 2 | 1 / 1 | 243 / 261 | 321 / 341 |
| CAGTGAGC | 0 / 0 | 0 / 0 | 219 / 228 | 297 / 311 |
| TAATCCCA | 1 / 1 | 1 / 1 | 224 / 235 | 336 / 354 |
| CTGCC CTCCC CGCCC | 43 / 243 | 62 / 252 | 5323 / 28035 | 7562 / 39370 |
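The "sequences containing / occurrences" pairs reported in Table 2 can be illustrated with a minimal Python sketch. It assumes exact string matching on the DNA alphabet and counts overlapping occurrences; the text does not state whether overlaps were counted, and the toy sequences below are hypothetical, not taken from the MLP29 data.

```python
import re

def count_motif(sequences, motif):
    """Return (number of sequences containing motif, total occurrences)."""
    # A zero-width lookahead lets overlapping occurrences be counted too.
    pattern = re.compile(f"(?={re.escape(motif)})")
    n_seqs = 0
    n_hits = 0
    for seq in sequences:
        hits = len(pattern.findall(seq.upper()))
        if hits:
            n_seqs += 1
            n_hits += hits
    return n_seqs, n_hits

# Hypothetical toy 3'-UTRs used only to exercise the function.
toy_utrs = ["GGACCAGCCTTTACCAGCCT", "TTTTAATCCCATT", "GGGGGGGG"]
for motif in ("ACCAGCCT", "CAGTGAGC", "TAATCCCA"):
    print(motif, "%d / %d" % count_motif(toy_utrs, motif))
```

Running the same function on the enriched and underrepresented 3′-UTR sets would give one table cell per motif and data set.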
Motif search within mRNAs enriched in EVs of hepatic origin
In order to obtain sequences significantly enriched in RNAs contained in EVs, we compared data sets 1a-6a with their 1b-6b counterparts (Table 1).
Initially, we used the conventional multiple alignment algorithms described in the Materials and Methods section. Unfortunately, none of them revealed any statistically significant motifs. In the case of the unmasked sequence data sets 1a, 1b, 3a, 3b, 5a and 5b, only poly(A) tails and GC-rich regions were found. Although some consensus regions were found in the masked data sets, the results were still not acceptable because of the weak conservation of these regions. Even though this approach proved to be useful in the analysis performed by Bolukbasi et al.,11 a similar negative result using this methodology was obtained in a study by Batagov et al.12 Thus, we decided to use the local alignment algorithm implemented in BLAST,21 which resembled the successful ab initio approach described by Batagov et al.12 However, the results were still unsatisfactory. As the previous methods based on both global and local alignments were not satisfactory, we decided to use Multiple EM for Motif Elicitation (MEME), a well-known algorithm designed for de novo motif discovery which, according to Tompa et al.,19 proved to be one of the best for the analysis of mouse data sets.16 As the next step, the motifs that were highly conserved were selected for further analysis. Conservation was assessed in terms of nucleotide occurrences, that is, the location of the sites within a sequence and the number of repetitions, together with the motif structure. As a result we obtained 12 candidate motifs (Table 3).
Table 3. Summary of the conserved motifs found by MEME, significantly enriched in MLP29 EVs mRNAs.

Prediction of secondary structures
To investigate whether the secondary structures of the motifs and their adjacent sequences were conserved, a computational analysis of RNA secondary structures was performed using mfold.28 We found that the candidate motifs were often located within a stem-loop structure; Figure 1 shows a representative example.

Figure 1. Predicted secondary structures of motif 3.
MicroRNAs analysis
As previously mentioned, Bolukbasi and coworkers found that miRNAs can be involved in the EVs targeting mechanism.11 Based on this, we decided to test whether our selected candidate motifs could be targets for microRNAs. We performed a miRNA-binding site scan using miRanda.29 First, all known mouse miRNAs were scanned against the 12 conserved candidate motifs. Then, we experimentally characterized the microRNAs that were released by the MLP29 cell line, as described in the Materials and Methods section. The complete list of miRNAs secreted by MLP29 is given in Table S2. These miRNAs were used to perform a miRNA-binding site scan using miRanda against the 12 conserved motifs. As a result of this analysis we detected a number of miRNAs that could potentially bind the motifs (Table 4).
Table 4. List of motifs and microRNAs that potentially bind to them.
| Motif id | mature miRNA |
|---|---|
| unmasked RNA enriched-EVs vs. unmasked RNA underrepresented-EVs | |
| Motif 8 | mmu-miR-7646–5p, #mmu-miR-96–3p, mmu-miR-376b-5p, #mmu-miR-741–3p, mmu-miR-3088–3p, mmu-miR-881–5p, mmu-miR-376c-5p, #mmu-miR-3090–5p, #mmu-miR-539–5p |
| Motif 9 | #mmu-miR-103–1-5p, mmu-miR-107–5p, mmu-miR-100–3p, mmu-miR-344h-5p, #mmu-miR-103–2-5p, #mmu-miR-675–3p, #mmu-miR-344e-5p |
| Motif 10 | #mmu-miR-3074–1-3p, mmu-miR-7214–5p, mmu-miR-6947–5p, mmu-miR-184–5p, #mmu-miR-145a-5p, mmu-miR-5124a, #mmu-miR-205–3p, mmu-miR-145b |
| Motif 14 | mmu-miR-7056–5p, mmu-miR-759, #mmu-miR-3082–5p, mmu-miR-6982–5p, mmu-miR-6969–5p, #mmu-miR-1940, #mmu-miR-1951 |
| Motif 15 | mmu-miR-290b-5p, mmu-miR-7213–5p, mmu-miR-1298–3p, mmu-miR-6970–5p, #mmu-miR-1199–5p, #mmu-miR-3075–5p, mmu-miR-7219–3p, #mmu-miR-3090–5p |
| Motif 18 | mmu-miR-6955–3p, mmu-miR-7235–3p, mmu-miR-29a-5p, #mmu-miR-1896, mmu-miR-5625–3p, mmu-miR-6908–3p |
| Motif 20 | mmu-miR-7687–3p, #mmu-miR-1896, mmu-miR-6960–5p, mmu-miR-7664–3p, mmu-miR-8104, mmu-miR-6924–5p, #mmu-miR-130b-5p, mmu-miR-3087–3p, mmu-miR-7014–3p, mmu-miR-7093–5p |
| Motif 23 | mmu-miR-466e-5p, #mmu-miR-669a-5p, #mmu-miR-669f-5p, mmu-miR-466p-5p, mmu-miR-466a-5p, #mmu-miR-669p-5p, #mmu-miR-669l-5p, #mmu-miR-1187 |
| Motif 30 | mmu-miR-344d-2–5p, mmu-miR-5135, mmu-miR-1898, #mmu-miR-199a-5p, #mmu-miR-199b-5p, mmu-miR-5124a, #mmu-miR-133b-5p, #mmu-miR-710, #mmu-miR-1934–5p |
| masked RNA enriched-EVs vs masked RNA underrepresented-EVs | |
| Motif 3 | #mmu-miR-465a-3p, #mmu-miR-465c-3p, #mmu-miR-3071–5p, #mmu-miR-3058–3p, mmu-miR-466q, mmu-miR-148b-5p, #mmu-miR-215–3p, mmu-miR-7681–5p, mmu-miR-184–5p, mmu-miR-7237–3p, #mmu-miR-3074–1-3p, mmu-miR-429–5p, #mmu-miR-465b-3p |
| masked 3′-UTRs enriched-EVs vs. 3′-UTRs masked underrepresented-EVs | |
| Motif 3 | #mmu-miR-1983, mmu-miR-6947–3p, mmu-miR-6541, #mmu-miR-378a-5p, #mmu-miR-345–5p, mmu-miR-7661–3p, #mmu-miR-3058–3p, #mmu-miR-1968–5p, mmu-miR-7242–3p, mmu-miR-7081–3p, mmu-miR-6715–5p, mmu-miR-7050–3p, #mmu-miR-667–3p, mmu-miR-6945–5p, mmu-miR-6939–5p, #mmu-miR-221–5p, #mmu-miR-351–5p, #mmu-miR-326–3p, mmu-miR-7035–3p, mmu-miR-7681–5p, #mmu-miR-3474, #mmu-miR-125b-5p, #mmu-miR-125a-5p, mmu-miR-509–5p, #mmu-miR-344d-1–5p, mmu-miR-6982–3p, mmu-miR-7016–3p, mmu-miR-5623–5p, #mmu-miR-330–5p, mmu-miR-6367, mmu-miR-8103, mmu-miR-5125, mmu-miR-6990–5p, #mmu-miR-667–5p, #mmu-miR-3064–5p, mmu-miR-7089–3p, #mmu-miR-3077–3p, #mmu-miR-199b-5p, mmu-miR-7230–5p, mmu-miR-7033–3p, #mmu-miR-3113–3p, #mmu-miR-370–3p, #mmu-miR-3113–5p, #mmu-miR-874–5p, #mmu-miR-3085–3p, #mmu-miR-412–3p, #mmu-miR-1906, #mmu-miR-199a-5p, mmu-miR-6394, #mmu-miR-149–5p, #mmu-miR-214–5p, #mmu-miR-344c-5p, #mmu-miR-3097–3p, mmu-miR-7032–5p, mmu-miR-6971–3p |
| unmasked 3′-UTRs enriched-EVs vs. 3′-UTRs unmasked underrepresented-EVs | |
| Motif 7 | mmu-miR-6379 |
In bold: miRNAs that potentially bind motifs located in nontranslated regions. #: microRNAs identified experimentally as secreted by MLP29 cells.
Statistical validation
As most of the conserved motifs were not found within UTRs, only motif 3 from data set 6a and motif 7 from data set 5a (both in 3′-UTRs) were selected for further studies. The 3′-UTRs of sequences from the complete MLP29 transcriptomics data were searched for motif 3 from data set 6a and motif 7 from data set 5a. Next, all known mouse 3′-UTR RNA sequences and all known mouse transcripts were scanned for occurrences of these motifs. Only motif 3 was positively validated, based on a strong positive correlation between the number of motif occurrences in transcripts and the foldchange values obtained from the microarray experiment (Fig. 2). In the case of motif 7 there was no correlation between the number of its occurrences in transcripts and foldchange values (Fig. 2). For both motifs, plots of the number of motif occurrences in transcripts against intensity showed no correlation between these parameters (Fig. 3). The results suggest that, despite the weak conservation of motif 3, it might play a role in RNA transport to EVs, as frequent occurrence of this motif correlates with higher negative foldchange values. This conclusion is supported by the results of the miRNA analysis: binding sites for MLP29 miRNAs were found for motif 3 but not for motif 7. On the other hand, the results of the analysis do not suggest that motif 7 plays a role in RNA transport to EVs, even though this motif seems to be well conserved.

Figure 2. Correlation analysis between the number of motif occurrences in transcript and foldchange values obtained from the microarray experiment.

Figure 3. Correlation analysis between the number of motif occurrences in transcript and intensity values obtained from the microarray experiment.
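The correlation analysis summarized in Figures 2 and 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the regular expression is the 12-nt consensus given in the abstract, the Pearson coefficient is used because the text does not name the coefficient, and the sequences and foldchange values are hypothetical.

```python
import math
import re

# 12-nt degenerate consensus from the abstract, as a lookahead so that
# overlapping matches would also be counted.
MOTIF = re.compile(r"(?=C[TA]G[GC][AGT]G[CT]C[AT]GG[GA])")

def motif_count(seq):
    """Number of motif occurrences in one transcript (RNA or DNA letters)."""
    return len(MOTIF.findall(seq.upper().replace("U", "T")))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical transcripts carrying 0, 1 and 2 instances of the motif.
instance = "CTGGAGCCAGGG"
toy_transcripts = ["AAAATTTT", "AA" + instance + "AA", instance + "TT" + instance]
toy_foldchange = [0.1, 0.9, 2.1]  # invented values, for illustration only

counts = [motif_count(s) for s in toy_transcripts]
print(counts, pearson(counts, toy_foldchange))
```

A rank-based coefficient could be substituted for `pearson` without changing the rest of the sketch.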
Design of the in vitro experiment
The results of the statistical validation allowed us to choose the best candidates for the wet-lab experiment. We decided to keep motif 7 as a control. For the wet-lab experiment we chose the 3′-UTRs of the Cst6, Net1 and ActB genes. Cst6 was the second most enriched mRNA sequence in the EVs. Its 3′-UTR contained many instances of motif 3, but since that motif did not seem to be well conserved, we decided to clone the whole 3′-UTR. Furthermore, miRNAs were predicted to bind to some of these motif variants according to the miRanda results. The 3′-UTR of Net1 contained only one instance of motif 3 and was therefore useful to test whether a single instance of that motif was sufficient for transporting its carrier to EVs. The 3′-UTR of ActB was chosen as a control. Although this gene's mRNA was enriched in the EVs, the analysis showed that its 3′-UTR did not contain motif 3; instead, it contained motif 7, which did not seem to play a role in RNA transport into EVs according to the statistical validation results.
Validation in a cellular system
In order to validate motif 3 from data set 6a and motif 7 from data set 5a, we replaced the 3′-UTR of the luciferase gene in the pcDNA 3 Luc SV5 (Invitrogen) vector with the 3′-UTR of transcripts containing the chosen motifs, as described in the Materials and Methods section. After transfection of the plasmids into MLP29 cells, the presence of the luciferase transcript was examined in cells and in the EVs released by these cells. Figure 4 plots the LUC/NPTII ratio of foldchanges for each construct in cells and EVs. The graph represents the average of three independent experiments ± SEM. The negative values denote that NPTII transcripts were more abundant than luciferase in the RNA extracted from cells transfected with the modified vectors, when compared with the proportion observed in cells transfected with the unmodified vector. However, the opposite is true for the RNA extracted from EVs, that is, luciferase is more abundant than NPTII in the EVs. The interpretation of these results is that luciferase transcripts are exported out of the cells into EVs more efficiently if they include a 3′-UTR containing motif 3 (the tail of Cst6 or Net1). Incorporation of the 3′-UTR containing motif 7 (the tail of ActB) into luciferase transcripts does not lead to the export of luciferase into EVs, as expected. Normalization to the NPTII gene avoids false results due to differences in transfection efficiency. The opposite tendencies observed in the cell and EV fractions indicate that the effect is not an artifact of differences in transcription efficiency after the tail incorporation.

Figure 4. In vitro analysis.
Discussion
The study of EVs as mediators of physiological and pathological processes,30 as therapeutic agents,31 and as disease biomarkers32 has evolved rapidly in the last few years. The complexity of their bioactive cargo, including proteins, RNA, microRNA and DNA, points to different stages and prolonged mechanisms by which these vesicles exert their functions. An important step to unravel these functions is to elucidate the mechanisms responsible for targeting RNAs to EVs. Despite the intense research performed in this field in recent years, the molecular mechanisms by which genetic material is loaded into EVs are still mostly unknown. Bolukbasi et al.11 have shown that there is a zipcode-like 25-nt sequence which contains a short “CTGCC” core domain on a stem-loop structure and carries a miR-1289 binding site in the 3′-UTRs of many of the most enriched mRNAs in EVs derived from primary glioblastoma cells as well as melanoma cells. They have also shown that miR-1289 binds directly to this zipcode and participates in the incorporation of mRNAs into EVs. Also in glioblastoma cells, Batagov et al.12 suggest that three other motifs could be involved in the targeting of mRNAs into EVs. We have examined the presence of these motifs in mRNAs that were enriched in EVs released by non-tumoral liver-derived cells and could not find any significant correlation, suggesting that in hepatic cells different motifs could be responsible for targeting mRNA into EVs. In addition, our results suggest that the mechanism of RNA transport into EVs is tissue-specific. Therefore, further extensive investigation in different cellular systems is required to identify the mechanisms involved in the EV-sorting of mRNAs.
The bioinformatics analysis has allowed us to detect 12 new putative motifs that could be involved in targeting mRNAs into EVs in hepatic cellular systems. Ten of these motifs were located in the coding region of the transcripts, and the other 2 were located in the 3′-UTR of the transcripts. We have focused on the validation of these latter motifs given that most of the post-transcriptional signals described so far are located in 3′-UTR regions. We have assayed the effect on the EV abundance of luciferase mRNA of incorporating the 3′-UTR of the genes Net1, Cst6 and ActB. All of these transcripts were enriched in hepatic EVs. Net1 and Cst6 contain 1 and 5 copies of motif 3, respectively. The 3′-UTR of ActB did not contain that motif, but instead included motif 7, which was not statistically validated in silico. Therefore the tail of ActB served as a negative control. While the 3′-UTRs of Net1 and Cst6 were able to increase the abundance of luciferase mRNA in EVs, the 3′-UTR of ActB was not able to incorporate luciferase mRNA into EVs. These results indicate that: first, the statistically validated motif may play a role in RNA transport into EVs; second, in the mRNA of ActB another sequence or combination of sequences is required to transfer it into EVs; and third, that sequence is not located within the 3′-UTR of ActB. Further, a link has been shown between miRNA function and the localization of mRNAs to cytoplasmic loci involved in mRNA degradation.33,34 Recently, microRNAs have also been shown to mediate, at least in some cellular systems, the transport of specific mRNAs into EVs.11 These studies support the implication of microRNAs in the localization/sorting of mRNAs into different cytoplasmic areas, including regions involved in the formation of EVs. We have investigated whether our motifs are putative targets for microRNAs. Binding sites for MLP29 miRNAs have been found for most of our predicted motifs and, in particular, for the motif that was confirmed experimentally. This result supports the potential role of miRNAs in transporting mRNA into hepatic EVs.
In conclusion, in this study we have described 12 new putative motifs that might be involved in the localization of mRNAs into EVs of hepatic origin. Some of them are part of binding sites for microRNAs secreted by the non-tumoral hepatic model. We have also shown that it is possible to target a specific mRNA into EVs in a hepatic model by using a 3′-UTR sequence containing one of these motifs, which could have important therapeutic implications.
Materials and Methods
In silico analysis of data
Figure 5 presents an overview of the bioinformatics analysis.
Figure 5. Workflow of the in silico study.
Data source
Transcriptomics data from liver-derived MLP29 cells and from EVs secreted by this cell line under regular conditions were obtained using Illumina Expression microarrays.20 Since the RNA was extracted from EVs purified by filtration through 0.22 µm filters followed by differential ultracentrifugation, an enrichment in exosomes is expected. When these EV preparations were analyzed by density gradient, it was observed that transcripts were distributed in fractions corresponding to densities from 1.1 g/ml to 1.18 g/ml, and that they correlated with some vesicle markers such as CD81, AIP1 and Flotillin.20 By comparing the cellular and EV contents of the detected RNAs, two data sets were generated from these data: data set a (RNAs enriched in EVs) and data set b (RNAs underrepresented in EVs). RNA sequences (including 3′-UTRs and 5′-UTRs) for the transcriptomics data were downloaded in FASTA format from the NCBI database.22 All known mouse transcripts were also downloaded from this repository. All known mouse 3′-UTR RNA sequences were obtained from the UCSC Genome Browser35 website via the Table Browser.36 All known mouse miRNAs were downloaded from the miRBase archives.37
Sequence masking and data set preparation
To avoid a high number of false positives, data sets containing masked sequences were created from data sets a and b using DUST21 and RepeatMasker (Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. 1996–2010. [http://www.repeatmasker.org]). Low-complexity regions of RNA sequences were masked with DUST. RepeatMasker was used to deal with simple repeats, full-length ALUs, full-length interspersed repeats, remaining ALUs, short interspersed repeats, long interspersed repeats, MIR and LINE2, retroviral sequences and LINE1s. The data sets that were obtained and used in the motif search phase are shown in Table 1.
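As a small illustration of how a masked data set can be derived from masking output, the Python sketch below assumes that the masking tools were run in a mode that marks masked bases in lowercase (soft-masking) and converts those bases to N (hard-masking). The text does not say which output mode was used, so this is an assumption, and the example sequence is hypothetical.

```python
def hard_mask(seq, mask_char="N"):
    """Replace soft-masked (lowercase) bases with a masking character."""
    return "".join(mask_char if base.islower() else base for base in seq)

def masked_fraction(seq):
    """Fraction of bases that are soft-masked (lowercase)."""
    if not seq:
        return 0.0
    return sum(1 for base in seq if base.islower()) / len(seq)

# Hypothetical soft-masked sequence: the lowercase run stands for a repeat.
example = "ACGTacacacacACGT"
print(hard_mask(example), masked_fraction(example))
```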
Multiple alignment
In order to find similar motifs in the sequence sets, multiple sequence alignments (MSA) were run using the ClustalW2,38 ClustalOmega,38,39 Kalign,23,24 T-Coffee,40 MUSCLE25 and MAFFT26,27 algorithms. All of them were run with default parameter values.
Local alignment
Local alignment was based on the BLAST algorithm.21 Its aim was to obtain an alignment of each sequence against every other sequence. Therefore, blastn was run for every sequence set separately, so that the same set was used both as the query and as the database. Blastn parameters were set to obtain short local alignments: word_size = 15, evalue = 10, max_target_seqs = 100, gapopen = 5, gapextend = 2, penalty = -3, reward = 1.
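The set-against-itself search described above could be assembled as in the following Python sketch. It assumes the BLAST+ command-line tools (`makeblastdb`, `blastn`); the text does not say which BLAST distribution was used. The file names and the tabular output format (`-outfmt 6`) are hypothetical choices added for illustration, while the scoring parameters are those listed in the text. The commands are only built here, not executed.

```python
def makeblastdb_command(fasta, db_name):
    """Command that builds a nucleotide database from one sequence set."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", db_name]

def blastn_self_command(fasta, db_name, out_path):
    """blastn call using the same sequence set as query and as database."""
    return [
        "blastn", "-query", fasta, "-db", db_name, "-out", out_path,
        "-outfmt", "6",  # tabular output: an assumption, not from the text
        "-word_size", "15", "-evalue", "10", "-max_target_seqs", "100",
        "-gapopen", "5", "-gapextend", "2", "-penalty", "-3", "-reward", "1",
    ]

# Hypothetical file names for data set 6a.
db_cmd = makeblastdb_command("set_6a.fasta", "set_6a_db")
cmd = blastn_self_command("set_6a.fasta", "set_6a_db", "set_6a_vs_self.tsv")
print(" ".join(db_cmd))
print(" ".join(cmd))
```

The resulting lists can be passed to `subprocess.run` on a machine where BLAST+ is installed.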
Multiple EM for Motif Elicitation (MEME)
It was assumed that a motif could occur zero or multiple times in one sequence; therefore, the TCM model (parameter mod = anr) was chosen. TCM assumes that there are zero or more non-overlapping occurrences of the motif in each sequence in a given data set. The maximum motif width was set to 15 and the minimum to 5 (shorter motifs could produce too many false positives).
MEME was used with the following parameter values: -dna, -mod anr, -nmotifs 30, -evt 10, -minsites 5, -maxsites 300, -minw 5, -maxw 15, -maxsize 300000.
Prediction of RNA secondary structures and miRNA binding sites
Each RNA sequence containing at least one of the 12 conserved motifs was selected for further analysis. The sequences spanning 100 nt upstream and 100 nt downstream of the motif were used for secondary structure prediction by mfold run with default parameters.28 The predicted structures with the lowest Gibbs energy were visually inspected and the predominant (most frequently occurring) structures were identified.
Simultaneously, miRNA binding site prediction for all 12 motifs was performed using miRanda 3.3a.29 All known mouse miRNAs were scanned against the 12 candidate motifs. Quiet mode was used to output fewer event notifications; the other parameters had default values.
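Extracting the motif together with its 100-nt flanks, as input for a structure predictor, can be sketched in Python as below. The regular expression is the 12-nt consensus from the abstract and stands in for any of the 12 candidate motifs; the function name and the toy sequence are hypothetical, and windows are simply truncated at the sequence ends, which is an assumption about edge handling.

```python
import re

# 12-nt consensus from the abstract, used here as an example motif.
MOTIF = re.compile(r"C[TA]G[GC][AGT]G[CT]C[AT]GG[GA]")

def motif_windows(seq, flank=100):
    """Return (start, end, subsequence) for each motif hit plus its flanks."""
    seq = seq.upper().replace("U", "T")
    windows = []
    for match in MOTIF.finditer(seq):
        lo = max(0, match.start() - flank)       # clip at the 5' end
        hi = min(len(seq), match.end() + flank)  # clip at the 3' end
        windows.append((lo, hi, seq[lo:hi]))
    return windows

# Hypothetical sequence: 150 nt, one motif instance, then 30 nt.
toy = "A" * 150 + "CTGGAGCCAGGG" + "T" * 30
for lo, hi, sub in motif_windows(toy):
    print(lo, hi, len(sub))
```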
miRNAs search
miRNAs predicted to bind one of the 12 putative motifs in noncoding regions were investigated using the PubMed and Vesiclepedia databases. In addition, we experimentally identified the miRNAs released by the MLP29 cell line used in the in vitro validation. To profile this cell line, three independent cell culture supernatants (500 µL) were prepared as described below, and miRNA was extracted using the 3D-Gene™ miRNA extraction reagent from liquid sample kit (Toray Industries) following the manufacturer’s protocol. We performed miRNA profiling using a microarray, the 3D-Gene™ miRNA oligo chip v.16 (Toray Industries), according to the manufacturer’s protocol vE1.10. The total number of genes mounted on the microarray is 1,212. The microarray was scanned and the obtained images were quantified using the 3D-Gene® scanner 3000 (Toray Industries). The expression level of each miRNA was globally normalized using the background-subtracted signal intensity of all the genes in each microarray.
Statistical validation
Statistical analysis of the most promising motifs was based on the number of transcripts containing a given motif, with the reference to all mouse transcriptomic data. The enrichment value in this case was calculated according to the formula presented below:
enrich1 = (# of transcripts containing a given motif found in a data set / total # of transcripts in mouse)
The number of the particular motif occurrences per transcript was also taken into account and the enrichment value was calculated as follows:
enrich2 = (# of the particular motif occurrences found in a data set / total # of transcripts in mouse)
Furthermore, in the case of motif 3 from data set 6a, motif 7 from data set 5a, as well as the motifs identified by Batagov et al.12 and Bolukbasi et al.,11 we prepared plots visualizing the correlation between the number of motif occurrences in a transcript and foldchange values (Fig. 2), and plots visualizing the number of motif occurrences in a transcript and intensity values obtained from the microarray experiment (Fig. 3).
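The two enrichment values defined above can be computed from per-transcript motif counts, as in this minimal Python sketch. The function name, the counts and the total number of mouse transcripts are hypothetical placeholders.

```python
def enrichment(motif_counts, total_mouse_transcripts):
    """Return (enrich1, enrich2) for one motif in one data set.

    motif_counts: number of motif occurrences in each transcript of the set.
    """
    n_with_motif = sum(1 for c in motif_counts if c > 0)  # numerator of enrich1
    n_occurrences = sum(motif_counts)                     # numerator of enrich2
    enrich1 = n_with_motif / total_mouse_transcripts
    enrich2 = n_occurrences / total_mouse_transcripts
    return enrich1, enrich2

# Hypothetical data set of four transcripts and a made-up transcript total.
e1, e2 = enrichment([0, 2, 1, 0], total_mouse_transcripts=100)
print(e1, e2)
```

Because both values share the same denominator, enrich2 is never smaller than enrich1, and the two differ only when some transcripts carry more than one copy of the motif.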
In vitro validation
Generation of constructs
The 3′-UTRs of selected genes containing the motif were used to replace the 3′-UTR of the luciferase gene. Constructs modifying the commercial vector pcDNA 3 Luc SV5 (Invitrogen) were generated. They were used to express the genes coding for the luciferase and neomycin phosphotransferase II proteins in mammalian cells. The latter is responsible for resistance to the antibiotic G418. The original vector was digested with EcoRI (Fermentas) in order to insert an EcoRI-digested PCR product of the 3′-UTR of the ActB, Cst6 or Net1 transcripts (primers listed in Table S1) at the 3′ end of the luciferase (LUC) gene. After ligation with T4 Ligase (Invitrogen), the constructs were amplified in E. coli DH5α competent cells (Agilent). Plasmids were then extracted with the HiSpeed Plasmid Midi kit (Qiagen) and sequenced to confirm the presence of the desired constructs.
Cell transfection and extracellular vesicles (EVs) isolation
MLP29 cells were seeded in 90 mm dishes (5 × 10⁵ cells/dish) and transfected using X-tremeGENE HP DNA transfection reagent (Roche). Six µg of plasmid DNA per dish and 12 µl of transfection reagent were used for the transfection. After 24 h, cells were washed, trypsinized and seeded in a 150 mm dish, in DMEM with 10% FBS (previously depleted of vesicles of bovine origin by ultracentrifugation). After 48 h, the cells were collected, the media were centrifuged at 2,000 × g for 10 min, and the supernatant was filtered through a 0.22 µm filter. For each construct, 1 ml of the filtered media obtained in the previous step was incubated overnight with 500 µl of Total Exosome Isolation reagent (Invitrogen). The mixture was then centrifuged for 1 h at 10,000 × g and the supernatant was discarded. The pellets were resuspended in 100 µl of PBS. Experiments were performed in triplicate.
RNA extraction and generation of cDNA
Both cell pellets and EVs were processed with the RNeasy kit (Qiagen) for RNA extraction, including an on-column DNase digestion step, as recommended by the manufacturer. For cDNA synthesis, Superscript III (Invitrogen) reactions were set up with 0.5 µg of RNA for cells, and with 10 µl (of a total of 30 µl) of eluted RNA for EVs.
Quantitative PCR (qPCR)
Reactions were performed with PerfeCTa SYBR Green reagent (Quanta) for Rplp0, luciferase (LUC) and neomycin phosphotransferase (NPTII), with the primers listed in Table S1. Results were analyzed by calculating the fold change as $2^{\Delta\Delta C_t}$, where $\Delta\Delta C_t = -\left[(C_t^{\mathrm{Rplp0,Vector}} - C_t^{\mathrm{Rplp0,Construct}}) - (C_t^{\mathrm{Gene,Vector}} - C_t^{\mathrm{Gene,Construct}})\right]$. The efficiency of the primers was calculated by serial DNA dilution with CFX Manager software (Bio-Rad), giving values within the range E = 100 ± 5%.
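The fold-change calculation described above can be illustrated with a minimal Python sketch. The function and argument names are hypothetical and chosen only for illustration; the sketch assumes a single Ct value per condition and, as in the formula, a doubling of product per cycle.

```python
def delta_delta_ct(ct_ref_vector, ct_ref_construct,
                   ct_gene_vector, ct_gene_construct):
    """Return ddCt as defined in the text.

    ct_ref_*  : Ct of the reference gene (Rplp0) for the unmodified
                vector and for the construct.
    ct_gene_* : Ct of the gene of interest (LUC or NPTII) for the
                unmodified vector and for the construct.
    """
    ref_diff = ct_ref_vector - ct_ref_construct
    gene_diff = ct_gene_vector - ct_gene_construct
    return -(ref_diff - gene_diff)


def fold_change(ct_ref_vector, ct_ref_construct,
                ct_gene_vector, ct_gene_construct):
    """Fold change of the construct relative to the vector: 2 ** ddCt."""
    ddct = delta_delta_ct(ct_ref_vector, ct_ref_construct,
                          ct_gene_vector, ct_gene_construct)
    return 2.0 ** ddct


if __name__ == "__main__":
    # Illustrative Ct values only (not data from this study).
    print(fold_change(20.0, 20.0, 25.0, 23.0))
```

With equal reference Ct values, a gene of interest detected two cycles earlier in the construct than in the vector gives a positive ddCt and therefore a fold change above 1.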
Statistical analysis
To plot the data, the ratio between fold changes was calculated as ln(fold change LUC / fold change NPTII) for each construct. A one-sample t test was used to calculate the probability of the ratio obtained with each treatment being equal to 0 (* P < 0.1, ** P < 0.05). Here, 0 represents the value for the unmodified vector.
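As an illustration of this analysis, the following Python sketch computes the log ratio and a two-sided one-sample t test against 0 using only the standard library. It is not the software used in this work: the function names are hypothetical, and the P value is obtained here by numerically integrating the Student t density (Simpson's rule), an implementation choice made only to keep the example self-contained.

```python
import math
import statistics


def log_ratio(fc_luc, fc_nptii):
    """Natural log of the LUC fold change over the NPTII fold change."""
    return math.log(fc_luc / fc_nptii)


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def one_sample_t_test(values, popmean=0.0, steps=10000):
    """Two-sided one-sample t test; returns (t statistic, df, P value).

    Requires at least two values with non-zero variance.
    `steps` must be even (Simpson's rule).
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    t_stat = (mean - popmean) / (sd / math.sqrt(n))
    df = n - 1

    # Integrate the density from 0 to |t| with Simpson's rule.
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3

    p_value = max(0.0, 1.0 - 2.0 * area)
    return t_stat, df, p_value


if __name__ == "__main__":
    # Illustrative fold changes for three replicates (not data from this study).
    ratios = [log_ratio(luc, npt) for luc, npt in [(2.0, 1.1), (2.4, 1.0), (1.9, 1.2)]]
    print(one_sample_t_test(ratios))
```

A log ratio of 0 corresponds to equal fold changes for LUC and NPTII, which is why 0 serves as the null value for the unmodified vector.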
Supplementary Material
Acknowledgment and Funding
We thank Dr. E. Medico for providing the MLP29 cell line. This work was supported by grants from the Spanish Ministry MICINN (PS09/00526 and PI12–01604 to JMFP), integrated in the National Plan I+D+I and cofunded by the ISCIII-Subdirección General de Evaluación and the European Fund for Regional Development (FEDER); by the “Ramon y Cajal” Program of the Spanish Ministry (to JMFP); by a Basque Government grant (GV PI2012–45 to JMFP); and by an award from the Movember GAP1 Exosome Biomarker study to JMF. Centro de Investigación Biomédica en Red en el Área temática de Enfermedades Hepáticas y Digestivas (CIBERehd) is funded by the Spanish ISCIII-MICINN. N.S., A.R. and M.S. have been partially supported by the National Science Centre, Poland [2012/05/B/ST6/03026].
References
- 1. Gutiérrez-Vázquez C, Villarroya-Beltri C, Mittelbrunn M, Sánchez-Madrid F. Transfer of extracellular vesicles during immune cell-cell interactions. Immunol Rev. 2013;251:125–42. doi: 10.1111/imr.12013.
- 2. Mathivanan S, Ji H, Simpson RJ. Exosomes: extracellular organelles important in intercellular communication. J Proteomics. 2010;73:1907–20. doi: 10.1016/j.jprot.2010.06.006.
- 3. Ohno S, Ishikawa A, Kuroda M. Roles of exosomes and microvesicles in disease pathogenesis. Adv Drug Deliv Rev. 2013;65:398–401. doi: 10.1016/j.addr.2012.07.019.
- 4. Simons M, Raposo G. Exosomes--vesicular carriers for intercellular communication. Curr Opin Cell Biol. 2009;21:575–81. doi: 10.1016/j.ceb.2009.03.007.
- 5. Cocucci E, Racchetti G, Meldolesi J. Shedding microvesicles: artefacts no more. Trends Cell Biol. 2009;19:43–51. doi: 10.1016/j.tcb.2008.11.003.
- 6. Blazewicz J, Figlerowicz M, Kasprzak M, Nowacka M, Rybarczyk A. RNA partial degradation problem: motivation, complexity, algorithm. J Comput Biol. 2011;18:821–34. doi: 10.1089/cmb.2010.0153.
- 7. Kalra H, Simpson RJ, Ji H, Aikawa E, Altevogt P, Askenase P, Bond VC, Borràs FE, Breakefield X, Budnik V, et al. Vesiclepedia: a compendium for extracellular vesicles with continuous community annotation. PLoS Biol. 2012;10:e1001450. doi: 10.1371/journal.pbio.1001450.
- 8. Simpson RJ, Jensen SS, Lim JW. Proteomic profiling of exosomes: current perspectives. Proteomics. 2008;8:4083–99. doi: 10.1002/pmic.200800109.
- 9. Jansen RP. mRNA localization: message on the move. Nat Rev Mol Cell Biol. 2001;2:247–56. doi: 10.1038/35067016.
- 10. Martin KC, Ephrussi A. mRNA localization: gene expression in the spatial dimension. Cell. 2009;136:719–30. doi: 10.1016/j.cell.2009.01.044.
- 11. Bolukbasi MF, Mizrak A, Ozdener GB, Madlener S, Ströbel T, Erkan EP, Fan JB, Breakefield XO, Saydam O. miR-1289 and “Zipcode”-like Sequence Enrich mRNAs in Microvesicles. Mol Ther Nucleic Acids. 2012;1:e10. doi: 10.1038/mtna.2011.2.
- 12. Batagov AO, Kuznetsov VA, Kurochkin IV. Identification of nucleotide patterns enriched in secreted RNAs as putative cis-acting elements targeting them to exosome nano-vesicles. BMC Genomics. 2011;12(Suppl 3):S18. doi: 10.1186/1471-2164-12-S3-S18.
- 13. Wei W, Yu XD. Comparative analysis of regulatory motif discovery tools for transcription factor binding sites. Genomics Proteomics Bioinformatics. 2007;5:131–42. doi: 10.1016/S1672-0229(07)60023-0.
- 14. Das MK, Dai HK. A survey of DNA motif finding algorithms. BMC Bioinformatics. 2007;8(Suppl 7):S21. doi: 10.1186/1471-2105-8-S7-S21.
- 15. Popenda M, Blazewicz M, Szachniuk M, Adamiak RW. RNA FRABASE version 1.0: an engine with a database to search for the three-dimensional fragments within RNA structures. Nucleic Acids Res. 2008;36:D386–91. doi: 10.1093/nar/gkm786.
- 16. Bailey TL, Williams N, Misleh C, Li WW. MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res. 2006;34:W369-73. doi: 10.1093/nar/gkl198.
- 17. Pavesi G, Mauri G, Pesole G. An algorithm for finding signals of unknown length in DNA sequences. Bioinformatics. 2001;17(Suppl 1):S207–14. doi: 10.1093/bioinformatics/17.suppl_1.S207.
- 18. Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science. 1993;262:208–14. doi: 10.1126/science.8211139.
- 19. Tompa M, Li N, Bailey TL, Church GM, De Moor B, Eskin E, Favorov AV, Frith MC, Fu Y, Kent WJ, et al. Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol. 2005;23:137–44. doi: 10.1038/nbt1053.
- 20. Royo F, Schlangen K, Palomo L, Gonzalez E, Conde-Vancells J, Berisa A, Aransay AM, Falcon-Perez JM. Transcriptome of extracellular vesicles released by hepatocytes. PLoS One. 2013;8:e68693. doi: 10.1371/journal.pone.0068693.
- 21. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–10. doi: 10.1016/S0022-2836(05)80360-2.
- 22. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007;35:D61–5. doi: 10.1093/nar/gkl842.
- 23. Lassmann T, Frings O, Sonnhammer EL. Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features. Nucleic Acids Res. 2009;37:858–65. doi: 10.1093/nar/gkn1006.
- 24. Lassmann T, Sonnhammer EL. Kalign--an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics. 2005;6:298. doi: 10.1186/1471-2105-6-298.
- 25. Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004;5:113. doi: 10.1186/1471-2105-5-113.
- 26. Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30:3059–66. doi: 10.1093/nar/gkf436.
- 27. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–80. doi: 10.1093/molbev/mst010.
- 28. Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003;31:3406–15. doi: 10.1093/nar/gkg595.
- 29. John B, Enright AJ, Aravin A, Tuschl T, Sander C, Marks DS. Human MicroRNA targets. PLoS Biol. 2004;2:e363. doi: 10.1371/journal.pbio.0020363.
- 30. Peinado H, Alečković M, Lavotshkin S, Matei I, Costa-Silva B, Moreno-Bueno G, Hergueta-Redondo M, Williams C, García-Santos G, Ghajar C, et al. Melanoma exosomes educate bone marrow progenitor cells toward a pro-metastatic phenotype through MET. Nat Med. 2012;18:883–91. doi: 10.1038/nm.2753.
- 31. El-Andaloussi S, Lee Y, Lakhal-Littleton S, Li J, Seow Y, Gardiner C, Alvarez-Erviti L, Sargent IL, Wood MJ. Exosome-mediated delivery of siRNA in vitro and in vivo. Nat Protoc. 2012;7:2112–26. doi: 10.1038/nprot.2012.131.
- 32. Duijvesz D, Luider T, Bangma CH, Jenster G. Exosomes as biomarker treasure chests for prostate cancer. Eur Urol. 2011;59:823–31. doi: 10.1016/j.eururo.2010.12.031.
- 33. Liu J, Valencia-Sanchez MA, Hannon GJ, Parker R. MicroRNA-dependent localization of targeted mRNAs to mammalian P-bodies. Nat Cell Biol. 2005;7:719–23. doi: 10.1038/ncb1274.
- 34. Nowacka M, Jackowiak P, Rybarczyk A, Magacz T, Strozycki PM, Barciszewski J, Figlerowicz M. 2D-PAGE as an effective method of RNA degradome analysis. Mol Biol Rep. 2012;39:139–46. doi: 10.1007/s11033-011-0718-1.
- 35. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. The human genome browser at UCSC. Genome Res. 2002;12:996–1006. doi: 10.1101/gr.229102.
- 36. Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 2004;32:D493–6. doi: 10.1093/nar/gkh103.
- 37. Kozomara A, Griffiths-Jones S. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 2011;39:D152–7. doi: 10.1093/nar/gkq1027.
- 38. Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Söding J, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 2011;7:539. doi: 10.1038/msb.2011.75.
- 39. Goujon M, McWilliam H, Li W, Valentin F, Squizzato S, Paern J, Lopez R. A new bioinformatics analysis tools framework at EMBL-EBI. Nucleic Acids Res. 2010;38:W695-9. doi: 10.1093/nar/gkq313.
- 40. Di Tommaso P, Moretti S, Xenarios I, Orobitg M, Montanyola A, Chang JM, Taly JF, Notredame C. T-Coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension. Nucleic Acids Res. 2011;39:W13-7. doi: 10.1093/nar/gkr245.