2021 Jul 6;12:4138. doi: 10.1038/s41467-021-24436-7

Fig. 2. Design of the 5′ UTR library of naturally occurring and synthetic 5′ UTRs.

RNA-seq and Ribo-seq datasets from HEK 293T, PC3, and human muscle cells, together with the GTEx database of human muscle tissue, were collected. Natural 5′ UTRs with high or low translation efficiencies (TEs) in HEK 293T and RD cells, 5′ UTRs with various TEs in human muscle cells, and 5′ UTRs with high mRNA counts in human muscle tissue were selected and added to the library. In addition, we designed synthetic 5′ UTRs by: (i) collecting endogenous 5′ UTR sequences for the target cell type (HEK 293T, PC3, or human muscle cells) from public data; (ii) extracting sequence features of the 5′ UTRs, including the nucleotides surrounding the AUG region; (iii) training a Random Forest model for each cell type/tissue (HEK 293T, PC3, or human muscle cells) to learn a function that maps sequence features to mRNA expression levels and TEs; and (iv) using genetic algorithms to design a set of 100 bp synthetic sequences predicted to maximize TEs and protein expression levels.
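Step (iv) can be illustrated with a minimal genetic-algorithm sketch in Python. It is not the authors' implementation: the scoring function `toy_predicted_te` is a hypothetical stand-in for the trained Random Forest model, and the population size, mutation rate, elite count, and RNA alphabet are assumptions chosen only for illustration. Only the 100-nucleotide sequence length comes from the caption.

```python
import random

ALPHABET = "ACGU"   # RNA alphabet (assumption; a DNA library would use T instead of U)
UTR_LENGTH = 100    # the caption specifies 100 bp synthetic sequences


def toy_predicted_te(seq: str) -> float:
    """Hypothetical stand-in for a trained model's TE prediction.

    Rewards low GC content and penalises upstream AUGs. Purely illustrative;
    a real pipeline would call the trained Random Forest here instead.
    """
    gc_fraction = sum(base in "GC" for base in seq) / len(seq)
    upstream_augs = seq.count("AUG")
    return (1.0 - gc_fraction) - 0.5 * upstream_augs


def mutate(seq: str, rng: random.Random, rate: float = 0.02) -> str:
    """Replace each base with a random base with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else base for base in seq
    )


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover: a prefix of `a` joined to a suffix of `b`."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def evolve(score, rng, population_size=50, generations=40, elite=10):
    """Evolve random sequences toward a higher `score`; return the best one."""
    population = [
        "".join(rng.choice(ALPHABET) for _ in range(UTR_LENGTH))
        for _ in range(population_size)
    ]
    for _ in range(generations):
        # Keep the top-scoring sequences unchanged (elitism), then refill
        # the population with mutated crossovers of those parents.
        population.sort(key=score, reverse=True)
        parents = population[:elite]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - elite)
        ]
        population = parents + children
    return max(population, key=score)
```

Calling `evolve(toy_predicted_te, random.Random(0))` returns one 100-nucleotide candidate. Because the elite sequences are carried over unchanged, the best score never decreases from one generation to the next.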