Author manuscript; available in PMC: 2026 Jan 27.
Published in final edited form as: Chemistry. 2024 Dec 30;31(6):e202403718. doi: 10.1002/chem.202403718

Site-Specific Incorporation of Fluorinated Prolines into Proteins and their Impact on Neighbouring Residues

Carlos A Elena-Real a,#, Annika Urbanek a,#, Amin Sagar a, Priyesh Mohanty b, Geraldine Levy c,d, Anna Morató a, Aurélie Fournet a, Frédéric Allemand a, Nathalie Sibille a, Jeetain Mittal b,e,f, Davy Sinnaeve c,d,*, Pau Bernadó a,*
PMCID: PMC11772113  NIHMSID: NIHMS2044037  PMID: 39661394

Abstract

The incorporation of fluorinated amino acids into proteins provides new opportunities to study biomolecular structure-function relationships in an elegant manner. The available strategies to incorporate the majority of fluorinated amino acids are not site-specific or imply important structural modifications. Here, we present a chemical biology approach for the site-specific incorporation of three commercially available Cγ-modified fluoroprolines that has been validated using a non-pathogenic version of huntingtin exon-1 (HttExon-1). 19F, 1H and 15N NMR chemical shifts measured for multiple variants of HttExon-1 indicated that the trans/cis ratio was strongly dependent on the fluoroproline variant and the sequence context. By isotopically labelling the rest of the protein, we have shown that the extent of spectroscopic perturbations to the neighbouring residues depends on the number of fluorine atoms and the stereochemistry at Cγ, as well as the isomeric form of the fluoroproline. We have rationalized these observations by means of extensive molecular dynamics simulations, indicating that the observed atomic chemical shift perturbations correlate with the distance to fluorine atoms and that the effect remains very local. These results validate the site-specific incorporation of fluoroprolines as an excellent strategy to monitor intra- and intermolecular interactions in disordered proline-rich proteins.

Summary

Elena-Real et al. present a new approach for the site-specific incorporation of fluoroprolines in proteins. Using NMR, they show that the trans/cis ratio depends on the fluoroproline type and the sequence context, and that the spectroscopic perturbations on neighbouring atoms correlate with their distance to fluorine. These results validate this site-specific incorporation as an excellent strategy to monitor protein structure.

Introduction

The incorporation of non-canonical amino acids (ncAAs) into proteins is a powerful tool for studying protein structure, dynamics and function[1-3]. Moreover, ncAAs expand the chemical diversity that is available in nature, incorporating novel and unique biophysical or functional properties in native proteins[4,5]. For instance, ncAAs can be used to (de)stabilize particular secondary structures or to disrupt specific interactions within proteins[6,7]. The incorporation of chemical probes to monitor the local structure and dynamics of proteins remains one of the most common applications of ncAAs. Amino acids carrying stable radicals or cyano groups have been used to measure intramolecular distances by nuclear magnetic resonance (NMR) and electron paramagnetic resonance and to monitor protein motions by infrared spectroscopy[8-10]. Among these probes, fluorine has become very popular due to its unique properties, especially for NMR experiments[11-13]. 19F, the only naturally occurring isotope of fluorine, is almost non-existent in biological samples and provides very clean NMR spectra[14]. Moreover, its large gyromagnetic ratio and extremely broad chemical shift dispersion make 19F-NMR a highly sensitive tool to probe protein structure and dynamics at high resolution and for pharmacological applications[12,15,16]. However, the quantitative interpretation of the resulting 19F chemical shifts is challenging, as theoretical models, including quantum chemistry calculations, fail to provide accurate values[17,18].

Fluorine atoms can be incorporated into proteins using several strategies that can be global or site-specific[16,19]. An example of a global strategy is the post-translational incorporation of fluorine. For this, highly reactive fluorinated moieties, such as –CF3, are covalently anchored to solvent-exposed cysteines or lysines[20-22]. In order to achieve the systematic incorporation of a given fluorinated amino acid, some authors add the 19F-modified form of the amino acid or a suitable precursor into the cell culture[23-25]. However, this approach can impact the expression yield and cannot be applied to all natural amino acids. The use of auxotrophic Escherichia coli strains and the optimization of the expression conditions can facilitate a systematic amino acid incorporation[24,26,27]. In most cases, however, site-specific incorporation of fluorinated amino acids is desired. Solid-phase peptide synthesis is the most straightforward method to introduce fluorinated amino acids in a site-specific manner[26]. The advantage of this approach is its flexibility in selecting the fluorinated amino acid, which can be chemically similar to the natural one. However, the size of the systems amenable to solid-phase peptide synthesis is severely limited, unless a subsequent chemical ligation is performed[27]. A more elegant way to selectively introduce fluorinated amino acids into proteins is using an expanded genetic code via tRNA nonsense suppression[1-3]. In this strategy, the protein is overexpressed together with an engineered tRNACUA and its associated aminoacyl tRNA synthetase (aaRS) in the presence of the fluorinated amino acid. In the cell or in a cell-free (CF) reaction mixture, the aaRS aminoacylates the tRNACUA with the amino acid. The loaded tRNACUA is subsequently recognized by the ribosome, and the amino acid is introduced at the specific position defined by an amber stop codon in the coding gene.
However, in the majority of cases, the fluorinated amino acid must carry important chemical differences to avoid recognition by the endogenous aaRS[28-30], precluding the use of quasi-natural amino acids in which one or a few hydrogen atoms are replaced by fluorine. Interestingly, singly fluorinated phenylalanine and tryptophan have been genetically incorporated into proteins using engineered aaRS[31-35].

The availability of multiple commercial fluoroprolines, particularly those substituted at positions 3 (Cβ) and 4 (Cγ), and their Fmoc/Boc derivatives has prompted intense research on the conformational effects of these ncAAs[13,27,36-45]. The presence of an electron-withdrawing fluorine atom in prolines, especially in Cγ, strongly influences the ring puckering, with the S- and R-stereochemistry favouring the Cγ-endo and Cγ-exo pucker, respectively (Fig. 1). As a consequence of the stabilizing n→π* interactions between consecutive carbonyl groups, the Cγ-exo pucker, which is enriched in (2S,4R)-fluoroproline (4R-FPro), leads to an enhanced preference for the trans rotamer[40,46]. In contrast, (2S,4S)-fluoroproline (4S-FPro), which is biased towards the endo pucker, presents an elevated cis population relative to regular proline. This scenario is similar to that found in singly fluorinated RNA nucleotides, where fluorine in the C2’ ara or ribo positions induces a C2’-endo or C3’-endo pucker, respectively[47]. In oligoprolines, the uniform substitution of proline by 4R-FPro results in a stabilization of the polyproline-II (PPII) conformation, while 4S-FPro destabilizes this secondary structure[48]. The impact of single substitutions of 4R-FPro and 4S-FPro residues in oligoprolines has been recently studied with NMR spectroscopy[49]. Based on 13C chemical shifts, it was concluded that the impact on the PPII structure was minimal, likely only featuring some small perturbations on the neighbouring residues due to preferred ψ dihedral angles of the FPro residue. Interestingly, when two fluorine atoms are introduced in Cγ ((2S)-4,4-difluoroproline, 4,4-diFPro), the above-mentioned structural effects are counteracted and its conformational behaviour is similar to that of the native proline[37,43,50,51]. Indeed, a recent study suggested that the structural impact of a single 4,4-diFPro substitution within oligoprolines was minimal[52].

Figure 1. Conformational preferences of the studied fluoroprolines.

(A) Chemical structure of the three fluoroprolines (4S-FPro, blue; 4R-FPro, red; and 4,4-diFPro, purple) used in this study. The colour code is used throughout the study. Their conformational preferences trans/cis and endo/exo with respect to the canonical prolines are displayed below. (B) Cartoon of the endo/exo conformations found for the trans isomer of prolines where the position of the proS and proR Hγ are indicated. The pseudo-axial positions (proS for Cγ-endo and proR for Cγ-exo) are indicated with an *. The conformational equilibrium is biased for 4S-FPro (towards endo) and 4R-FPro (towards exo).

Due to the tunability of the structural properties of fluoroprolines, these amino acids have become very popular tools in chemical biology to modulate protein and peptide stability and improve folding kinetics[39,43-45,53]. In addition to solid-phase peptide synthesis, protein semi-synthesis and native chemical ligation[27], fluoroprolines have also been incorporated into recombinant proteins using auxotrophic E. coli strains, resulting in a global replacement of native proline by one of its fluorinated counterparts[54]. These studies have shown that fluorinated prolines can alter the stability[55,56], the folding mechanism[55,57,58], the fluorescent properties[59] or the enzymatic activity[60,61] of proteins. However, this approach does not allow the site-specific investigation of the perturbations exerted by these ncAAs.

We have recently developed a strategy that enables the incorporation of [15N,13C]-prolines in specific positions in proteins[62]. This approach, which we named Site-Specific Isotopic Labelling (SSIL)[63], uses the previously mentioned genetic code expansion approach in a CF protein synthesis scheme. The ability to specifically incorporate proline isotopologues in a defined position relies on the external aminoacylation of the tRNACUA by an orthogonal proline tRNA synthetase (ProRS), which is subsequently added to the reaction mixture. In the present study, we show that the same ProRS derived from Pyrococcus horikoshii can be used to load the tRNACUA with three commercially available prolines fluorinated in Cγ and selectively incorporate them into proteins to study their conformational specificities and evaluate the structural perturbation on neighbouring residues. Concretely, we have incorporated 4R-FPro, 4S-FPro and 4,4-diFPro, which are the prototypical examples of commercially available fluorinated prolines presenting well-defined conformational properties (Fig. 1A)[39,43]. In order to illustrate our methodology, we have used a non-pathogenic construct of huntingtin exon-1 (HttExon-1), the causative agent of Huntington’s disease. The HttExon-1 construct used encompasses a 16 glutamine-long tract (poly-Q) immediately followed by a polyproline (poly-P) tract of 11 prolines that structurally influences poly-Q (Fig. 2A)[64,65]. The incorporation of the three fluoroprolines at two positions of the poly-P has enabled the investigation of the perturbation of the trans/cis equilibrium exerted by these ncAAs in the context of a repetitive protein. Moreover, we have quantitatively explored the influence of these modified amino acids on the neighbouring glutamines and, using extensive molecular dynamics simulations, we have shown that the chemical shift changes exerted by fluoroprolines are distance dependent, but their effect is limited to very short distances.
Overall, our study shows that, using our methodology, commercial fluoroprolines can be site-specifically incorporated in large protein constructs alongside isotopic labelling of other amino acids, widening their already established scope as interesting probes for structural and functional studies.

Figure 2. Incorporation of the three fluoroprolines into H16.

(A) Fragment of H16 in which modifications have been performed. The three fluoroprolines have been incorporated in positions Pro34 or Pro38 (green). Gln29 or Gln33 (orange) have been isotopically labelled to monitor the structural and spectroscopic changes induced when incorporating fluoroprolines in position Pro34. (B) Urea-PAGE gels displaying the bands corresponding to the empty and loaded tRNACUA after aminoacylation with P. horikoshii ProRS. (C) Endpoint fluorescence intensity plot for CF expression of H16-P34 with 10 and 20 μM of tRNACUA loaded with proline (green), 4S-FPro (blue), 4R-FPro (red) and 4,4-diFPro. + and − indicate the positive (wild-type H16) and negative (H16-P34 without tRNACUA) controls, respectively.

Results

Efficient site-specific introduction of fluoroprolines in HttExon-1

In a previous study, we demonstrated the use of an orthogonal P. horikoshii ProRS/tRNACUA pair[66] to site-specifically incorporate isotopically labelled prolines into proteins in CF[62]. In this procedure, the tRNACUA was aminoacylated in vitro by mixing it with the ProRS in the presence of proline and ATP at 37°C. In the present study, we first tested whether the same procedure could be applied to aminoacylate the tRNACUA with the three fluorinated prolines (Fig. 1A). The reaction conditions (concentrations, temperature and reaction time) for the three ncAAs were explored, but no differences with respect to those previously established for canonical proline were found (see Methods section). Urea-PAGE gels after aminoacylation with proline and the three fluoroprolines displayed two bands, corresponding to the loaded and empty tRNACUA (Fig. 2B). Although a precise quantification was not performed, the band corresponding to the aminoacylated tRNACUA was more intense than the empty one in all cases, indicating that the presence of fluorine atoms in the amino acid does not preclude the enzymatic activity of the ProRS.

Then, we proceeded to evaluate the capacity of fluorinated prolines to be site-specifically incorporated into proteins using a CF approach. For this, we used a previously described HttExon-1 construct fused to the N-terminus of superfolder green fluorescent protein (sfGFP), hereafter called H16[63]. In order to evaluate our procedure, the codon corresponding to Pro34, the first proline of the poly-P tract, was switched to an amber stop codon (TAG), yielding H16-P34. The addition of 10 and 20 μM of the tRNACUA preparation to the CF reaction resulted in a fluorescence signal for the native and the three fluorinated prolines, indicating the efficient synthesis of the protein with the desired amino acid (Fig. 2C). As in previous studies from our group, a yield of 20-40% relative to the plasmid without the stop codon was observed. Furthermore, a slight decrease in H16-P34 production was noticed when increasing the loaded tRNACUA concentration to 20 μM[62,63,67]. Interestingly, the production of the proteins in the presence of tRNACUA loaded with fluorinated prolines was slightly more efficient than when loaded with the native proline. This observation is counterintuitive, as a natural tRNA/ProRS pair has been used (with the exception of the tRNA anticodon), which should favour loading of canonical prolines over the non-canonical ones. The similarity of the loading capacity suggests that the chemical perturbation exerted by fluorine atoms is minimal. We have not further explored this observation due to the limited accuracy of the quantification by PAGE gels and the reduced sensitivity of protein production to the percentage of loaded tRNA.
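The yields quoted relative to the construct without a stop codon can be illustrated with a minimal sketch. It assumes that endpoint fluorescence is normalized to the positive control after subtracting the negative control; whether the background was subtracted in this way is an assumption, and the readings below are hypothetical numbers chosen only for illustration.

```python
def relative_yield(sample, positive, negative=0.0):
    """Background-corrected endpoint fluorescence as a fraction of the positive control."""
    span = positive - negative
    if span <= 0:
        raise ValueError("positive control must exceed the negative control")
    return (sample - negative) / span


# Hypothetical endpoint readings (arbitrary fluorescence units), not measured data.
positive_control = 12000.0  # wild-type H16, no stop codon
negative_control = 400.0    # H16-P34 without tRNACUA
readings = {"Pro": 3200.0, "4S-FPro": 3900.0}

for name, value in readings.items():
    fraction = relative_yield(value, positive_control, negative_control)
    print(f"{name}: {100 * fraction:.0f}% of the positive control")
```

With these made-up readings both samples fall inside the 20-40% window mentioned above; real data would simply replace the dictionary.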

A 19F-NMR perspective of fluorinated prolines in HttExon-1

The 1D 19F-NMR spectra featuring 4R-FPro and 4S-FPro at position Pro34 revealed the presence of a major and a minor form in slow exchange, which we attributed to the presence of the trans and cis conformations of this residue, respectively (Fig. 3A,B). The introduction of 4,4-diFPro in the same position clearly delivered the trans-form signals, but unfortunately the signal-to-noise ratio was insufficient to confidently identify the cis-form signals (Fig. 3C). The deconvolution of the 19F-NMR signals of the fluorinated constructs indicated an important difference in the cis content for 4R-FPro and 4S-FPro, with 9.1 ± 4% and 17.5 ± 1%, respectively. Note that the cis population for canonical proline in Pro34 was previously reported to be between 12% and 16% using SSIL samples[62,65]. The 19F-NMR experiments showed that 4R-FPro promotes the population of the trans form, while 4S-FPro increases the cis form. These observations are in line with previously reported trans/cis ratios for Ac-(F)Pro-NMe2 model compounds[68].
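As a rough illustration of how a cis population follows from deconvoluted peak areas in slow exchange, and how it maps onto a free-energy difference through ΔG = −RT ln K, a short sketch is given below. The function names are hypothetical, and the temperature is an assumption: the temperature of the NMR measurement is not stated in this section, so 293.15 K is used purely as a placeholder.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def cis_percent(trans_area, cis_area):
    """Cis population (%) from the integrated areas of the trans and cis signals."""
    total = trans_area + cis_area
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return 100.0 * cis_area / total


def delta_g_cis_minus_trans(cis_pct, temperature_k):
    """Free energy of cis relative to trans (kJ/mol), with K = [cis]/[trans]."""
    k = cis_pct / (100.0 - cis_pct)
    return -R_KJ * temperature_k * math.log(k)


ASSUMED_T = 293.15  # K, placeholder temperature (assumption)
for name, pct in {"4R-FPro": 9.1, "4S-FPro": 17.5}.items():
    dg = delta_g_cis_minus_trans(pct, ASSUMED_T)
    print(f"{name}: cis {pct}% -> dG(cis - trans) = {dg:.1f} kJ/mol")
```

A lower cis population corresponds to a larger positive ΔG, so under this simple two-state picture the 4R-FPro variant disfavours cis more strongly than the 4S-FPro variant.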

Figure 3. 19F-NMR spectra of fluoroprolines incorporated in H16.

Spectra of 4S-FPro (blue), 4R-FPro (red) and 4,4-diFPro (purple) incorporated in positions Pro34 (A-C) and Pro38 (D-F). Signals assigned to trans and cis configurations are indicated.

Substitutions with all three fluoroprolines at position Pro38, in the core of the poly-P stretch, only yielded the trans signals, indicating that the cis content was very low (Fig. 3D-F). This observation is in agreement with previous observations that the cis conformer of proline within the poly-P stretch was undetectably low[62]. For 4,4-diFPro, the chemical shift difference between both 19F doublets decreased when compared to the substitution at the Pro34 position, which suggested a more balanced (endo/exo) conformational equilibrium[52]. Remarkably, for 4,4-diFPro and independently of the position in the poly-P tract, the resonance intensities and line widths of both doublets were different, indicating distinct 19F relaxation properties for both fluorine atoms. This is more noticeable at position Pro34. Note that fluorine relaxation is largely governed by its chemical shift anisotropy (CSA)[11,12]. Density Functional Theory (DFT) calculations for 4R-FPro and 4S-FPro have previously shown that the 19F CSA is significantly different for the Cγ-exo and Cγ-endo conformers, with the highest CSA values found when the fluorine is in the pseudo-axial position (Fig. 1B)[49]. The distinct relaxation properties of both fluorines in 4,4-diFPro could thus be explained by a difference in the Cγ-exo and Cγ-endo populations, resulting in distinct average CSAs. The observation that the line widths are more different at position Pro34 than at Pro38 is thus consistent with the observed higher difference in chemical shift, implying a more biased ring conformation in this position due to the sequence context.

Fluoroprolines induce spectroscopic changes to the neighbouring glutamine

The efficient incorporation of fluoroprolines into H16 enabled the evaluation of the spectroscopic changes occurring in the neighbouring poly-Q tract (Table 1). For this, we separately incorporated the three fluoroprolines in position Pro34 of H16 in a [15N,13C]-Gln-containing CF reaction, and recorded the corresponding 15N-HSQC experiments (Fig. 4). As for the native protein, the three spectra displayed a large, unresolved density corresponding to the glutamines present in H16[65,69], with some isolated signals corresponding to glutamines with specific chemical environments. Interestingly, the most downfield NH-HN correlation, which corresponds to Gln33 (see below), exhibits different chemical shifts depending on the specific fluoroproline introduced (Table 1). Concretely, the largest Gln33 15N chemical shift change was observed for 4,4-diFPro (-0.72 ppm), followed by 4R-FPro (-0.33 ppm), while 4S-FPro displayed the smallest one (-0.19 ppm). Although the 1H chemical shift turned out to be less affected than that of 15N, changes induced by 4,4-diFPro (0.074 ppm) were notably larger than those observed for 4R-FPro and 4S-FPro, 0.044 ppm and 0.048 ppm, respectively. These observations suggest that, not surprisingly, a larger effect is observed when two fluorine atoms are introduced in close proximity to the probed residue.

Table 1. Sample characteristics and structural properties of the H16 constructs with the three fluoroprolines.

Column groups: Sample preparation (CF conc., NMR sample conc.); trans:cis (b) (Pro34, Pro38); trans (Δ15N, Δ1H); cis (Δ15N, Δ1H).

Fluoroproline | CF conc. (μM) (a) | NMR sample conc. (μM) | trans:cis Pro34 (b) | trans:cis Pro38 (b) | trans Δ15N (ppm) | trans Δ1H (ppm) | cis Δ15N (ppm) | cis Δ1H (ppm)
4S-FPro | 2.6 | 21.0 | 82.5:17.5 | 100:0 | −0.19 | 0.048 | −0.42 | 0.039
4R-FPro | 1.6 | 8.7 | 90.9:9.1 | 100:0 | −0.33 | 0.044 | ND (c) | ND (c)
4,4-diFPro | 1.9 | 5.2 | ND | 100:0 | −0.72 | 0.074 | ND | ND

ND: Non-determined frequencies due to overlap or low intensity.

(a) H16 concentrations at the end of the CF reactions (10 mL).

(b) Data obtained from 19F-NMR experiments.

(c) Ill-defined peaks in the 15N-HSQC.

Figure 4. Probing the incorporation of fluoroprolines in H16.

15N-HSQC spectra of 15N-glutamine labelled H16-P34 after the incorporation of (A) 4S-FPro (blue), (B) 4R-FPro (red) and (C) 4,4-diFPro (purple). Spectra are overlaid with the fully labelled H16 (grey) and the SSIL sample of Gln33[65] (green). Chemical shift perturbations on trans and cis peaks of Gln33 caused by the presence of fluoroprolines are indicated with arrows.

The above-described observations corresponded to the major (trans) conformation of the Gln33-Pro34 bond monitored on Gln33. Note that cis populations of 9.1 ± 3.7% and 17.5 ± 1.3% were detected using 19F-NMR for this bond upon introducing 4R-FPro and 4S-FPro, respectively (Fig. 3). We also monitored the chemical shift perturbation of this minor conformation. By inspecting the unresolved glutamine signal density of H16-P34 containing 4S-FPro, we identified a small peak that was not present in the spectrum of the native protein (Fig. 4A). The identity of this signal as the cis form of the 4S-FPro manifested in Gln33 was subsequently confirmed (see below). Although less clear, an additional density in a similar position was observed for the 4R-FPro variant of H16-P34, which was also assigned to the cis form of this ncAA (Fig. 4B). Unfortunately, no signal was observed for the 4,4-diFPro variant, which was probably overlapped in the unresolved glutamine density of the 15N-HSQC spectrum. For the cis form, the 15N chemical shift perturbation on Gln33 induced by 4S-FPro was -0.42 ppm. Interestingly, this perturbation was notably larger than that observed for the trans isomer upon the incorporation of the same ncAA (-0.19 ppm). Although the perturbation induced by 4R-FPro on the cis conformation could not be accurately quantified due to its low intensity, our data suggest that the perturbation exerted differs from that of 4S-FPro. While 4R-FPro has a stronger effect than 4S-FPro on the trans isomer, the cis isomer is more perturbed by 4S-FPro.
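The text compares the 15N and 1H shift changes separately. For readers who prefer a single number per peak, the sketch below combines the two into a weighted chemical shift perturbation, CSP = sqrt(Δδ(1H)² + (α·Δδ(15N))²). This combined metric is not used in the study itself; it is a common convention, and the scaling factor α = 0.14 is an assumption (values between roughly 0.1 and 0.2 are in use). The input values are those listed in Table 1.

```python
import math

ALPHA_15N = 0.14  # assumed 15N scaling factor; not taken from the study


def combined_csp(delta_1h_ppm, delta_15n_ppm, alpha=ALPHA_15N):
    """Weighted 1H/15N chemical shift perturbation in ppm."""
    return math.hypot(delta_1h_ppm, alpha * delta_15n_ppm)


# (delta 1H, delta 15N) in ppm for Gln33, as listed in Table 1.
table1 = {
    "4S-FPro trans": (0.048, -0.19),
    "4R-FPro trans": (0.044, -0.33),
    "4,4-diFPro trans": (0.074, -0.72),
    "4S-FPro cis": (0.039, -0.42),
}

for label, (d_h, d_n) in table1.items():
    print(f"{label}: CSP = {combined_csp(d_h, d_n):.3f} ppm")
```

Under this assumed weighting the ranking matches the one described in the text: 4,4-diFPro perturbs Gln33 most in the trans form, and the cis form of 4S-FPro is more perturbed than its trans form.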

The ensemble of our observations shows that the number of fluorine atoms, the stereochemistry in Cγ and the isomeric form of the fluoroproline play relevant roles in the extent of the spectroscopic perturbation exerted on neighbouring amino acids.

Two-site orthogonal nonsense suppression in cell-free

The severe frequency overlap of the fluorinated H16-P34 samples precluded the NMR analysis of the effect exerted by fluoroprolines on glutamines of the poly-Q tract other than Gln33. In order to overcome this limitation, we endeavoured to perform a two-site orthogonal nonsense suppression reaction in CF. Briefly, we aimed to simultaneously introduce a fluoroproline in position Pro34 and an isotopically labelled glutamine at specific positions of the poly-Q tract. For this, we modified the previously developed tRNACUA/aaRS pairs for the external tRNACUA aminoacylation of proline[62] and glutamine[63]. In the absence of an engineered strain enabling the expansion of the genetic code[70,71], we simultaneously reassigned the amber (TAG) and opal (TGA) stop codons and the corresponding tRNA anticodons for the double orthogonal nonsense suppression. First, we tested the performance of the TGA nonsense codon with respect to the TAG one. For this, we added increasing amounts of proline-loaded tRNACUA and tRNAUCA to a CF mixture containing the H16 plasmid with the amber and the opal stop codons in the Pro34 position, respectively (Fig. 5A). The end-point fluorescence measurement of the reaction showed an important decrease in the yield when the TGA nonsense codon was used. This decrease varied depending on the concentration of tRNAUCA used, but it was more than 60% in all cases. This observation was expected, as our lysate, which is depleted of release factor 1 (RF1)[72], still contains RF2, which competes for the opal nonsense codon and reduces the overall yield of the reaction.

Figure 5. Validation and application of the two-site orthogonal nonsense suppression.

(A) Fluorescence intensities measured upon the addition of increasing amounts of proline-loaded tRNACUA (left) and tRNAUCA (right) to suppress the TAG and TGA stop codons, respectively, introduced in position Pro34 of H16. Yields are compared to the production of H16 without stop codon (positive control). (B) Combined titration of loaded tRNACUA and tRNAUCA for the orthogonal suppression of Gln29/Pro34-H16 using either TAG/TGA (orange and red) or TGA/TAG (cyan and blue) stop codons. (C) 15N-HSQC of an orthogonally suppressed H16 sample incorporating [15N,13C]-Gln in position Gln33 and 4S-FPro in position Pro34 (blue) overlaid with a SSIL H16-Q33 (green) and fully labelled H16 (grey) samples. (D) Zoom of the Gln29 Cα-Hα correlation of the 13C-HSQCs of orthogonally suppressed H16 samples containing [15N,13C]-Gln in position Gln29 and 4S-FPro (blue) or 4R-FPro (red) in position Pro34. These two spectra are overlaid with the 13C-HSQC of the SSIL H16-Q29 sample.

Second, we analysed whether the order of the nonsense codons in the plasmid affected the protein synthesis. In a first set of experiments, we positioned the TAG and the TGA codons in the position of Gln29 and Pro34, respectively. Increasing concentrations of glutamine-loaded tRNACUA and proline-loaded tRNAUCA were added to the CF reaction and the fluorescence end-point was monitored (Fig. 5B). Not surprisingly, the higher the concentration of both tRNAs, the higher the yield. No saturation of the protein production was observed when large tRNA amounts (up to 50 μM) were added to the CF mixture. This behaviour was similar to the single suppression experiments for TGA, but not for TAG (Fig. 5A). This observation suggested that suppressing the TGA codon was the bottleneck of the synthesis. An equivalent titration experiment was performed using an H16 plasmid where the two nonsense codons were swapped (TGA and TAG in positions of Gln29 and Pro34, respectively). For this codon arrangement, the positive correlation between tRNA concentration and protein production was also observed. However, this second set of experiments systematically yielded larger protein amounts than the first one, suggesting that the arrangement of nonsense codons played a relevant role during protein translation.

Finally, we tested whether the relative position of the two nonsense codons influenced the protein production. For this, we positioned them consecutively in the positions of Gln33 and Pro34 of H16, and we repeated the previously described titration experiments using the two nonsense codon arrangements (Fig. 5B). Interestingly, we observed that the presence of two consecutive stop codons did not impact the translation and yields were similar to those obtained with the construct with more spaced nonsense codons. Again, we observed that utilizing TGA to suppress glutamine and TAG for proline resulted in a higher protein production than the inverse.

In summary, these experiments validated the orthogonal site-specific incorporation of amino acids in proteins and defined the optimal arrangement of nonsense codons. Importantly, although the protein production yields were notably reduced when using this strategy, our results indicated that sufficient H16 could be produced for subsequent structural investigations.
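The logic of two-site orthogonal nonsense suppression can be sketched with a toy translation routine in which the opal (TGA) and amber (TAG) codons are each decoded by their own pre-loaded suppressor tRNA. This is only an illustration: the four-codon DNA fragment is hypothetical and not the H16 sequence, the function name is invented here, and the model ignores competition with release factors, which the experiments above show to matter for TGA.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, suppressors=None):
    """Translate codon by codon; stop codons listed in `suppressors` are read through."""
    suppressors = suppressors or {}
    residues = []
    for pos in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[pos:pos + 3]
        if codon in suppressors:
            residues.append(suppressors[codon])  # decoded by the loaded suppressor tRNA
        elif CODON_TABLE[codon] == "*":
            break  # unsuppressed stop codon: translation terminates
        else:
            residues.append(CODON_TABLE[codon])
    return residues


# Hypothetical fragment: Gln, opal codon, amber codon, Pro.
fragment = "CAGTGATAGCCG"
loaded_trnas = {"TGA": "[15N,13C]-Gln", "TAG": "4S-FPro"}
print(translate(fragment, loaded_trnas))
print(translate(fragment))  # without suppressor tRNAs the product is truncated
```

Swapping the two entries of `loaded_trnas` corresponds to the alternative codon arrangement tested in the titrations, which this toy model cannot distinguish in terms of yield.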

Overcoming spectral crowding with two-site orthogonal nonsense suppression

In order to validate our double orthogonal nonsense suppression for structural applications, we produced an H16 sample in which the codons corresponding to Gln33 and Pro34 were changed to the TGA and TAG nonsense codons, respectively. We produced the sample from 10 mL of CF reaction, containing 20 μM of tRNAUCA loaded with [15N,13C]-Gln and 10 μM of tRNACUA loaded with 4S-FPro. The purified product provided a 7 μM NMR sample that displayed two correlations with distinct intensity in the glutamine region of the 15N-HSQC spectrum (Fig. 5C). When overlaying this spectrum with the one previously obtained for H16-P34 suppressed with 4S-FPro (Fig. 4A), we observed a perfect overlap with the previously assigned trans and cis peaks probed by Gln33, validating our approach. Equivalent samples were produced by incorporating 4R-FPro and 4,4-diFPro, also showing an excellent overlap with the previously identified major trans peaks of Gln33 (see Fig. S1 in SI). However, no signal corresponding to the cis conformation could be unambiguously identified in these spectra, most probably due to the low population of this conformer for the 4R-FPro and 4,4-diFPro variants (see above) and the low concentration of the samples (~2 μM).

Then, we used the double orthogonal nonsense suppression to investigate the extent of the effects exerted by fluoroprolines. For this, we explored Gln29, a glutamine exhibiting a random coil conformational behaviour that follows the α-helical section of the poly-Q tract[65]. We produced H16 variants with site-specific isotopic labelling for Gln29 and containing either 4S-FPro or 4R-FPro in position Pro34, as these two fluoroprolines displayed the most distinct trans/cis ratios according to previous investigations in peptide models[42,49] and our 19F-NMR experiments on H16-P34 (see above). We monitored the Cα-Hα correlation in Gln29 in order to probe putative fluoroproline-induced secondary structure perturbations (Fig. 5D). Interestingly, the Gln29 signals from both H16 variants overlapped, indicating the absence of a differential effect caused by the stereochemistry of the fluorine atom in Pro34. Importantly, these peaks presented chemical shifts very similar to the Cα-Hα correlation of the native H16, suggesting that the influence of fluoroprolines positioned five residues apart was negligible.

Molecular dynamics simulations suggest that the fluorine effect is distance dependent

In order to rationalize the chemical shift perturbations observed in flanking glutamines when incorporating fluoroprolines in position Pro34 of H16, we performed molecular dynamics (MD) simulations of a HttExon-1 fragment containing the N17, the poly-Q tract with 16 glutamines and five prolines. Twenty simulations of 1 μs each were performed for this fragment with the Gln33-Pro34 peptide bond in the trans conformation, and likewise in the cis conformation. The independent trajectories, with an aggregated time of ~20 μs for each isomer, were generated at 293.15 K using the AMBER03ws force field[73], and frames at 200 ps intervals were saved for subsequent analyses.
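A quick back-of-the-envelope calculation shows the size of the ensembles analysed. It assumes that the figures above mean twenty 1 μs runs per isomer and that the starting frame of each run is not counted; both points are readings of the text rather than stated facts.

```python
N_RUNS_PER_ISOMER = 20       # assumed: twenty runs for trans and twenty for cis
RUN_LENGTH_PS = 1_000_000    # 1 microsecond expressed in picoseconds
SAVE_INTERVAL_PS = 200       # frames saved every 200 ps

frames_per_run = RUN_LENGTH_PS // SAVE_INTERVAL_PS
frames_per_isomer = N_RUNS_PER_ISOMER * frames_per_run
aggregated_time_us = N_RUNS_PER_ISOMER * RUN_LENGTH_PS / 1_000_000

print(f"frames per run: {frames_per_run}")
print(f"frames per isomer: {frames_per_isomer}")
print(f"aggregated time per isomer: {aggregated_time_us} microseconds")
```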

First, we validated the trajectories by evaluating their capacity to reproduce experimental data measured on H16. The α-helical fractions computed from both trajectories were very similar, indicating that the H16 fragment presented an overall disorder while capturing transient populations of α-helices encompassing the N17 and different portions of the poly-Q tract (Fig. S2 in SI). Importantly, the α-helical fraction profiles displayed an excellent agreement with those previously measured for H16[65], indicating that these trajectories are excellent representations of the conformational behaviour of the protein in solution.

We aimed at evaluating whether the chemical shift perturbations that were experimentally probed can be explained by the distance to the fluorine atoms. From both trajectories, we calculated the distance distribution between the two Pro34 Hγs (proS and proR) and the backbone NHs for Gln33 and Gln29 (Fig. 1B), whose chemical shifts were monitored by NMR (see above). For Gln33, the trans trajectories showed that both Hγs presented narrow distance distributions centred around 6.5 Å, but the proR one displayed a shoulder corresponding to distances below 6.0 Å (Fig. 6A). The proR Hγ thus on average comes closer to Gln33 NH than the proS one. Very different distributions were observed when analysing the same distances for the trajectory with the Pro34 in a cis conformation. In this case, the proS Hγ sampled a very broad range of distances to Gln33 NH, including a large population of shorter distances below 6.0 Å (Fig. 6A). Interestingly, the proR Hγ sampled a distribution that appeared shifted to higher distances when compared to the trans form. Therefore, for the cis form, on average, it is proS Hγ that comes in close proximity to Gln33 NH. When doing the same analysis for Gln29 NH, much broader distributions of distances, sampling from 5 to 20 Å, and very similar for both Hγs and for both Pro34 isomers were obtained (Fig. 6B). Overall, our observations suggest that the two Hγs of a proline do not sample the same space, presenting different distances to flanking residues, especially for the cis conformation, although this effect is rapidly lost when moving further away in the sequence.
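The distance analysis described above can be illustrated with a minimal sketch, assuming that per-frame Cartesian coordinates (in Å) of one Pro34 Hγ and one backbone NH proton have already been extracted from a trajectory. The function names, the bin settings and the four demo frames are hypothetical and only illustrate the bookkeeping; they are not the scripts or data used in this work.

```python
import math

def distances(coords_a, coords_b):
    """Per-frame Euclidean distance between two atoms (same units as the input)."""
    return [math.dist(a, b) for a, b in zip(coords_a, coords_b)]

def normalised_histogram(values, r_min, r_max, bin_width):
    """Return (bin_centres, densities); densities integrate to 1 over counted values."""
    n_bins = int(round((r_max - r_min) / bin_width))
    counts = [0] * n_bins
    for v in values:
        idx = int((v - r_min) // bin_width)
        if 0 <= idx < n_bins:  # values outside [r_min, r_max) are ignored
            counts[idx] += 1
    total = sum(counts)
    centres = [r_min + (i + 0.5) * bin_width for i in range(n_bins)]
    densities = [c / (total * bin_width) if total else 0.0 for c in counts]
    return centres, densities

def fraction_below(values, cutoff):
    """Fraction of frames in which the distance is shorter than the cutoff."""
    return sum(v < cutoff for v in values) / len(values)

# Hypothetical coordinates for four frames (not simulation data).
h_gamma = [(0.0, 0.0, 0.0)] * 4
gln33_nh = [(5.5, 0.0, 0.0), (6.5, 0.0, 0.0), (6.4, 0.0, 0.0), (6.6, 0.0, 0.0)]

demo_distances = distances(h_gamma, gln33_nh)
centres, densities = normalised_histogram(demo_distances, 4.0, 8.0, 0.5)
short_fraction = fraction_below(demo_distances, 6.0)
```

Applied separately to the proS and proR Hγ and to the trans and cis trajectories, this kind of histogram gives distributions comparable to those in Fig. 6A,B, and the fraction below 6.0 Å is one simple way to quantify a shoulder at short distances.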

Figure 6. Fluorine distances to Gln33 and Gln29 computed from MD simulations.

(A,B) Computed distance distributions between the proS (solid line) and proR (dashed line) Hγ of Pro34 and the Gln33 NH (A) or the Gln29 NH (B) for the MD trajectories in trans (black) and cis (red). (C,D) Distance distribution of the proS Hγ in endo (black) or the proR Hγ in exo to the Gln33 NH corresponding to the trans (C) or the cis trajectory (D). (E,F) Distance distribution of the proS Hγ in endo (black) or the proR Hγ in exo to the Gln29 NH corresponding to the trans (E) or the cis trajectory (F).

Rationalizing the experimental chemical shift perturbations observed for the H16 constructs incorporating 4R-FPro and 4S-FPro required adapting the trajectories to the specific conformational bias observed for both fluoroprolines. Indeed, it is well established that the presence of a fluorine atom in Cγ as well as its stereochemistry have a strong influence on the exo/endo conformational equilibrium in prolines[40,42,43,46,49,51]. Concretely, the 4R-FPro has a strong preference for the Cγ-exo conformation for both the cis and trans isomers (83% and 93%, respectively, within Ac-FPro-OMe model compounds), while the 4S-FPro shows the opposite behaviour, with the Cγ-endo conformation representing more than 99% for both isomers[51]. Therefore, in order to evaluate the effect of the distance of the fluorine atom to the experimentally probed residues in H16, this conformational bias must be taken into account. To do so, we first computed the distances within reduced conformational subsets of the HttExon-1 trajectories that selected for the Pro34 Cγ-exo and Cγ-endo conformations. As a selection criterion, we calculated the Altona-Sundaralingam pseudorotation angle P of the five-membered ring (as detailed in the SI)[74,75]. We found that P = 18° ± 9° captured the Cγ-exo and P = 180° ± 9° the Cγ-endo conformers[67,68], in agreement with the values expected for proline (Fig. S3 in SI). Within these conformationally-restricted subensembles, the distance distributions from proR and proS Hγs to Gln33 NH were found to be highly influenced by the pucker of Pro34 (Fig. S4 in SI).
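The pucker-based frame selection can be sketched as follows. This is a minimal illustration that assumes the general Altona-Sundaralingam expression, tan P = [(ν4 + ν1) − (ν3 + ν0)] / [2 ν2 (sin 36° + sin 72°)], applied to five endocyclic torsion angles ν0 to ν4 given in degrees. How the proline ring torsions are assigned to ν0 to ν4 is specified in the SI and is not reproduced here, so that mapping, the function names and the 38° puckering amplitude of the idealised test ring are assumptions. Only the selection windows (18° ± 9° and 180° ± 9°) are taken from the text.

```python
import math

SIN_SUM = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))

def pseudorotation_phase(nu0, nu1, nu2, nu3, nu4):
    """Altona-Sundaralingam phase angle P in degrees, mapped to [0, 360)."""
    numerator = (nu4 + nu1) - (nu3 + nu0)
    denominator = 2.0 * nu2 * SIN_SUM
    # atan2 resolves the quadrant that a plain arctangent would lose.
    return math.degrees(math.atan2(numerator, denominator)) % 360.0

def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def classify_pucker(p, exo_centre=18.0, endo_centre=180.0, tolerance=9.0):
    """Return 'exo', 'endo' or None using the windows quoted in the text."""
    if angular_difference(p, exo_centre) <= tolerance:
        return "exo"
    if angular_difference(p, endo_centre) <= tolerance:
        return "endo"
    return None

def ideal_torsions(p_deg, nu_max=38.0):
    """Idealised ring torsions nu_j = nu_max * cos(P + 144 deg * (j - 2)).

    Used only to check the phase calculation; nu_max is a hypothetical amplitude.
    """
    return [nu_max * math.cos(math.radians(p_deg + 144.0 * (j - 2)))
            for j in range(5)]

p_exo = pseudorotation_phase(*ideal_torsions(18.0))
p_endo = pseudorotation_phase(*ideal_torsions(180.0))
labels = [classify_pucker(p) for p in (p_exo, p_endo, 90.0)]
```

Frames labelled None fall outside both windows and would be excluded from the conformationally restricted subensembles.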

Figure 6C-F displays the distance distributions involving the particular Hγ proton that is swapped for fluorine in fluoroproline, as derived from the conformationally biased subensemble that is relevant for that specific fluoroproline. Concretely, these are the distances from the proS Hγ involving Cγ-endo and the proR Hγ involving Cγ-exo, the major conformational states of 4S-FPro and 4R-FPro, respectively. When focusing on the distances involving Gln33 NH computed from the trans trajectory (Fig. 6C), a clear difference between the proR-exo and proS-endo Hγs was observed, with the proR-exo Hγ displaying shorter distances than the proS-endo one. Distances from proR-exo could be as short as 4.5 Å, although the maximum of the distribution was at 5.7 Å, while distances from the proS-endo Hγ displayed a very narrow peak centred at 6.4 Å. Note that these observations are in excellent agreement with our experimental observations, where the 4R-FPro exerted a stronger perturbation on the Gln33 NH chemical shift (-0.33 ppm) than the 4S-FPro (-0.19 ppm). Interestingly, these distance distributions were inverted when the cis trajectory was analysed, with the proS-endo Hγ displaying shorter distances to the Gln33 NH than the proR-exo Hγ (Fig. 6D). Moreover, in the cis trajectory, the proS-endo Hγ distance distribution sampled significantly shorter distances than in the trans one. This observation was in agreement with the stronger chemical shift perturbation observed for the cis isomer (-0.42 ppm) than for the trans one (-0.19 ppm) when the 4S-FPro was introduced in Pro34 (Table 1).

When performing an equivalent analysis for Gln29, much broader ranges of distances were obtained, with the vast majority being above 10 Å (Fig. 6E,F). Importantly, no systematic distance shifts were observed for proR-exo/proS-endo Hγs and trans/cis isomers. This is in agreement with the experimental observation that Gln29 chemical shifts were not perturbed in the presence of 4R-FPro or 4S-FPro (Fig. 5D), and suggests that atoms located at distances larger than 10 Å from fluorine are not spectroscopically perturbed.

Finally, for 4,4-diFPro, both the proR and proS positions are fluorinated and a conformational behaviour similar to that of proline is expected, meaning that both Cγ-exo and Cγ-endo conformers have to be considered. Since for exo, the proR position is on average found to be the closest to Gln33 NH, while for endo this is the proS (Fig. S4), both fluorines are expected to significantly contribute to the chemical shift perturbation. This is in agreement with the observation that 4,4-diFPro in position Pro34 displays the largest chemical shift perturbation of the three fluoroprolines (Table 1).

Discussion

The incorporation of non-natural moieties, such as fluorine, into proteins makes it possible to probe biomolecular structure and dynamics in a very elegant and clean way[16,19]. In our study, we present tools for the incorporation of fluoroprolines in a site-specific manner by combining the tRNA suppressor strategy with CF protein synthesis. This technology expands to proteins what was previously done using solid-phase peptide synthesis[27,49]. Although we have demonstrated our approaches for three commercial ncAAs, their application to less common Cβ-modified or difluorinated prolines as well as post-translationally modified 4-hydroxyproline seems feasible[39,51,76,77]. Crucially, this can be easily combined with additional isotopic labelling, either uniformly or for specific types of amino acids. This is an important advantage, as it paves the way for advanced NMR studies to assess the impact of the ncAA on the rest of the protein. In addition, by simultaneously using two stop codons in the construct, we have shown that the two-site orthogonal nonsense suppression is also achievable. Although the yields of such samples were substantially lower than for the single-site nonsense suppression, we have been able to produce enough labelled protein to explore the effects of the incorporation of the ncAAs on the inner part of the poly-Q tract of HttExon-1, which in the case of this homopolymer would have been impossible otherwise.

Site-specific introduction of fluorinated amino acids provides a unique perspective to study biomolecular dynamics using 19F NMR. The high sensitivity of the 19F chemical shift to its chemical environment particularly benefits relaxation dispersion or chemical exchange saturation transfer experiments to detect conformational motions on the millisecond time-scale[11,12]. For fluorinated prolines, it has been recently demonstrated that 19F relaxation measurements can be exploited to assess residue-specific dynamics on nanosecond and millisecond time-scales within polyproline sequences[49]. This is especially useful given proline’s lack of a backbone amide proton, the usual nucleus exploited in biomolecular dynamics studies by NMR.

Assessing the nature of the perturbation exerted by fluoroprolines on the neighbouring residues is an important aspect in order to fully exploit ncAAs for structural purposes. Indeed, if protein structures are modified by the presence of non-natural moieties, the structural and dynamic information extracted for the protein from these probes can be biased. We have used a non-pathogenic HttExon-1 construct, which has been extensively characterized in the past[62,65], as a model system to record and structurally interpret the conformational preferences of the three fluoroprolines and the chemical shift changes exerted on neighbouring glutamines. In line with previous measurements on model compounds and quantum chemistry calculations, we observed a stronger preference for the cis conformation for the 4S-FPro than for the 4R-FPro and 4,4-diFPro[51]. Furthermore, our NMR data on isotopically enriched HttExon-1 forms unambiguously show that the number of fluorine atoms and the stereochemistry of the Cγ of prolines influence the chemical shift of the flanking glutamine. This observation holds for both trans and cis isomers, although the perturbation goes in opposite directions. The effect of fluorine vanishes for residues positioned five residues apart from the fluoroprolines, indicating that the extent of the perturbation is very local. We have performed extensive MD simulations using a state-of-the-art force field for disordered proteins to rationalize these observations and to evaluate the origin of these changes. In all cases, experimentally observed strong chemical shift perturbations were associated with shorter distances between the fluorine and the atom probed by NMR. In very good agreement with our NMR analyses, our simulations show that distance distributions are strongly influenced by the isomeric form (trans/cis) and the pucker (exo/endo) of the proline.
The comparison between the experimental data and the simulations strongly suggests that the fluoroprolines used in this study do not affect the structure of HttExon-1 and that the chemical shift changes observed are mainly due to the perturbation of the electronic environment induced by fluorine. Therefore, in the context of a disordered protein, the presence of fluorine in prolines does not strongly perturb the secondary structure of the region and chemical shift changes mainly originate from the proximity to fluorine. This observation underlines the usefulness of fluorinated amino acids to monitor intramolecular contacts in proteins or biomolecular interactions[25,78]. A particularly interesting application of the site-specific incorporation of fluoroprolines will be the study of the interactome of proline-rich proteins, for which high-resolution studies are extremely challenging. Note that poly-P tracts participate in protein-protein interaction networks, often through specific interactions with domains such as the SRC homology 3 (SH3) and WW domains[79,80], and with the small actin-binding proteins profilins[81,82].

In summary, this study presents a chemical biology tool for the site-specific incorporation of commercially available fluoroprolines into proteins. The capacity to place one (or multiple) fluorine atoms in specific protein positions, in combination with tailored isotopic labelling schemes and without strongly modifying the structure, enables the measurement of precise NMR data reporting on intra- and intermolecular interactions.

Methods

Protein constructs

Plasmids were prepared as previously described[63]. All synthetic genes of huntingtin exon1 (H16) were ordered from GeneArt (Thermo Fisher Scientific, Illkirch, France) and cloned into pIVEX 2.3d 3C-sfGFP-His6. This includes the wild-type version and the genes encoding suppression stop codons. Amber stop codons (TAG) were used for single suppression samples at positions Gln33, Pro34 and Pro38. For double suppression samples, we tested all combinations of amber (TAG) and opal (TGA) codons in positions Gln29/Gln33 and Pro34 to evaluate the overall yield and the best codon arrangement.

Preparation of aminoacylated suppressor tRNACUA

Two pairs of tRNACUA/tRNA synthetase were prepared following previously described protocols to load amber stop codons with glutamine[63] and proline[62]. The latter was also used to load 4S-FPro, 4R-FPro and 4,4-diFPro. To use the opal stop codon, the glutamine and proline tRNACUA were modified to tRNAUCA in order to recognise the TGA codon. Then, the same aminoacylation conditions as for tRNACUA were used. The successful loading of tRNA was confirmed by urea-PAGE (6.5% acrylamide 19:1, 8 M urea, 100 mM sodium acetate pH 5.2).

Standard cell-free expression

A lysate based on Escherichia coli strain BL21 Star (DE3)::RF1-CBD3 was used for protein expression by CF reactions as previously described[63,83]. The optimal tRNA concentrations for each fluorinated proline were determined by titrations of tRNA in 50 μL CF reactions incubated at 23 °C for 2-3 h. Then, the protein expression was monitored in a plate reader (Gen5 v3.03.14, BioTek Instruments, Colmar, France) by measuring sfGFP fluorescence (excitation at 485 nm, emission at 528 nm).

Preparation of NMR samples

Samples for NMR measurements were produced by 10 mL CF reactions that were incubated at 23 °C and 450 rpm in a thermomixer for 3 h. Standard conditions were used as previously described[65]. However, different labelling patterns required modifications in the amino acid mixtures used in each reaction. To uniformly label H16 samples, the standard amino acid mix was substituted by [15N,13C]-labeled ISOGRO supplemented with [15N, 13C]-labeled Asn, Cys, and Trp (1 mM each) and 2 mM [15N, 13C]-Gln. For Gln33 single suppression and Gln33-Pro34 and Gln29-Pro34 double suppression samples, a non-labelled amino acid mixture was used. For Pro34 and Pro38 single suppression samples used for 19F-NMR, a non-labelled amino acid mixture lacking glutamines was used and the CF reaction was supplemented with 2 mM [15N, 13C]-labeled Gln in order to monitor the stability of the sample via 15N-HSQC spectra. In all suppression samples, the corresponding suppressor tRNA was added at 10 μM or 20 μM for single or double suppression samples, respectively.

Prior to purification, samples were centrifuged and diluted 5-fold in 50 mM Tris-HCl pH 7.5, 500 mM NaCl, 5 mM imidazole. Proteins were then purified by affinity chromatography using a Ni gravity-flow column. After elution with 250 mM imidazole, samples were dialyzed overnight against NMR buffer (20 mM BisTris-HCl pH 6.5, 150 mM NaCl) and concentrated using 10 kDa MWCO Vivaspin centrifugal concentrators. Final protein concentrations were determined by fluorescence measurements in combination with a sfGFP calibration curve. Final NMR samples were supplemented with 10% D2O and 0.5 mM 4,4-dimethyl-4-silapentane-1-sulfonic acid (DSS).

NMR measurements

15N and 13C NMR experiments were performed at 293 K on a Bruker Avance III spectrometer equipped with a cryogenic triple resonance probe, operating at a 1H frequency of 800 MHz. 15N-HSQC and 13C-HSQC were acquired for each sample in order to determine amide (1HN and 15N) and aliphatic (1Haliphatic and 13Caliphatic) chemical shifts, respectively. 15N-HSQC experiments used spectral widths of 16 ppm and 23 ppm in the 1H and 15N dimensions, respectively. The total numbers of time-domain points were 2048 and 112 in the 1H and 15N dimensions, respectively. 13C-HSQC experiments used spectral widths of 12.5 ppm and 58 ppm in the 1H and 13C dimensions, respectively. The total numbers of time-domain points were 2048 and 112 in the 1H and 13C dimensions, respectively. Typically, 632-688 transients were used for the 15N-HSQC experiments of single suppression samples and 1264-1280 for double labeled proteins. Between 288 and 456 transients were used for the 13C-HSQC experiments. Prior to Fourier transform, HSQC spectra were multiplied in both dimensions with a squared cosine bell function and zero filled until a real data matrix size was obtained. All spectra were processed with TopSpin v3.5 (Bruker Biospin) and analyzed using CCPN-Analysis software[84].

19F-NMR experiments

All NMR samples were first concentrated up to a ca. 200 μL volume using Vivaspin centrifugal concentrators (Sartorius) with a 5 kDa MWCO at 4 °C. 0.1 μL of a trimethylsilylpropanoic acid (TMSP) solution for chemical shift referencing and 10 μL of D2O were added before the NMR measurement. All 19F NMR experiments were performed on a Bruker Avance III HD spectrometer operating at 1H and 19F frequencies of 600.13 MHz and 564.69 MHz, respectively, equipped with a CP-QCI-F cryoprobe with 19F cryo-detection. All 19F 1D experiments were performed at 293.0 K with 1H decoupling during acquisition using waltz16 composite pulse decoupling. For all Pro34-modified samples, a 19F spectral window of 20.1 ppm was used, with an acquisition time of 0.50 s, a relaxation delay of 1.0 s and 65536 transients in total. For the Pro38-modified samples, a 19F spectral window of 100.6 ppm was used, with an acquisition time of 0.29 s and a relaxation delay of 0.5 s. For the Pro38-modified samples, concatenated 1D 19F spectra of 128 transients each were acquired in order to monitor any spectral changes over time, which were not observed. These spectra were then averaged to deliver the final spectrum, resulting in 262144 total transients for the 4S-FPro and 4,4-diFPro samples and 393216 for the 4R-FPro sample. Prior to Fourier transform, all 19F spectra were multiplied with an exponential weighting function with a 5 or 10 Hz line broadening factor and zero-filled until a total number of real spectral data points between 32768 and 131072 was obtained. All 19F spectra were referenced to the 1H signal of TMSP using the unified chemical shift scale.
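The acquisition parameters above translate into spectral widths in Hz and approximate measurement times through simple arithmetic, sketched below. The estimate assumes that each transient lasts exactly the acquisition time plus the relaxation delay, ignoring pulse durations and any instrument overhead, so the resulting times are lower bounds rather than reported values. Function and variable names are illustrative.

```python
F19_MHZ = 564.69  # 19F Larmor frequency quoted in the text

def ppm_to_hz(width_ppm, freq_mhz=F19_MHZ):
    """Spectral width in Hz: 1 ppm corresponds to freq_mhz Hz."""
    return width_ppm * freq_mhz

def min_experiment_hours(transients, acquisition_s, delay_s):
    """Lower-bound duration, counting only acquisition time and relaxation delay."""
    return transients * (acquisition_s + delay_s) / 3600.0

# Pro34-modified samples: 20.1 ppm window, 0.50 s acquisition, 1.0 s delay.
pro34_sw_hz = ppm_to_hz(20.1)
pro34_hours = min_experiment_hours(65536, 0.50, 1.0)

# Pro38-modified samples: 100.6 ppm window, 0.29 s acquisition, 0.5 s delay.
pro38_sw_hz = ppm_to_hz(100.6)
pro38_hours_4s = min_experiment_hours(262144, 0.29, 0.5)
pro38_hours_4r = min_experiment_hours(393216, 0.29, 0.5)

# Number of concatenated 128-transient spectra behind each final spectrum.
blocks_4s = 262144 // 128  # 2048
blocks_4r = 393216 // 128  # 3072
```

Under these assumptions the Pro34 experiments take a little over one day and the Pro38 experiments between two and four days, which is consistent with the need to monitor spectral changes over time in blocks of 128 transients.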

MD simulations and analyses

Twenty structures of H16 (N17-polyQ16-5P) were extracted from a ~20 μs long MD trajectory which was initiated from a coil conformation generated based on the Flexible-Meccano algorithm[85] using the ProtSA webserver[86]. To generate the corresponding cis conformers of Pro34, the Ω dihedral angle for the peptide bond connecting Gln33-Pro34 was set to 0° for each of the twenty structures in UCSF Chimera[87]. Both trans and cis structures were solvated in octahedral boxes (edge length = 8.0 nm) with 150 mM NaCl, and additional counter ions were added to neutralize the system. The system topology was modelled using the AMBER03ws force field[73] (https://bitbucket.org/jeetain/all-atom_ff_refinements/src/master/) and the TIP4P/2005 water model[88]. Modified Lennard-Jones parameters proposed by Luo and Roux[89] were used for Na+ and Cl− ions to improve ion solubility.

The solvated systems were first subjected to energy minimization using the steepest descent algorithm in GROMACS-2020.4.[90] Following minimization, the systems were simulated for 100 ps using the Nose-Hoover thermostat[91] (τc = 1 ps) to attain a target temperature of 293.15 K. Following temperature equilibration, a 100 ps simulation was conducted using the Berendsen barostat[92] with isotropic coupling (τp = 5 ps) for pressure control, to achieve a target pressure of 1 bar. Production simulations for each structure were performed for 1.0 μs in the NVT ensemble using the Langevin Middle Integrator[93] (friction coefficient = 1 ps−1) within OpenMM-7.5.[94] Short-range nonbonded interactions were treated with a cut-off radius of 0.9 nm, while long-range electrostatics were computed using the Particle Mesh Ewald (PME) method.[95,96] All bonds with hydrogen atoms were constrained using the SHAKE algorithm[97] and hydrogen masses were increased by 1.5 times, allowing a simulation timestep of 4 fs.[98] α-helical fractions were computed based on the definition provided in the DSSP library[99] using the gmx do_dssp program. Gln29/Gln33-Pro34 distances and the Pro34 Altona-Sundaralingam P pseudorotation angle distributions for both cis and trans isomer trajectories were computed with custom Python scripts using the MDAnalysis library.[100]
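As a quick consistency check of the production protocol, the following sketch converts the quoted simulation length, timestep and the 200 ps saving interval stated in the Results into integration steps and frame counts. Whether the initial frame of each trajectory was also written out is not stated, so the frame counts below assume that it was not. The constant names are illustrative.

```python
TIMESTEP_FS = 4          # integration timestep from the text
PRODUCTION_US = 1        # production length per trajectory
SAVE_INTERVAL_PS = 200   # frame-saving interval stated in the Results
N_TRAJECTORIES = 20      # independent trajectories per isomer (trans or cis)

FS_PER_PS = 1000
PS_PER_US = 1_000_000

# Integration steps needed for one 1 us production run.
steps_per_trajectory = PRODUCTION_US * PS_PER_US * FS_PER_PS // TIMESTEP_FS

# Steps between two saved frames.
steps_between_frames = SAVE_INTERVAL_PS * FS_PER_PS // TIMESTEP_FS

# Saved frames per trajectory and per isomer (initial frame assumed not saved).
frames_per_trajectory = steps_per_trajectory // steps_between_frames
frames_per_isomer = frames_per_trajectory * N_TRAJECTORIES

# Aggregated sampling per isomer, in microseconds.
aggregate_us = N_TRAJECTORIES * PRODUCTION_US
```

With these settings each trajectory corresponds to 250 million steps and 5000 saved frames, giving 100000 frames and 20 μs of aggregated sampling per isomer for the distance and pucker analyses.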

Supplementary Material

Supinfo

Acknowledgements

The authors thank Gottfried Otting (Australian National University, Canberra, Australia) for providing the BL21 (DE3) Star::RF1-CBD3 strain. This work was supported by the European Research Council under the European Union's H2020 Framework Programme (2014-2020) / ERC Grant agreement n° [648030]. The CBS is a member of France-BioImaging (FBI) and the French Infrastructure for Integrated Structural Biology (FRISBI), two national infrastructures supported by the French National Research Agency (ANR-10-INBS-04-01 and ANR-10-INBS-05, respectively). J.M. received funding from the National Institutes of Health (Grant No: NIGMS R35GM153388). D.S. acknowledges an « accueil de talent » grant from the Métropole Européenne de Lille (PUSHUP). The 600 MHz spectrometer for 19F NMR measurements is funded by the Nord Region Council, CNRS, Institut Pasteur de Lille, the European Community (ERDF), the French Ministry of Research and the Université de Lille and by the CTRL CPER co-funded by the European Union with the European Regional Development Fund (ERDF), by the Hauts-de-France Regional Council (contract n°17003781), Métropole Européenne de Lille (contract n°2016_ESR_05), and French State (contract n°2017-R3-CTRL-Phase1). Atomistic MD simulations were conducted with the advanced computing resources provided by Texas A&M High Performance Research Computing.

References
