Communications Biology. 2021 Mar 19;4:362. doi: 10.1038/s42003-021-01878-9

Exploration of natural red-shifted rhodopsins using a machine learning-based Bayesian experimental design

Keiichi Inoue 1,2,3,4,5,✉,#, Masayuki Karasuyama 5,6,#, Ryoko Nakamura 3, Masae Konno 3, Daichi Yamada 3, Kentaro Mannen 1, Takashi Nagata 1,5, Yu Inatsu 6, Hiromu Yawo 1, Kei Yura 7,8,9, Oded Béjà 10, Hideki Kandori 2,3,4, Ichiro Takeuchi 2,4,6,
PMCID: PMC7979833  PMID: 33742139

Abstract

Microbial rhodopsins are photoreceptive membrane proteins that are used as molecular tools in optogenetics. Here, a machine learning (ML)-based experimental design method is introduced for screening rhodopsins that are likely to be red-shifted from representative rhodopsins in the same subfamily. Among 3,022 ion-pumping rhodopsins that were suggested by a protein BLAST search in several protein databases, the ML-based method selected 65 candidate rhodopsins. The absorption wavelengths of 39 of them could be determined experimentally by expressing the proteins in the Escherichia coli system, and 32 (82%, p = 7.025 × 10⁻⁵) actually showed red-shift gains. In addition, four showed red-shift gains ≥20 nm, and two were found to have desirable ion-transporting properties, indicating that they would be potentially useful in optogenetics. These findings suggest that data-driven ML-based approaches can play effective roles in the experimental design of rhodopsin and other photobiological studies.

Subject terms: Biophysics, Computational biology and bioinformatics, Biochemistry


Inoue, Takeuchi and colleagues propose a machine learning-based protocol to screen rhodopsins for their likelihood to be red-shifted. After experimental verification, their tool shows remarkable success at identifying rhodopsins that showed red-shift gains.

Introduction

Microbial rhodopsins are photoreceptive membrane proteins widely distributed in bacteria, archaea, unicellular eukaryotes, and giant viruses1,2. They consist of seven transmembrane (TM) α helices, with a retinal chromophore bound to a conserved lysine residue in the seventh helix (Fig. 1a). The first microbial rhodopsin, bacteriorhodopsin (BR), was discovered in the plasma membrane of the halophilic archaeon Halobacterium salinarum (formerly called H. halobium)3. BR forms a purple-colored patch in the plasma membrane, called the purple membrane, and transports H+ outward using sunlight energy4. After the discovery of BR, various types of microbial rhodopsins were reported from diverse microorganisms, and recent progress in genome sequencing techniques has uncovered several thousand microbial rhodopsin genes1,5–7. These microbial rhodopsins perform various biological functions upon light absorption, which triggers all-trans-to-13-cis retinal isomerization. Among them, ion transporters, including light-driven ion pumps and light-gated ion channels, are the most ubiquitous (Fig. 1b). Ion-transporting rhodopsins can transport several types of cations and anions, including H+, Na+, K+, halides (Cl−, Br−, I−), NO3−, and SO42− (refs. 8–10). The molecular mechanisms of ion-transporting rhodopsins have been detailed in numerous biophysical, structural, and theoretical studies1,2.

Fig. 1. Structure and phylogenetic tree of microbial rhodopsins.

a Schematic structure of microbial rhodopsins. b Phylogenetic tree of microbial rhodopsins. The subfamilies of light-driven ion-pump rhodopsins targeted in this study are shown in different colors; non-ion-pump microbial rhodopsins and ion-pumping microbial rhodopsins of eukaryotic and giant viral origin are shown in gray.

In recent years, many ion-transporting rhodopsins have been used as molecular tools in optogenetics to control the activity of animal neurons optically in vivo by heterologous expression11, and optogenetics has revealed various new insights regarding the neural networks relevant to memory, movement, and emotional behavior12–15. However, strong light scattering by biological tissues and the cellular toxicity of shorter-wavelength light make precise optical control difficult. To circumvent this difficulty, new molecular optogenetics tools based on red-shifted rhodopsins, which can be controlled by weakly scattered, low-toxicity longer-wavelength light, are urgently needed. Therefore, many approaches to obtain red-shifted rhodopsins have been reported, including gene screening, amino acid mutation based on biophysical and structural insights, and the introduction of retinal analogs16–18. The insights obtained in these experimental studies, together with further theoretical and computational studies19–22, revealed the basic physical principles regulating the absorption maximum wavelengths (λmax) of rhodopsins (also called the spectral or color-tuning rule), in which the distortion of the retinal polyene chain induced by steric interactions with surrounding residues, the electrostatic interaction between the protonated retinal Schiff base and its counterion(s), and the polarizability of the retinal binding pocket play essential roles23. Based on these physicochemical insights, the λmax of several rhodopsins could be red-shifted by 20–40 nm without impairing the ion-transport function17,24,25. These are successful examples of the knowledge-driven experimental approach. Recently, a new method using a chimeric rhodopsin vector and functional assay was reported to screen the λmax and proton transport activities of several microbial rhodopsins that are present in specific environments26. This method identified partial sequences of red-shifted yellow (560–570 nm)-absorbing proteorhodopsin (PR), the most abundant outward H+-pumping bacterial rhodopsin subfamily, from the marine environment. These works identified several red-shifted rhodopsins15,16,18,27. In particular, the most successful optogenetic tools are red-shifted channel rhodopsins such as Chrimson27,28 and RubyACR29, which can induce and inhibit neural firing by absorbing 590- and 610-nm light, respectively. Rational amino acid mutation based on structural insight further red-shifted the λmax of Chrimson to 608 nm27. Advances in next-generation sequencing technology are expected to identify large numbers of new rhodopsin genes ever more rapidly, including proteins that absorb at even longer wavelengths. However, screening all of them by either experimental or theoretical methods would be very costly. Therefore, a less expensive and more efficient approach to screen red-shifted rhodopsins is needed, and the data-driven approach is expected to serve as a third class of approach for investigating the color-tuning rule of rhodopsins at low cost.

To estimate the λmax of rhodopsins, we recently introduced a data-driven approach30. In this previous study, we investigated the statistical relationship between the amino acid types at each position of the seven TM helices and the absorption wavelength of rhodopsins. We constructed a database containing 796 wild-type (WT) rhodopsins and their variants, the λmax of which had been reported in earlier studies. Then, we evaluated the strength of the relationship with a data-splitting approach, i.e., the data set was divided into a training set and a test set; the former was used to construct the predictive model, and the latter was used to estimate the predictive ability. The results of this “proof-of-concept” study suggested that the λmax of an unknown family of rhodopsins could be predicted with an average error of ±7.8 nm, which is comparable to the mean absolute error of λmax estimated by the hybrid quantum mechanics/molecular mechanics (QM/MM)21 method. Considering the computational cost of both approaches, the data-driven approach was found to be much more efficient than the QM/MM approach, although the latter provides insights into the physical origin of λmax.

Encouraged by this result, in this study we introduce a machine-learning (ML)-based experimental design method that screens candidate rhodopsins likely to have red-shift gains more efficiently, with data-driven assistance, than random or knowledge-driven screening. For this aim, and to explore new red-shifted rhodopsins, we constructed a new dataset of 3022 wild-type putative ion-pump rhodopsins whose λmax have not yet been experimentally investigated; these were collected from public gene databases (NCBI non-redundant protein sequences, metagenomic proteins31, and the Tara Oceans microbiome and virome database32). The goal of the present study was to identify rhodopsins with λmax longer than the wavelengths of the representative rhodopsins in each subfamily of microbial rhodopsins for which the λmax has already been reported (base wavelengths). Here, we call the degree of red-shift from the base wavelength the “red-shift gain”. We focus on rhodopsins with large red-shift gains because this would lead to the identification of amino acid types and residue positions that play important roles in red-shifting absorption wavelengths. Also, it is practically important in optogenetics applications to have a wide variety of ion-pumping rhodopsins from each subfamily to construct a new basis for rhodopsin toolboxes with red-shifted absorption and various types of transportable ion species. We constructed the ML-based experimental design method so that it could properly predict the expected red-shift gains, and applied this new method to 3022 putative ion-pumping rhodopsins of archaeal and bacterial origin that can be easily expressed in Escherichia coli (Fig. 1b).

We conducted experiments by introducing the synthesized rhodopsin genes into E. coli to measure the absorption wavelengths of 65 candidates for which the ML-based experimental design method predicted that the expected gains were >10 nm. Of these 65 selected candidates, 39 showed substantial coloring in E. coli cells; of these, 32 showed actual red-shift gains, 6 showed blue-shifts, and 1 showed no change, i.e., 82% (32/39; p = 7.025 × 10⁻⁵) of the expressed candidates showed actual red-shift gains. We then investigated the ion-transport properties of the rhodopsins for which the red-shift gains were ≥20 nm, and found that some actually had desirable ion-transporting properties, suggesting that they (and their variants) could potentially be used as new optogenetics tools. Furthermore, the differences in the amino acid sequences of the newly examined rhodopsins and the representative ones in the same subfamily could be used for further investigation of the red-shifting mechanisms. This result suggests that it should be possible to find rhodopsins that have desired properties without conducting exhaustive biological experiments, and that data-driven ML-based approaches should play effective roles in the experimental design of rhodopsin and other photobiological studies.

Results

Construction of an ML-based experimental design method for predicting expected red-shift gain

To screen rhodopsins that would have large red-shift gains, it is necessary to consider the uncertainty of prediction in the form of “predictive distributions”33. By using predictive distributions, it is possible to consider appropriately the “exploration–exploitation trade-off” in screening processes34,35, where exploration indicates an approach that prefers candidates with larger predictive variances, and exploitation indicates an approach that prefers candidates with longer predictive mean wavelengths (Fig. 2). Here, “exploration–exploitation” is a technical term used in the fields of active learning and experimental design, whereas “exploration” in the title of this paper is used in a broader sense and is not directly related to that technical terminology. We employed a Bayesian modeling framework to compute the predictive distributions of candidate rhodopsin red-shift gains. We then addressed the exploration–exploitation trade-off by selecting candidate rhodopsins based on a criterion called the “expected red-shift gain”.

Fig. 2. Illustrations of exploration–exploitation for screening rhodopsins with red-shift gain.

a Bayesian prediction model constructed using the current training data (black crosses). The prediction model is represented by the predictive mean and predictive standard deviation (SD). The horizontal axis schematically illustrates the space of proteins defined through physicochemical features. The four vertical dotted lines indicate target proteins (candidates to synthesize). b Predictive mean. This function is defined as the expected value of the probabilistic prediction by the Bayesian model. c Predictive SD. Since the predictive SD represents the uncertainty of the prediction, it has a larger value where no training data points exist nearby. d The distributions on the vertical dotted lines represent the predictive distributions, and the horizontal dashed lines are the base wavelengths of the target points. The base wavelength is different for each target point because it depends on the subfamily of the protein. e The density of the predictive distribution of each target protein on its red-shift gain value. The gain is defined as the predicted wavelength minus the base wavelength; if this difference is negative, the gain is set to 0. This can be seen as a “benefit” that can be obtained by observing the target protein. f Expected value of the red-shift gain. This provides a ranking list from which the next candidates to be experimentally investigated can be determined. Target #4 has the largest expected gain, although target #1 has the largest increase in the predictive mean compared with the base wavelength in e. Because of its larger SD (as shown in a, c, d, and e), target #4 is probabilistically expected to have a larger gain than the other targets.

To compute the expected red-shift gains of a wide variety of rhodopsins, we developed an ML-based experimental design method based on the statistical analysis in our previous study30. Figure 3 shows a schematic illustration of the ML-based experimental design method. First, we added 88 WT microbial rhodopsins and their variants, for which the λmax had recently been reported in the literature or determined by our experiments, to a previously reported data set30. In other words, the new training data set consisted of the amino acid sequences and λmax of 884 WT microbial rhodopsins and their variants (Supplementary Data 1). Second, the new ML model used only N = 24 residues located around the retinal chromophore (Supplementary Fig. 1) because our previous study30 indicated that amino acid residues at these 24 positions play significant roles in predicting absorption wavelengths (Fig. 3a). Third, M = 18 amino acid physicochemical features (Supplementary Data 2) were used as inputs in the ML model, as opposed to the amino acid types used in the previous statistical analysis. This enabled us to predict the absorption wavelengths of a wide range of target rhodopsins that contain, at certain positions, amino acid types not present in the training data. Accordingly, an amino acid sequence is transformed into an M × N = 432 dimensional feature vector $x \in \mathbb{R}^{MN}$ by concatenating $x_{i,j}$, the j-th feature of the i-th residue (Fig. 3b). We consider a linear prediction model $f(x) = \mu + \sum_{i=1}^{N} \sum_{j=1}^{M} \beta_{i,j} x_{i,j}$, where $\beta_{i,j}$ is the parameter for the j-th feature of the i-th residue, and $\mu$ is the intercept term.
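The feature construction and linear model described above can be sketched in a few lines. This is a toy illustration only: the feature table, coefficients, and intercept below are placeholder values chosen for readability (they are not taken from Supplementary Data 2), and the names `FEATURES`, `encode`, and `predict` are hypothetical. It uses M = 2 features and N = 3 residues instead of the paper's M = 18 and N = 24.

```python
# Toy illustration of the feature construction and linear model.
# All numeric values are placeholders, NOT the paper's Supplementary Data 2.

# Hypothetical physicochemical feature table: amino acid -> M feature values.
FEATURES = {
    "L": [1.0, 0.5],
    "Q": [-1.0, 0.4],
    "M": [0.6, 0.7],
    "A": [0.5, -0.8],
}
M = 2  # features per residue (the paper uses M = 18)


def encode(residues):
    """Concatenate the M features of each of the N residues into one vector."""
    x = []
    for aa in residues:
        x.extend(FEATURES[aa])
    return x


def predict(x, mu, beta):
    """Linear model f(x) = mu + sum over all (residue, feature) pairs of beta * x."""
    return mu + sum(b * v for b, v in zip(beta, x))


residues = ["L", "A", "M"]               # N = 3 positions (the paper uses N = 24)
x = encode(residues)                     # length M * N = 6
beta = [2.0, 0.0, -1.0, 0.0, 3.0, 0.0]   # placeholder coefficients
mu = 530.0                               # placeholder intercept (nm)
print(len(x), predict(x, mu, beta))
```

Because the inputs are physicochemical features and not one-hot amino acid types, a residue type never seen at a given position in training still maps to a well-defined input vector, which is the property the paragraph above relies on.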

Fig. 3. Overview of the ML-based exploration of natural red-shifted rhodopsins.

a Using existing experimental data, a training data set consisting of pairs of a wavelength λmax and an amino acid sequence was constructed. A particular focus was placed on the 24 amino acid residues around the retinal chromophore to build an ML-based prediction model. A set of protein sequences with no known wavelength was also collected as target proteins. b All amino acid sequences were transformed into physicochemical features, leading to 24 × 18 = 432 dimensional numerical representations of each protein. c A linear regression model was constructed using the Bayesian approach. Each regression coefficient βi,j was estimated as a distribution (shown as a gray region). The breadth of these distributions represents the uncertainty of the current estimation. d The expected red-shift gain values were evaluated for the target proteins. The green region is the standard deviation of the prediction. The red shaded region in the vertical distribution corresponds to the probability that the wavelength is larger than the base wavelength (dashed line), which is determined by the subfamily of the microbial rhodopsin. The bar represents the expected red-shift gain, defined by the expected value of the increase from the base wavelength.

Finally, to consider the exploration–exploitation trade-off appropriately in the screening process, we introduce a Bayesian modeling framework, which allows us to compute the predictive distributions of red-shift gains. Specifically, we employed Bayesian sparse modeling called BLASSO36 (see the Methods section for details). This enables us to provide not only the mean, but also the variance of the predicted wavelengths. Unlike classical regression analysis, BLASSO regards the model parameters βi,j and μ as random variables generated from underlying distributions, as illustrated in Fig. 3c. Therefore, the wavelength prediction f(x) is also represented as a distribution. The red-shift gain is defined as gain = $\max(f(x) - \lambda_{\mathrm{base}}, 0)$, where λbase is the wavelength of the representative rhodopsin in the same subfamily whose λmax has been experimentally determined and reported in the literature (Supplementary Data 3). Note that the red-shift gain is positive if f(x) is greater than λbase; otherwise, it takes the value of zero. Since f(x) is regarded as a random variable in BLASSO, the red-shift gain is also regarded as a random variable. Therefore, we employ the expected value of the red-shift gain, denoted by E[gain], as the screening criterion, where E represents the expectation of a random variable. Illustrative examples of E[gain] are shown in Fig. 3d. Unlike the simple expectation of the wavelength prediction E[f(x)], E[gain] depends on the variance of the predictive distribution (for example, E[gain] of target #4 is larger than that of #1 in Fig. 2f, although $E[f(x)] - \lambda_{\mathrm{base}}$ of #4 is smaller than that of #1 in Fig. 2e). This encourages the exploration of rhodopsin candidates having large uncertainty (for exploration), as opposed to only those having longer wavelengths with high confidence (for exploitation).
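To make the dependence of E[gain] on predictive variance concrete, the following minimal sketch evaluates the expected truncated gain in closed form. It assumes, purely for illustration, that the predictive distribution of f(x) is normal; the BLASSO predictive distribution used in the paper need not be normal. The function name `expected_gain` and the two example candidates are hypothetical; 537 nm is the BacHR base wavelength listed in Table 1.

```python
import math


def expected_gain(mean, sd, base):
    """E[max(f - base, 0)] when f ~ Normal(mean, sd**2) (illustrative assumption)."""
    if sd <= 0.0:
        # No uncertainty: the gain is just the truncated mean difference.
        return max(mean - base, 0.0)
    z = (mean - base) / sd
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - base) * cdf + sd * pdf


base = 537.0
# Hypothetical candidates: one predicted confidently, one with large uncertainty.
confident = expected_gain(mean=552.0, sd=5.0, base=base)   # mean shift +15 nm
uncertain = expected_gain(mean=547.0, sd=30.0, base=base)  # mean shift +10 nm
print(round(confident, 1), round(uncertain, 1))
```

In this toy case the uncertain candidate has the smaller mean shift but the larger expected gain, which mirrors the relationship between targets #1 and #4 in Fig. 2.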

Screening potential red-shifted microbial rhodopsins based on expected red-shift gains

The target data set to explore red-shifted microbial rhodopsins was constructed with putative microbial rhodopsin genes collected by a protein BLAST (blastp) search37 of the NCBI non-redundant protein and metagenome databases31, as well as the Tara Oceans microbiome and virome databases32. As a result, we obtained a non-redundant data set of 5558 microbial rhodopsin genes (Fig. 1b). The sequences were aligned by ClustalW and categorized into subfamilies of microbial rhodopsins based on phylogenetic distances, as reported previously38. Among these, 3022 rhodopsin genes of bacterial and archaeal origin that had no identical sequence in the training data were extracted, because their λmax can be easily measured by expression in E. coli cells. We calculated the E[gain] of these 3022 genes (Supplementary Data 4), and then selected 65 genes of putative light-driven ion-pump rhodopsins showing an E[gain] >10 nm for further experimental evaluation, as ion-pump rhodopsins can be used as new optogenetics tools.

Experimental measurement of the absorption wavelengths of microbial rhodopsins showing high red-shift gains

We synthesized the selected 65 genes that showed an E[gain] > 10 nm. These were then introduced into E. coli cells, and the proteins were expressed in the presence of 10 μM all-trans retinal. As a result, E. coli cells expressing 39 of the genes showed substantial coloring, indicating high expression of folded protein, and their λmax were determined by observing ultraviolet (UV)-visible absorption changes upon bleaching of the expressed rhodopsins through a hydrolysis reaction of their retinal with hydroxylamine, as previously reported30 (Fig. 4). The observed gains were compared with the E[gain] shown in Table 1. A full list of unexpressed genes is shown in Supplementary Data 5. In total, 32 out of 39 genes showed a longer wavelength than their base wavelength (that is, positive red-shift gain; Fig. 5), suggesting that our ML-based model can significantly improve the efficiency of screening to explore new red-shifted microbial rhodopsins compared with random sampling (p = 7.025 × 10⁻⁵ by a binomial test assuming that the probability of red-shift gain for random choice is 50%).
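The binomial test quoted above can be checked with exact integer arithmetic. The sketch below is not the authors' code; it assumes the null hypothesis stated in the text (each expressed candidate red-shifts with probability 50%) and computes both the one-sided tail and its doubling. The function name `binomial_tail` is hypothetical.

```python
from math import comb


def binomial_tail(successes, trials):
    """P(X >= successes) for X ~ Binomial(trials, 0.5), computed exactly."""
    count = sum(comb(trials, k) for k in range(successes, trials + 1))
    return count / 2 ** trials


one_sided = binomial_tail(32, 39)
two_sided = 2 * one_sided  # the null (p = 0.5) is symmetric, so double the tail
print(f"{one_sided:.3e} {two_sided:.3e}")
```

The doubled tail evaluates to about 7.025 × 10⁻⁵, matching the reported value, which suggests that the reported p value corresponds to a two-sided test.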

Fig. 4. λmax of 39 microbial rhodopsins in solubilized E. coli membrane observed upon hydroxylamine bleach reaction.

The difference absorption spectra between before and after hydroxylamine bleaching reaction of microbial rhodopsins in solubilized E. coli membrane. The λmax of each rhodopsin was determined by the peak positions of the absorption spectra of the original proteins, and the absorption of retinal oxime produced by the reaction of retinal Schiff base and hydroxylamine was observed as a negative peak at ~360–370 nm.

Table 1.

Predicted and observed gains of 39 microbial rhodopsins expressed in E. coli.

Origin | Accession | Subfamily | Motif | Base wavelength/nm | E[gain]/nm | Observed wavelength/nm | (Observed wavelength) − (base wavelength)/nm
Rubricoccus marinus | WP_094550238.1 | BacHR | TSA | 537 | 40.7 | 541 | 4
Rubrivirga marina | WP_095509924.1 | BacHR | TSA | 537 | 39.8 | 548 | 11
Rubrivirga marina | WP_095512583.1 | BacHR | TTD | 537 | 35.5 | 577 | 40
Bacillus sp. CHD6a | WP_082380780.1 | XeR | DTA | 565 | 35.3 | 566 | 1
Bacillus horikoshii | WP_063559373.1 | XeR | DTA | 565 | 35.3 | 565 | 0
Cyanothece sp. PCC 7425 | WP_012628826.1 | BacHR | TSV | 537 | 32.9 | 566 | 29
Cyanobacterium TDX16 | OWY65757.1 | BacHR | TSD | 537 | 32.9 | 546 | 9
Myxosarcina sp. GI1 | WP_052056058.1 | BacHR | TTV | 537 | 31.2 | 557 | 20
Nanohaloarchaea archaeon SW 7 43 1 | PSG98511.1 | XeR | DSA | 565 | 29.2 | 572 | 7
Metagenome sequence | SAMEA2621839 1737175 2 | ClR | NTQ | 530 | 25.7 | 520 | −10
Metagenome sequence | SAMEA2620666 5055 4 | ClR | NTQ | 530 | 25.1 | 525 | −5
Nonlabens sp. YIK11 | AIG86802.2 | PR | DTE | 520 | 21.5 | 531 | 11
Metagenome sequence | SAMEA2622673 750013 58 | ClR | NTQ | 530 | 21.4 | 534 | 4
Metagenome sequence | EBN24473.1 | PR | DTE | 520 | 20.0 | 525 | 5
Metagenome sequence | SAMEA2620404 88891 6 | PR | DTE | 520 | 20.0 | 527 | 7
Parvularcula oceani | WP_051881578.1 | NaR | NDQ | 525 | 19.7 | 534 | 9
Rubrobacter aplysinae | WP_084709429.1 | DTG | DTG | 535 | 19.5 | 541 | 6
Metagenome sequence | SAMEA2619531 1917517 3 | PR | DTE | 520 | 18.0 | 537 | 17
Metagenome sequence | SAMEA2622766 213679 12 | XeR | DSA | 565 | 17.8 | 572 | 7
Reinekea forsetii | WP_100255947.1 | PR | DTE | 520 | 17.1 | 524 | 4
Bacteroidetes bacterium | PSR14004.1 | PR | DTE | 520 | 15.4 | 537 | 17
Metagenome sequence | SAMEA2620980 19116 14 | PR | DTE | 520 | 15.4 | 536 | 16
Hassallia byssoidea VB512170 | KIF37192.1 | BacHR | TSD | 537 | 15.1 | 535 | −2
Erythrobacter gangjinensis | WP_047006274.1 | NaR | NDQ | 525 | 13.7 | 531 | 6
Pontimonas salivibrio | WP_104913209.1 | PR | DTE | 520 | 12.2 | 538 | 18
Cyanobacteria bacterium QH 1 48 107 | PSO50292.1 | CyanDTE | DTD | 545 | 12.0 | 548 | 3
Sphingopyxis baekryungensis | WP_022671827.1 | ClR | NTQ | 530 | 11.0 | 518 | −12
Sphingobacteriales bacterium BACL12 MAG120802bin5 | KRP08428.1 | PR | DTE | 520 | 10.9 | 531 | 11
Metagenome sequence | SAMEA2621401 1198262 5 | PR | DTE | 520 | 10.9 | 534 | 14
Spirosoma oryzae | WP_106137740.1 | NaR | NDQ | 525 | 10.8 | 533 | 8
Aliterella atlantica | WP_045053084.1 | BacHR | TSD | 537 | 10.8 | 533 | −4
Rosenbergiella nectarea | WP_092678153.1 | DTG | DTG | 535 | 10.8 | 533 | −2
Metagenome sequence | SAMEA2620980 1827033 1 | PR | DTE | 520 | 10.4 | 537 | 17
Fluviicola sp. XM24bin1 | PWL28924.1 | PR | DTE | 520 | 10.4 | 538 | 18
Metagenome sequence | SAMEA2622173 654706 7 | PR | DTE | 520 | 10.4 | 530 | 10
Metagenome sequence | SAMEA2619399 1397592 7 | PR | DTE | 520 | 10.4 | 529 | 9
Sphingomonas sp. Leaf34 | WP_055875688.1 | DTG | DTG | 535 | 10.3 | 540 | 5
Sphingomonas sp. Leaf38 | WP_056475157.1 | DTG | DTG | 535 | 10.3 | 540 | 5
Metagenome sequence | ECV93033.1 | PR | DTE | 520 | 10.3 | 542 | 22
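The summary counts quoted in the text can be recomputed directly from the base and observed wavelengths in Table 1. The short tally below is only a consistency check written for this purpose; the variable names are hypothetical, and the (base, observed) pairs are copied from the table in row order.

```python
# (base wavelength, observed wavelength) in nm, copied from Table 1 in row order.
ROWS = [
    (537, 541), (537, 548), (537, 577), (565, 566), (565, 565), (537, 566),
    (537, 546), (537, 557), (565, 572), (530, 520), (530, 525), (520, 531),
    (530, 534), (520, 525), (520, 527), (525, 534), (535, 541), (520, 537),
    (565, 572), (520, 524), (520, 537), (520, 536), (537, 535), (525, 531),
    (520, 538), (545, 548), (530, 518), (520, 531), (520, 534), (525, 533),
    (537, 533), (535, 533), (520, 537), (520, 538), (520, 530), (520, 529),
    (535, 540), (535, 540), (520, 542),
]

shifts = [observed - base for base, observed in ROWS]
red = sum(1 for s in shifts if s > 0)        # red-shifted from the base wavelength
blue = sum(1 for s in shifts if s < 0)       # blue-shifted from the base wavelength
unchanged = sum(1 for s in shifts if s == 0)
large = sum(1 for s in shifts if s >= 20)    # red-shift of at least 20 nm
mean_gain = sum(max(s, 0) for s in shifts) / len(shifts)  # truncated at zero

print(len(shifts), red, blue, unchanged, large, round(mean_gain, 1))
```

The tally gives 32 red-shifted, 6 blue-shifted, and 1 unchanged protein out of 39, with 4 proteins shifted by at least 20 nm and a mean truncated gain of about 9.5 nm.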

Fig. 5. Observed wavelengths and expected red-shift gains.

The predicted and observed red-shift (and blue-shift) gains for the 39 candidate rhodopsins that showed substantial coloring in E. coli cells. Differences between observed and base wavelengths are shown by the bars. The red bars indicate red-shift from the base wavelength, while the blue bars indicate observed wavelengths that were shorter than the base wavelengths. Proteins are sorted in the descending order by E[gain], as shown by the black line. Among the 39 candidates, 32 (82%) showed red-shift gains, suggesting that the proposed ML-based model can screen red-shifted rhodopsins more efficiently than random choice.

Ion-transport function of red-shifted microbial rhodopsins

Overall, 4 of the 39 rhodopsins showed red-shifted absorption ≥20 nm compared with the base wavelengths (Table 1): three were halorhodopsins (HRs) from bacterial species10,39,40 (to distinguish them from classical HRs from archaeal species, these are hereafter referred to as bacterial halorhodopsins [BacHRs]), and one was a PR41. Their ion-transport activities were then investigated by expressing them in E. coli cells and observing the pH change in the external solvent, whose pH was initially set to 7 (Fig. 6a). Upon light illumination, BacHRs from Rubrivirga marina and Myxosarcina sp. GI1 showed alkalization of the external solvent, which was enhanced by addition of the protonophore (CCCP), which increases the H+ permeability of the cell membrane. The light-dependent alkalization disappeared when the anion was exchanged from Cl− to NO3−, indicating that these were light-driven Cl− pumps, similar to other rhodopsins in the same BacHR subfamily10,39. By contrast, the BacHR from Cyanothece sp. PCC 7425 did not show any substantial transport. Although the absence of transport may be attributable to heterologous expression in E. coli, this protein may have considerably different molecular properties from other BacHRs. PR from a metagenome sequence (ECV93033.1) showed acidification of the external solvent that was abolished by the addition of CCCP and was independent of the ionic species in the solvent. Hence, this was a new outward H+ pump that is red-shifted compared with typical PRs, whose λmax are ca. 520 nm41. Furthermore, these rhodopsins must be functional in mammalian cells for optogenetic applications. To test this, we carried out electrophysiological experiments to measure the photocurrents of the BacHR from Rubrivirga marina and the PR from a metagenome sequence (ECV93033.1) in mammalian cells (ND7/23; Fig. 6b). Both showed substantial photocurrents even in mammalian cells. These light-driven ion-pumping rhodopsins with red-shifted λmax have the potential to be applied as new optogenetics tools, and thus warrant further study in the near future.

Fig. 6. Light-driven ion-transport activities of microbial rhodopsins showing longer λmax.

a The light-induced pH change in the external solvent of E. coli cells expressing four microbial rhodopsins that showed a λmax ≥20 nm longer than the base wavelength of the subfamily. The data obtained without and with 10 μM CCCP are indicated by the blue and green lines, respectively, in 100 mM NaCl, CsCl, and NaNO3. Light was illuminated for 150 s (yellow solid lines). b Rubrivirga marina BacHR or PR (ECV93033.1 metagenome) was expressed in the membrane of ND7/23 cells (top image) and generated positive photocurrent in response to a green light pulse (200 ms, 549 nm, 28 mW/mm²). The traces at the bottom are typical records at a holding potential of 0 mV.

Discussion

Microbial rhodopsins show a wide variety of λmax by changing steric and electrostatic interactions between all-trans retinal chromophores and surrounding amino acid residues. An understanding of the color-tuning rule enables more efficient screening and the design of new red-shifted rhodopsins that have value as optogenetics tools, and our ML-based data-driven approach therefore provides a new basis to identify color-regulating factors without assumptions.

We previously demonstrated that an ML model trained on ∼800 experimental results could predict the λmax of microbial rhodopsins with an average error of ±7.8 nm. Encouraged by this result, in the present study, we constructed a new ML-based model to compute expected red-shift gains for a wide range of unknown families of microbial rhodopsins. As a result, 32 out of 39 microbial rhodopsins were found to have red-shifted absorption compared with the base wavelengths of each subfamily of microbial rhodopsins (Table 1), suggesting that our data-driven ML approach can screen red-shifted microbial rhodopsin genes more efficiently than random choice (p = 7.025 × 10⁻⁵).

By considering the exploration–exploitation trade-off, that is, by taking into account not only the expected value of the prediction but also its uncertainty, it was possible to construct a red-shift protein screening process, as shown in Fig. 7. Figure 7a shows the relationship between the prediction uncertainty (as measured by the standard deviation) and the observed red-shift gains. It can be seen that rhodopsins with red-shift gains are found in areas of not only low (small standard deviation), but also high prediction uncertainty (large standard deviation). Figure 7b shows the two-dimensional projection of the d = 432 dimensional feature space by principal component analysis. It can be seen that red-shift gains (red) are found for target proteins not only close to training proteins (green), but also far from training proteins. Figure 8 shows that the observed wavelengths and red-shift gains tend to be smaller than the predicted ones. We conjecture that these differences between the observed and predicted wavelengths and red-shift gains are due to modeling errors, possibly caused by a lack of sufficient information (e.g., three-dimensional structures) and modeling flexibility (e.g., nonlinear effects); in other words, rhodopsins whose predicted values are high partly because of modeling errors have a high chance of being selected. Therefore, it would be valuable to develop a statistical methodology to eliminate selection bias due to modeling errors.
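The selection-bias conjecture above can be illustrated with a small simulation. This sketch is not based on the paper's data or model: it assumes hypothetical true gains and independent prediction errors, both drawn from a normal distribution with a 10-nm standard deviation, and selects candidates by thresholding the prediction at 10 nm.

```python
import random
import statistics

random.seed(0)

# Hypothetical setting: true gains and model errors are both Normal(0, 10 nm).
candidates = []
for _ in range(3000):
    true_gain = random.gauss(0.0, 10.0)
    predicted = true_gain + random.gauss(0.0, 10.0)  # prediction = truth + error
    candidates.append((predicted, true_gain))

# Keep only candidates whose predicted gain exceeds the 10-nm threshold.
selected = [(p, t) for p, t in candidates if p > 10.0]
mean_predicted = statistics.mean(p for p, _ in selected)
mean_true = statistics.mean(t for _, t in selected)
print(len(selected), round(mean_predicted, 1), round(mean_true, 1))
```

Even though the simulated predictions are unbiased over the whole pool, the selected subset has a mean predicted gain well above its mean true gain, because candidates with favorable errors are more likely to pass the threshold. This is the same qualitative pattern as in Fig. 8, although the simulation says nothing about the actual size of the effect in the experiments.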

Fig. 7. Diversity of the selected proteins.

a Predicted standard deviation (horizontal axis) vs. observed gain (vertical axis). The marker shape represents the subfamily of each protein. b Two-dimensional projection created by principal component analysis. The original d = 432 dimensional feature space is projected onto the first two principal component directions. The first component (horizontal axis) explains 33% of the total variance of the original space, and the second (vertical axis) explains 17%. The green markers are the training data, and the black markers are the target data. For the synthesized proteins, differences in the observed and base wavelengths are shown by the color map. The results indicate that, by considering the exploration–exploitation trade-off, it was possible to make a red-shift protein screening process that considered not only the expected value of the prediction, but also the uncertainty.

Fig. 8. Comparisons of experimental observations and ML predictions.

In these two plots, the red points have longer observed wavelengths than the base wavelength λbase, while the blue points have shorter observed wavelengths than λbase. a ML-based prediction of λmax (horizontal axis) vs. experimentally observed λmax (vertical axis). b Expected red-shift gain (horizontal axis) vs. observed gain (vertical axis). Since we selected rhodopsins having expected red-shift gains of ≥10 nm, all the points on the horizontal axis are ≥10 nm. The observed gain, defined by $\max(\lambda_{\max} - \lambda_{\mathrm{base}}, 0)$, is nonnegative by definition. For blue points whose observed gain is equal to 0, the value of $\lambda_{\max} - \lambda_{\mathrm{base}}$ is also shown as blue outlined circles. The green and orange dashed lines are the averages of the horizontal and vertical axes (19.2 nm and 9.5 nm), respectively. The results indicate that the observed wavelengths and red-shift gains tended to be smaller than the predicted ones. We conjecture that these differences between the observed and predicted wavelengths are due to modeling errors (see the Discussion for details).

Four rhodopsins showed absorption red-shifted by ≥20 nm relative to the base wavelength, three of which showed light-driven ion-transport function. Interestingly, while one BacHR from Rubrivirga marina (accession No.: WP_095512583.1) showed a 40-nm longer λmax (577 nm) than the base wavelength, another BacHR red-shifted by 11 nm (WP_095509924.1) was also identified from the same bacterium (Table 1). These BacHRs are highly similar to each other (55.2% identity and 70.6% similarity), and only four of the 24 amino acid residues around the retinal chromophore differ. Hence, R. marina evolved two BacHRs whose λmax values differ by 29 nm through a small number of amino acid replacements; the amino acid residue(s) responsible for this color-tuning should be investigated in the future.
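As an illustration of how an identity figure such as the 55.2% quoted above can be obtained from a pairwise alignment, the following minimal sketch counts matching alignment columns. The function name `percent_identity`, the toy sequences, and the choice of denominator (columns in which neither sequence has a gap) are assumptions made for illustration; the tool and convention used by the authors are not stated in this section.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments (hypothetical, not real rhodopsin sequences).
seq_a = "MLT-AWDGKL"
seq_b = "MITQAW-GRL"
identity = percent_identity(seq_a, seq_b)
print(f"identity = {identity:.1f}%")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so a reported identity value is only comparable when the denominator is known.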

The differences in amino acids in three of 24 retinal-surrounding residues are known to play a color-tuning role in natural rhodopsins without affecting their biological function. These correspond to positions 93, 186, and 215 in BR (BR Leu93, Pro186, and Ala215, respectively)17. Position 93 is known to be diversified in the PR family (the well-known position 105 in PRs). Green-light-absorbing PRs (GPRs) have leucine, as in BR, whereas glutamine is conserved in blue-light-absorbing PRs5,26. This color-tuning effect by the difference between leucine and glutamine is known as the “L/Q-switch”42. Interestingly, while 29.8% of the 3022 candidate genes have glutamine at this position, all 39 genes whose large red-shift gains were suggested by our ML-based model have amino acids other than glutamine, which suggests that our ML-based model avoided the genes having glutamine at position 93. In particular, 12 (37.5%) of the 32 genes that actually showed red-shifted absorption compared with the base wavelengths had methionine at this position (Supplementary Data 6), which is substantially higher than the proportion of methionine-conserving genes in the 3022 candidates (16.1%). The previously reported red-shifting effect of the L-to-M mutation of this residue in GPRs42 and the current result imply that many rhodopsins have evolved methionine to absorb light with longer wavelengths. Position 215 in BR is also known to have a color-tuning role. The mutation from alanine to threonine or serine (A/TS switch) has a blue-shifting effect of 9–20 nm17,43–45. Five of the six genes that showed blue-shifted λmax compared with the base wavelengths have threonine or serine at this position, suggesting that these types of genes should be avoided when exploring red-shifted rhodopsins. By contrast, asparagine was conserved in more than half (58.4%) of the 3022 candidate genes, especially in those belonging to the PR subfamily. A substantial portion (37.5%) of the genes with red-shifted absorption compared with the base wavelengths also had asparagine at this position (Supplementary Data 6). The A-to-N mutation at this position had a smaller effect (4–7 nm)30,44 than that of the A-to-S/T mutation; thus, the difference between alanine and asparagine is not so critical when exploring red-shifted rhodopsins. Position 186 in BR is proline in most microbial rhodopsins (in 98.7% of the 3022 candidate genes), and the mutation to non-proline amino acids induces a red-shift of absorption17. We identified a sodium pump rhodopsin (NaR) from Parvularcula oceani that has a threonine at this position and showed 10-nm longer absorption than the base wavelength. Although genes having non-proline amino acids at this position are rare in nature, they would be beneficial for identifying new red-shifted rhodopsins. These results indicate that ML-based modeling can provide insights for identifying new functional tuning rules for proteins based on specific amino acid residues.
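To give a sense of scale for the glutamine and methionine observations, the following back-of-the-envelope sketch computes (i) how likely it would be to pick 39 genes with no glutamine at position 93 if genes were drawn at random from the 3022 candidates, and (ii) the fold enrichment of methionine among the red-shifted genes. It assumes that 29.8% of 3022 corresponds to roughly 901 genes (a rounded count, not a figure taken from the paper) and models random selection as sampling without replacement. It is an illustration only, not an analysis reported by the authors.

```python
from math import comb

n_candidates = 3022
n_gln = round(0.298 * n_candidates)  # about 901 genes with Gln; rounded assumption
n_selected = 39

# Hypergeometric probability of selecting zero Gln-containing genes at random.
p_no_gln = comb(n_candidates - n_gln, n_selected) / comb(n_candidates, n_selected)

# Fold enrichment of methionine among red-shifted genes relative to all candidates.
met_fraction_red_shifted = 12 / 32  # 37.5%
met_fraction_candidates = 0.161     # 16.1%
fold_enrichment = met_fraction_red_shifted / met_fraction_candidates

print(f"P(no Gln in 39 random picks) = {p_no_gln:.2e}")
print(f"Met fold enrichment = {fold_enrichment:.2f}")
```

Under these assumptions the probability of avoiding glutamine by chance is on the order of one in a million, which is consistent with the interpretation that the model actively avoided such genes.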

The number of reported microbial rhodopsin genes is rapidly increasing due to the development of next-generation sequencing techniques and microbe culturing methods. New microbial rhodopsins with molecular characteristics suitable for optogenetics applications are expected to be included in upcoming genomic data. Data-driven approaches would be able to efficiently suggest promising rhodopsins that should be investigated preferentially. Although the absorption of the most red-shifted rhodopsin found in this study (BacHR from Rubrivirga marina, λmax = 577 nm) is shorter than the peak activation wavelength of eNpHR3.0 (590 nm), which is extensively used in optogenetic studies46, our ML-based model could be expected to reduce the costs associated with identifying red-shifted rhodopsins from upcoming genomic data. In particular, we expect that our ML-based model could be applied to ion channel and enzymatic rhodopsins, which were not a focus of this study because of their eukaryotic origins; applying it to these rhodopsins could help identify more useful optogenetics tools with red-shifted absorption in the future.

Methods

Experimental design

The objective of this study was to introduce and demonstrate the effectiveness of a data-driven experimental design method to screen candidates for rhodopsin proteins with desired properties from more than several thousand candidates identified in various microbial species. To this end, we constructed a training dataset for developing an ML model and a target dataset for screening targets (Construction of training and target data sets). A machine learning model was constructed using the training dataset (ML modeling) and was used to select 65 candidates from the 3022 genes in the target dataset. The protein expressions of the selected candidates were induced (Protein expression), and the absorption spectra and λmax of the selected rhodopsins were measured (Measurement of the absorption spectra and λmax of rhodopsins by bleaching with hydroxylamine). Furthermore, we investigated the ion-transport properties of the rhodopsins that showed large red-shift gains (Ion-transport assay of rhodopsins in E. coli cells). Statistical significance of the effectiveness of the data-driven experimental design method was assessed by a binomial test.

Construction of training and target data sets

In this study, we constructed a new training data set (Supplementary Data 1) by adding 88 genes for which the λmax had recently been reported in the literature or determined by our experiments, to a previously reported data set30. The sequences were aligned using ClustalW47 and the results were manually checked to avoid improper gaps and/or shifts in the TM parts. The aligned sequences were then used for ML-based modeling.

To collect microbial rhodopsin genes for the target data set, BR48 and heliorhodopsin 48C1249 sequences were used as queries for searching homologous amino acid sequences in NCBI non-redundant protein sequences and metagenomic proteins31 and the Tara Oceans microbiome and virome database32. Protein BLAST (blastp)37 was used for the homology search, with the threshold E-value set at <10 by default, and sequences with >180 amino acid residues were collected. All sequences were aligned using ClustalW47. The highly diversified C-terminal 15-residue region behind the retinal binding Lys (BR Lys216) and the long loop of HeR between helices A and B were removed from the sequences to avoid unnecessary gaps in the alignment. The successful alignment of the TM helical regions, especially the 3rd and 7th helices, was checked manually. The phylogenetic tree was drawn using the neighbor-joining method50, and the microbial rhodopsin subfamilies were categorized based on the phylogenetic distances, as reported previously38. Based on the phylogenetic tree, 3022 putative ion-pumping rhodopsin genes from bacterial and archaeal origins were extracted, and their aligned sequences were used as the target data set for the prediction of λmax. The original training and test sets are provided in Supplementary Data 1 and Table 1, respectively, and the entire transformed datasets with physicochemical features (see Supplementary Data 2) are provided in Supplementary Data 7.
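The length criterion described above can be sketched as follows. This is a minimal illustration that assumes the BLAST hits have been saved in FASTA format; the function names `parse_fasta` and `filter_by_length` and the toy records are hypothetical, and the authors' actual scripts are not described in this section.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def filter_by_length(records: Dict[str, str], min_residues: int = 180) -> Dict[str, str]:
    """Keep sequences with more than `min_residues` residues (the >180 criterion)."""
    return {h: s for h, s in records.items() if len(s) > min_residues}


# Toy input (hypothetical sequences): one long hit and one short fragment.
fasta_text = [">hit_long", "M" * 120, "A" * 100, ">hit_short", "M" * 90]
kept = filter_by_length(parse_fasta(fasta_text))
print(list(kept))
```

The filter removes short fragments before alignment; trimming of the C-terminal region and of the HeR loop, as described in the text, would be separate steps that depend on the alignment coordinates.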

ML modeling

Suppose that we have K pairs of an amino acid sequence and an absorption wavelength, $\{(x^{(k)}, \lambda_{\max}^{(k)})\}_{k=1}^{K}$, where $x^{(k)} \in \mathbb{R}^{MN}$ is the feature vector of the k-th amino acid sequence and $\lambda_{\max}^{(k)} \in \mathbb{R}$ is the absorption wavelength of the k-th rhodopsin protein. The least absolute shrinkage and selection operator (LASSO) is a standard regression model in which important regression coefficients can be automatically selected by the penalty on the absolute value of the coefficient, as follows:

$$\min_{\mu,\beta} \sum_{k=1}^{K} \left( \lambda_{\max}^{(k)} - \mu - \sum_{i=1}^{M} \sum_{j=1}^{N} \beta_{i,j} x_{i,j}^{(k)} \right)^{2} + \gamma \sum_{i=1}^{M} \sum_{j=1}^{N} |\beta_{i,j}|,$$

where $\beta \in \mathbb{R}^{MN}$ is a vector of $\beta_{i,j}$ and $\gamma > 0$ is the regularization parameter. BLASSO is a Bayesian extension of LASSO for which the model is defined through the following random variables:

$$\lambda_{\max}^{(k)} \sim N\left(\mu + \beta^{\top} x^{(k)}, \sigma^{2}\right), \qquad \beta \sim \pi(\beta \mid \sigma^{2}),$$

where $N(\mu, s^{2})$ is a Gaussian distribution with mean $\mu$ and variance $s^{2}$, and $\pi(\beta \mid \sigma^{2}) = \prod_{i=1}^{M} \prod_{j=1}^{N} \frac{\gamma}{2\sqrt{\sigma^{2}}} e^{-\gamma |\beta_{i,j}| / \sqrt{\sigma^{2}}}$ is the conditional Laplace prior. In this model, the maximum of the conditional distribution of the parameter $\beta \mid \{(x^{(k)}, \lambda_{\max}^{(k)})\}_{k=1}^{K}, \gamma, \sigma$ is equivalent to the LASSO51 estimator. For $\gamma$, a hyper-prior is set through the gamma distribution prior on $\gamma^{2}$, and the inverse gamma prior is assumed for $\sigma^{2}$. For the computational details, see the original paper36. We used the “monomvn” package of R in our implementation. The prediction $f(x)$ was sampled through the Gibbs sampler of $\beta$ and $\mu$. The number of samples was set to T = 10,000. For each candidate $x$, we approximately obtain E[gain] by

$$\mathbb{E}[\mathrm{gain}] \approx \frac{1}{T} \sum_{t=1}^{T} \max\left( \mu^{(t)} + \beta^{(t)\top} x - \lambda_{\mathrm{base}},\ 0 \right),$$

where $\mu^{(t)}$ and $\beta^{(t)}$ are the t-th sampled parameters. The parameters of the trained model are provided in Supplementary Data 8.
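The Monte Carlo approximation of E[gain] above can be sketched in a few lines. The sketch below is illustrative only: the function name `expected_gain`, the two-dimensional toy feature vector, and the randomly generated stand-ins for the Gibbs samples are assumptions, not values from the trained model (the authors used the “monomvn” package of R to obtain the actual samples).

```python
import random
from typing import Sequence


def expected_gain(
    mu_samples: Sequence[float],
    beta_samples: Sequence[Sequence[float]],
    x: Sequence[float],
    lambda_base: float,
) -> float:
    """Monte Carlo estimate: mean over t of max(mu_t + beta_t . x - lambda_base, 0)."""
    total = 0.0
    for mu_t, beta_t in zip(mu_samples, beta_samples):
        prediction = mu_t + sum(b * xi for b, xi in zip(beta_t, x))
        total += max(prediction - lambda_base, 0.0)
    return total / len(mu_samples)


# Toy stand-ins for posterior samples (hypothetical values, not from the paper).
random.seed(0)
T = 10_000
mu_samples = [random.gauss(540.0, 2.0) for _ in range(T)]
beta_samples = [[random.gauss(5.0, 1.0), random.gauss(-3.0, 1.0)] for _ in range(T)]
x = [1.0, 0.0]       # toy feature vector
lambda_base = 540.0  # toy base wavelength in nm

gain = expected_gain(mu_samples, beta_samples, x, lambda_base)
print(f"E[gain] is approximately {gain:.2f} nm")
```

Because the maximum is taken inside the average, a candidate whose mean prediction is near the base wavelength but whose posterior is wide can still receive a positive expected gain, which is how predictive uncertainty enters the selection criterion.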

Protein expression

The synthesized genes of microbial rhodopsins codon-optimized for E. coli (Genscript, NJ) were incorporated into the multi-cloning site in the pET21a(+) vector (Novagen, Merck KGaA, Germany). The plasmids carrying the microbial rhodopsin genes were transformed into the E. coli C43(DE3) strain (Lucigen, WI). Protein expression was induced by 1 mM isopropyl β-d-1-thiogalactopyranoside (IPTG) in the presence of 10 μM all-trans retinal for 4 h.

Measurement of the absorption spectra and λmax of rhodopsins by bleaching with hydroxylamine

E. coli cells expressing rhodopsins were washed three times with a solution containing 100 mM NaCl and 50 mM Na2HPO4 (pH 7). The washed cells were treated with 1 mM lysozyme for 1 h and then disrupted by sonication for 5 min (VP-300N; TAITEC, Japan). To solubilize the rhodopsins, 3% n-dodecyl-β-d-maltoside (DDM, Anatrace, OH) was added, and the samples were stirred overnight at 4 °C. The rhodopsins were bleached with 500 mM hydroxylamine and subjected to yellow light illumination (λ > 500 nm) from the output of a 1-kW tungsten–halogen projector lamp (Master HILUX-HR; Rikagaku) through colored glass (Y-52; AGC Techno Glass, Japan) and heat-absorbing filters (HAF-50S-15H; SIGMA KOKI, Japan). The absorption change upon bleaching was measured by a UV-visible spectrometer (V-730; JASCO, Japan).

Ion-transport assay of rhodopsins in E. coli cells

To assay the ion-transport activity in E. coli cells, the cells carrying expressed rhodopsin were washed three times and resuspended in unbuffered 100 mM NaCl. A cell suspension of 7.5 mL at OD660 = 2 was placed in the dark in a glass cell at 20 °C and illuminated at λ > 500 nm from the output of a 1-kW tungsten–halogen projector lamp (Rikagaku, Japan) through a long-pass filter (Y-52; AGC Techno Glass, Japan) and a heat-absorbing filter (HAF-50S-50H; SIGMA KOKI, Japan). The light-induced pH changes were measured using a pH electrode (9618S-10D; HORIBA, Japan). All measurements were repeated under the same conditions after the addition of 10 μM CCCP.

Imaging and electrophysiological assays

For heterologous expression in mammalian cultured cells, the synthesized rhodopsin genes were inserted into the cloning site between the CMV promoter and eYFP in phKR2-3.0-EYFP52 using EcoRI and BamHI. All experiments were carried out using ND7/23 cells, a hybrid cell line derived from neonatal rat dorsal root ganglion neurons fused with mouse neuroblastoma cells, which were transfected with plasmids as previously described53. EYFP fluorescence (543 nm) in the ND7/23 cells expressing the rhodopsins was imaged under a confocal laser scanning microscope (LSM510, Carl Zeiss, Oberkochen, Germany) at 512 × 512 pixels using a water-immersion objective (×63/0.95, Achroplan, Carl Zeiss) and an Ar laser (514 nm). Currents were recorded using an EPC-8 amplifier (HEKA Electronic, Lambrecht, Germany) under a whole-cell patch clamp configuration while 200-ms pulses of illumination at 549 ± 15 nm (>90% of the maximum) and 28 mW·mm⁻² were given at 0.1 Hz using a SpectraX light engine (Lumencor Inc., Beaverton, OR). The internal pipette solution contained (in mM): 121.2 KOH, 90.9 glutamate, 5 Na2EGTA, 49.2 HEPES, 2.53 MgCl2, 2.5 MgATP, and 0.0025 ATR (pH 7.4 adjusted with HCl). The extracellular Tyrode’s solution contained (in mM): 138 NaCl, 3 KCl, 2.5 CaCl2, 1 MgCl2, 10 HEPES, 4 NaOH, and 11 glucose (pH 7.4 adjusted with HCl).

Statistical analysis

We assessed the effectiveness of the data-driven experimental design method by comparing it with random selection in terms of the proportion of selected rhodopsins in which red-shift gains were observed. The statistical significance of the effectiveness was quantified by comparing the red-shift gain proportion, 0.82 (= 32/39, p = 7.025 × 10⁻⁵), with the probability of observing red-shift gains from randomly selected rhodopsins, i.e., 0.50, based on a binomial test. Since we set the base wavelength of each subfamily to the λmax of a rhodopsin that was studied in detail in previous work and is equal to or longer than the empirical median of the λmax in each subfamily (Supplementary Fig. 2), it is reasonable to assume that the probability of observing red-shift gains from randomly selected rhodopsins must be smaller than or equal to 0.50. For statistical analysis of the ML model building and the evaluation of its performance, see the ML modeling section above.
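The binomial test described above can be reproduced with exact arithmetic. The sketch below is a minimal illustration using only the Python standard library; the helper name `binomial_tail` is hypothetical, and the software the authors used for the test is not stated in this section. Whether the reported value is one- or two-sided is likewise not stated; in this sketch the two-sided value matches the reported p = 7.025 × 10⁻⁵.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_measured = 39     # rhodopsins whose wavelengths were determined
n_red_shifted = 32  # rhodopsins that showed red-shift gains

one_sided = binomial_tail(n_measured, n_red_shifted)
# With p = 0.5 the null distribution is symmetric, so the two-sided
# p-value is twice the upper-tail probability.
two_sided = 2 * one_sided

print(f"one-sided p = {one_sided:.3e}, two-sided p = {two_sided:.3e}")
```

Because the true probability of a red-shift gain under random selection is argued to be at most 0.50, using 0.50 as the null value makes the test conservative.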

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Supplementary information

42003_2021_1878_MOESM2_ESM.pdf (6.5KB, pdf)

Description of Additional Supplementary Files

Supplementary Data 1 (319.8KB, xlsx)
Supplementary Data 2 (14.7KB, xlsx)
Supplementary Data 3 (28.8KB, xlsx)
Supplementary Data 4 (182.4KB, xlsx)
Supplementary Data 5 (15.4KB, xlsx)
Supplementary Data 6 (16KB, xlsx)
Supplementary Data 7 (6.8MB, xlsx)
Supplementary Data 8 (15.7KB, xlsx)
Supplementary Data 9 (2.1MB, xlsx)
Reporting Summary (300.2KB, pdf)

Acknowledgements

This work was supported by Grants-in-Aid from the Japan Society for the Promotion of Science (JSPS) for Scientific Research (KAKENHI grant Nos. 17H03007 to K.I., 17H04694 and 16H06538 to M.Karasuyama, 19H04959 to H.K., and 16H06538, 17H00758, and 20H00601 to I.T.), the Japan Science and Technology Agency (JST), PRESTO, Japan (grant Nos. JPMJPR15P2 to K.I. and JPMJPR15N2 to M.Karasuyama), and CREST, Japan (grant No. JPMJCR1502) to I.T.; K.I., H.K., and I.T. received support from RIKEN AIP; O.B. received support from the Louis and Lyra Richmond Memorial Chair in Life Sciences.

Author contributions

K.I., M.Karasuyama, H.K., and I.T. contributed to the study design. K.I., D.Y., K.Y., and O.B. conducted the phylogenetic analysis of rhodopsins and the construction of training data. M.Karasuyama, Y.I., and I.T. constructed the ML model and calculated E[gain]. K.I., R.N., K.M., and T.N. constructed the DNA plasmids of rhodopsin genes and introduced them into E. coli and mammalian cells. R.N. and K.M. measured λmax of rhodopsins by bleaching proteins with hydroxylamine. M.Konno conducted the pump activity assay of rhodopsins in E. coli cells. H.Y. conducted the electrophysiological measurement of rhodopsins in mammalian cells. K.I., M.Karasuyama, H.K., and I.T. wrote the paper. All authors discussed and commented on the manuscript.

Data availability

All data shown in main figures were deposited in Supplementary Data 9. Data supporting the findings are available from the corresponding authors upon reasonable request.

Code availability

The computational code of this manuscript is available at http://www-als.ics.nitech.ac.jp/~karasuyama/BLASSO-for-Rhodopsins/.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Keiichi Inoue, Masayuki Karasuyama.

Change history

4/30/2021

A Correction to this paper has been published: 10.1038/s42003-021-02090-5

Contributor Information

Keiichi Inoue, Email: inoue@issp.u-tokyo.ac.jp.

Ichiro Takeuchi, Email: takeuchi.ichiro@nitech.ac.jp.

Supplementary information

The online version contains supplementary material available at 10.1038/s42003-021-01878-9.

References

  • 1. Ernst OP, et al. Microbial and animal rhodopsins: Structures, functions, and molecular mechanisms. Chem. Rev. 2014;114:126–163. doi: 10.1021/cr4003769.
  • 2. Govorunova EG, Sineshchekov OA, Li H, Spudich JL. Microbial rhodopsins: diversity, mechanisms, and optogenetic applications. Annu. Rev. Biochem. 2017;86:845–872. doi: 10.1146/annurev-biochem-101910-144233.
  • 3. Oesterhelt D, Stoeckenius W. Rhodopsin-like protein from the purple membrane of Halobacterium halobium. Nat. New Biol. 1971;233:149–152. doi: 10.1038/newbio233149a0.
  • 4. Oesterhelt D, Stoeckenius W. Functions of a new photoreceptor membrane. Proc. Natl Acad. Sci. USA. 1973;70:2853–2857. doi: 10.1073/pnas.70.10.2853.
  • 5. Man D, et al. Diversification and spectral tuning in marine proteorhodopsins. EMBO J. 2003;22:1725–1731. doi: 10.1093/emboj/cdg183.
  • 6. Venter JC, et al. Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004;304:66–74. doi: 10.1126/science.1093857.
  • 7. Inoue K, Kato Y, Kandori H. Light-driven ion-translocating rhodopsins in marine bacteria. Trends Microbiol. 2014;23:91–98. doi: 10.1016/j.tim.2014.10.009.
  • 8. Inoue K, et al. A light-driven sodium ion pump in marine bacteria. Nat. Commun. 2013;4:1678. doi: 10.1038/ncomms2689.
  • 9. Nagel G, et al. Channelrhodopsin-1: a light-gated proton channel in green algae. Science. 2002;296:2395–2398. doi: 10.1126/science.1072068.
  • 10. Niho A, et al. Demonstration of a light-driven SO₄²⁻ transporter and its spectroscopic characteristics. J. Am. Chem. Soc. 2017;139:4376–4389.
  • 11. Deisseroth K. Optogenetics: 10 years of microbial opsins in neuroscience. Nat. Neurosci. 2015;18:1213–1225. doi: 10.1038/nn.4091.
  • 12. Liu X, et al. Optogenetic stimulation of a hippocampal engram activates fear memory recall. Nature. 2012;484:381–385. doi: 10.1038/nature11028.
  • 13. Ramirez S, et al. Creating a false memory in the hippocampus. Science. 2013;341:387–391. doi: 10.1126/science.1239073.
  • 14. Yizhar O, et al. Neocortical excitation/inhibition balance in information processing and social dysfunction. Nature. 2011;477:171–178. doi: 10.1038/nature10360.
  • 15. Marshel JH, et al. Cortical layer-specific critical dynamics triggering perception. Science. 2019;365:eaaw5202. doi: 10.1126/science.aaw5202.
  • 16. Schneider F, Grimm C, Hegemann P. Biophysics of channelrhodopsin. Annu. Rev. Biophys. 2015;44:167–186. doi: 10.1146/annurev-biophys-060414-034014.
  • 17. Inoue K, et al. Red-shifting mutation of light-driven sodium-pump rhodopsin. Nat. Commun. 2019;10:1993. doi: 10.1038/s41467-019-10000-x.
  • 18. Ganapathy S, et al. Retinal-based proton pumping in the near infrared. J. Am. Chem. Soc. 2017;139:2338–2344. doi: 10.1021/jacs.6b11366.
  • 19. Hayashi S, et al. Structural determinants of spectral tuning in retinal proteins-bacteriorhodopsin vs sensory rhodopsin II. J. Phys. Chem. B. 2001;105:10124–10131.
  • 20. Fujimoto K, Hayashi S, Hasegawa JY, Nakatsuji H. Theoretical studies on the color-tuning mechanism in retinal proteins. J. Chem. Theory Comput. 2007;3:605–618. doi: 10.1021/ct6002687.
  • 21. Pedraza-González L, De Vico L, Marín MdC, Fanelli F, Olivucci M. a-ARM: automatic rhodopsin modeling with chromophore cavity generation, ionization state selection, and external counterion placement. J. Chem. Theory Comput. 2019;15:3134–3152. doi: 10.1021/acs.jctc.9b00061.
  • 22. Tsujimura M, et al. Mechanism of absorption wavelength shifts in anion channelrhodopsin-1 mutants. Biochim. Biophys. Acta Bioenerg. 2021;1862:148349. doi: 10.1016/j.bbabio.2020.148349.
  • 23. Katayama, K. & Sekharan, S. S. Y. Optogenetics (eds Yawo, H., Kandori, H. & Koizumi, A.) Ch. 7, 89–107 (Springer, 2015).
  • 24. Engqvist MK, et al. Directed evolution of Gloeobacter violaceus rhodopsin spectral properties. J. Mol. Biol. 2015;427:205–220. doi: 10.1016/j.jmb.2014.06.015.
  • 25. Kojima K, et al. Green-sensitive, long-lived, step-functional anion channelrhodopsin-2 variant as a high-potential neural silencing tool. J. Phys. Chem. Lett. 2020;11:6214–6218. doi: 10.1021/acs.jpclett.0c01406.
  • 26. Pushkarev A, et al. The use of a chimeric rhodopsin vector for the detection of new proteorhodopsins based on color. Front. Microbiol. 2018;9:439. doi: 10.3389/fmicb.2018.00439.
  • 27. Oda K, et al. Crystal structure of the red light-activated channelrhodopsin Chrimson. Nat. Commun. 2018;9:3949. doi: 10.1038/s41467-018-06421-9.
  • 28. Klapoetke NC, et al. Independent optical excitation of distinct neural populations. Nat. Methods. 2014;11:338–346. doi: 10.1038/nmeth.2836.
  • 29. Govorunova EG, et al. RubyACRs, nonalgal anion channelrhodopsins with highly red-shifted absorption. Proc. Natl Acad. Sci. USA. 2020;117:22833–22840. doi: 10.1073/pnas.2005981117.
  • 30. Karasuyama M, Inoue K, Nakamura R, Kandori H, Takeuchi I. Understanding colour tuning rules and predicting absorption wavelengths of microbial rhodopsins by data-driven machine-learning approach. Sci. Rep. 2018;8:15580. doi: 10.1038/s41598-018-33984-w.
  • 31. Brown GR, et al. Gene: a gene-centered information resource at NCBI. Nucleic Acids Res. 2015;43:D36–D42. doi: 10.1093/nar/gku1055.
  • 32. Sunagawa S, et al. Ocean plankton. Structure and function of the global ocean microbiome. Science. 2015;348:1261359. doi: 10.1126/science.1261359.
  • 33. Bishop, C. M. Pattern Recognition And Machine Learning (Springer, 2006).
  • 34. Snoek, J., Larochelle, H. & Adams, R. P. Advances in Neural Information Processing Systems 25 (NIPS 2012). (eds. Pereira, F., Burges, C. J. C., Bottou, L. & Weinberger, K. Q.) 2951–2959 (Curran Associates, Inc., 2012).
  • 35. Shahriari, B., Swersky, K., Wang, Z., Adams, R. P. & Freitas, N. D. in Proceedings of the IEEE. 148–175 (IEEE, 2016).
  • 36. Park T, Casella G. The Bayesian Lasso. J. Am. Stat. Assoc. 2008;103:681–686.
  • 37. Johnson M, et al. NCBI BLAST: a better web interface. Nucleic Acids Res. 2008;36:W5–W9. doi: 10.1093/nar/gkn201.
  • 38. Yamauchi Y, et al. Engineered functional recovery of microbial rhodopsin without retinal-binding lysine. Photochem. Photobiol. 2019;95:1116–1121. doi: 10.1111/php.13114.
  • 39. Hasemi T, Kikukawa T, Kamo N, Demura M. Characterization of a cyanobacterial chloride-pumping rhodopsin and its conversion into a proton pump. J. Biol. Chem. 2016;291:355–362. doi: 10.1074/jbc.M115.688614.
  • 40. Harris A, et al. Molecular details of the unique mechanism of chloride transport by a cyanobacterial rhodopsin. Phys. Chem. Chem. Phys. 2018;20:3184–3199. doi: 10.1039/c7cp06068h.
  • 41. Béjà O, et al. Bacterial rhodopsin: Evidence for a new type of phototrophy in the sea. Science. 2000;289:1902–1906. doi: 10.1126/science.289.5486.1902.
  • 42. Ozaki Y, Kawashima T, Abe-Yoshizumi R, Kandori H. A color-determining amino acid residue of proteorhodopsin. Biochemistry. 2014;53:6032–6040. doi: 10.1021/bi500842w.
  • 43. Shimono K, Ikeura Y, Sudo Y, Iwamoto M, Kamo N. Environment around the chromophore in pharaonis phoborhodopsin: Mutation analysis of the retinal binding site. Biochim. Biophys. Acta. 2001;1515:92–100. doi: 10.1016/s0005-2736(01)00394-7.
  • 44. Sudo Y, et al. A blue-shifted light-driven proton pump for neural silencing. J. Biol. Chem. 2013;288:20624–20632. doi: 10.1074/jbc.M113.475533.
  • 45. Inoue K, et al. Converting a light-driven proton pump into a light-gated proton channel. J. Am. Chem. Soc. 2015;137:3291–3299. doi: 10.1021/ja511788f.
  • 46. Fenno L, Yizhar O, Deisseroth K. The development and application of optogenetics. Annu. Rev. Neurosci. 2011;34:389–412. doi: 10.1146/annurev-neuro-061010-113817.
  • 47. Thompson JD, Higgins DG, Gibson TJ. Clustal-W - improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–4680. doi: 10.1093/nar/22.22.4673.
  • 48. Khorana HG, et al. Amino acid sequence of bacteriorhodopsin. Proc. Natl Acad. Sci. USA. 1979;76:5046–5050. doi: 10.1073/pnas.76.10.5046.
  • 49. Pushkarev A, et al. A distinct abundant group of microbial rhodopsins discovered using functional metagenomics. Nature. 2018;558:595–599. doi: 10.1038/s41586-018-0225-9.
  • 50. Saitou N, Nei M. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 1987;4:406–425. doi: 10.1093/oxfordjournals.molbev.a040454.
  • 51. Tibshirani R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B. 1996;58:267–288.
  • 52. Kato HE, et al. Structural basis for Na+ transport mechanism by a light-driven Na+ pump. Nature. 2015;521:48–53. doi: 10.1038/nature14322.
  • 53. Nagasaka Y, et al. Gate-keeper of ion transport-a highly conserved helix-3 tryptophan in a channelrhodopsin chimera, C1C2/ChRWR. Biophys. Physicobiol. 2020;17:59–70. doi: 10.2142/biophysico.BSJ-2020007.
