PLOS Computational Biology. 2023 Apr 12;19(4):e1011033. doi: 10.1371/journal.pcbi.1011033

Gate-based quantum computing for protein design

Mohammad Hassan Khatami 1, Udson C Mendes 2, Nathan Wiebe 3,4,5, Philip M Kim 1,3,6,*
Editor: Nir Ben-Tal 7
PMCID: PMC10124842  PMID: 37043517

Abstract

Protein design is a technique to engineer proteins by permuting amino acids in the sequence to obtain novel functionalities. However, exploring all possible combinations of amino acids is generally impossible due to the exponential growth of possibilities with the number of designable sites. The present work introduces circuits implementing a pure quantum approach, Grover’s algorithm, to solve protein design problems. Our algorithms can be adjusted to implement any custom pair-wise energy table and protein structure model. Moreover, the algorithm’s oracle is designed to consist of only adder functions. Quantum computer simulators validate the practicality of our circuits, which contain up to 234 qubits, while a smaller circuit is implemented on real quantum devices. Our results show that using $O(\sqrt{N})$ iterations, the circuits find the correct results among all N possibilities, providing the expected quadratic speed-up of Grover’s algorithm over classical methods (i.e., $O(N)$).

Author summary

Protein design aims to create novel proteins or enhance the functionality of existing proteins by tweaking their sequences through permuting amino acids. The number of possible configurations, N, grows exponentially as a function of the number of designable sites (s), i.e., $N = A^s$, where A is the number of different amino acids (A = 20 for canonical amino acids). Classical computation methods require $O(N)$ queries to search and find the low-energy configurations among N possible sequences. Searching among these possibilities becomes unattainable for large proteins, forcing the classical approaches to use sampling methods. Alternatively, quantum computing promises a quadratic speed-up in searching for answers in an unorganized list by employing Grover’s algorithm. Our work shows the implementation of this algorithm at the circuit level to solve protein design problems. We first focus on lattice model-like systems and then improve them to more realistic models (change in the energy as a function of distances). Our algorithms can implement various custom pair-wise energy tables and any protein structure models. We have used quantum computer simulators to validate the practicality of our circuits, which require up to 234 qubits. We have also implemented a simple version of our circuits on real quantum devices. Our results show that our circuits provide the expected quadratic speed-up of Grover’s algorithm.


This is a PLOS Computational Biology Methods paper.

1. Introduction

Protein design is a procedure to construct proteins with certain configurations to achieve novel functionality. In this regard, amino acids are mutated in the protein’s sequence to find sets of residues that provide the lowest energy of the protein in the expected configuration. Using computational approaches, one could consider having “s” designable sites in the sequence and “A” different amino acids that could fill these sites, where A = 20 for the canonical amino acids. This provides $A^s$ possible sets of amino acids among which to find the answer. Thus, the number of possible amino acid sequences grows exponentially with the number of designable sites.

In computer science terminology, protein design is categorized under non-deterministic polynomial-time (NP)-hard problems [1,2]. The main characteristic of these problems is that the computational time and resources needed to find their solutions scale poorly. Thus, finding their answer using conventional computers either becomes impossible or requires a great deal of computational time. However, the validity of a proposed answer could be evaluated in polynomial time by conventional computers [3]. Even for the simple hydrophobic-polar (HP) [4] protein lattice models containing only two types of residues (hydrophobic and polar), protein design is shown to be in the class of NP-hard problems [5].

Statistical methods such as Markov chain Monte Carlo (MCMC) [6,7] are currently being used to solve NP-hard problems, including protein design problems, on conventional computers [8–10]. In these methods, the algorithm uses sampling techniques and probability distributions to find the answers among all possible sets of amino acids. However, since the probabilistic methods do not explore all sets, it is possible to miss some of the answer states.

Unlike conventional approaches, quantum computation techniques are expected to enhance solving the NP class of problems in their exact forms [3]. In recent years, there have been attempts to use quantum computers to solve NP-hard problems in protein studies, mainly focused on protein folding [11–14]. In these studies, hybrid quantum-classical algorithms employing gate-based quantum devices, such as the Quantum Approximate Optimization Algorithm (QAOA) [15] and the Variational Quantum Eigensolver (VQE) [16], as well as quantum annealing approaches [17], are implemented.

As an example of gate-based approaches, Fingerhuth et al. [14] used the QAOA method to study protein folding on a 9-residue protein using the HP energy model and a 3D lattice structure. Similarly, a version of the VQE approach is employed by Robert et al. [13] to study the folding of a 10-residue protein, Angiotensin, and a 7-residue neuropeptide. In the case of implementing quantum annealers for protein folding, Perdomo-Ortiz et al. [11] studied the folding of a 6-residue peptide using a fixed energy table in a 2D lattice model, employing 81 of D-Wave’s “superconducting quantum bits”. Similarly, Babej et al. [12] have worked on folding a ten-residue Chignolin protein and an eight-residue Trp-Cage peptide in 2D and 3D lattice models, respectively. This study used 2048 superconducting quantum bits of D-Wave’s 2000Q quantum annealer device. Regardless of the approach, the studies employing quantum computation for protein folding are mainly limited to peptides with only a handful of residues and a very simplified or limited number of amino acid types, e.g., the HP model. In the case of protein design, Mulligan et al. [18] have prepared a hybrid quantum-classical solver in Rosetta software [19], called QPacker, to address the protein design problem on D-Wave’s 2000Q quantum annealer device. Despite the attempts to use quantum computation, to the best of our knowledge, there are no records of studies in which a pure quantum computational method using a gate-based approach is used to investigate protein design problems.

This work introduces a procedure to build gate-based quantum circuits, employing Grover’s algorithm [20], to solve protein design problems. Grover’s algorithm is a fundamental and well-known quantum computation algorithm that offers a quadratic speed-up over classical methods in finding answers in an unsorted list [3,20–22]. In general, Grover’s algorithm is composed of four main parts: initialization, Grover’s oracle, Grover’s diffuser and measurement (Fig 1A). The initialization, the diffuser and the measurement steps are nearly identical for all systems in this study. However, the oracle step varies depending on the complexity of the system and is the only step that requires auxiliary work qubits, in addition to the n qubits (Fig 1B).

Fig 1. Schematic representation of our circuits.

A) Different steps of Grover’s algorithm; B) Different sub-steps implemented in our Oracle. The parallel lines represent qubits. The blue lines show the n qubits and the red lines show the work qubits. In the oracle, if |k〉 is an answer state, f(k) = 1, otherwise it is 0.

In the initialization step, n qubits are allocated to create a superposition of $N = 2^n$ quantum states by applying Hadamard (H) gates, i.e., $H^{\otimes n}$, representing all possible answer states in the system. This step is similar to other quantum algorithms such as Shor’s algorithm [23] and the Deutsch-Jozsa algorithm [24].

In the oracle, the circuit is programmed to implement energies and do all necessary calculations to find the answer states (Fig 1B). This part of the algorithm takes as input the general features of the structure (Fig 2), pre-computed pair-wise interaction energy tables (Fig 3), and a threshold energy (Eth) value (all discussed in detail later in the paper). In addition, pre-computed distance features are provided as input to the oracle depending on the system’s complexity. The oracle is designed to use only the summation operation (and multiplication, technically a form of summation). First, the energy values are summed for each pair of interacting designable sites to find the total energy of the sequence. Then, it is subtracted from the Eth. If the result is negative, the oracle marks that sequence as an answer state by negating its amplitude (Fig 1B). Note that since the probability of each state is the only quantity of a quantum state that can be measured classically, negating the amplitude of the answer states in the oracle step does not by itself change the probabilities of the states (probability = amplitude × amplitude*).
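
As a purely classical illustration of the decision the oracle encodes, the sketch below evaluates f(k) for a two-site SP-like system. The residue order, the 3-bit encoding and every table entry except the Pos-Neg minimum of –4 are hypothetical placeholders, since the actual values are given only in Fig 3. The comparison direction (a state is marked when its total energy lies below Eth) is an assumption chosen to agree with the answer states reported in the Results.

```python
# Hypothetical residue order; index = value of the 3-bit code (Fig 3C is not reproduced here).
RESIDUES = ["H1", "H2", "Pol1", "Pol2", "Pos", "Neg", "X1", "X2"]


def pair_energy(a, b):
    """Hypothetical integer pair-wise energy; only the Pos-Neg value (-4) comes from the text."""
    if {a, b} == {"Pos", "Neg"}:
        return -4
    if a.startswith("H") and b.startswith("H"):
        return -2  # placeholder value
    if a == b and a in ("Pos", "Neg"):
        return 3   # placeholder value
    return 0       # placeholder value


def decode(k, s, g=3):
    """Split the integer k into s residue labels, g bits per designable site."""
    mask = (1 << g) - 1
    return [RESIDUES[(k >> (g * (s - 1 - site))) & mask] for site in range(s)]


def oracle_f(k, s, interactions, e_th):
    """Return 1 if the sequence encoded by k has total energy below e_th, else 0."""
    seq = decode(k, s)
    e_tot = sum(pair_energy(seq[i], seq[j]) for i, j in interactions)
    return 1 if e_tot - e_th < 0 else 0  # assumed sign convention


# Two designable sites with one interaction between them, threshold Eth = -3.
answers = [decode(k, 2) for k in range(2 ** 6) if oracle_f(k, 2, [(0, 1)], e_th=-3)]
print(answers)
```

A classical search would call `oracle_f` once per state; in the quantum circuit the same predicate is evaluated on the superposition and used to negate the amplitudes of the marked states.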

Fig 2. Schematic representations of protein models.

SP model with: a) Two designable sites; b) Three designable sites; c) Six designable sites with I) three, II) four, and III) five pair-wise interactions. MR model with: d) Two designable sites; e) Three designable sites. The designable sites are shown as circles with numbers. The red dashed lines represent the interactions among the designable sites. In a–c) the pattern of interactions between the sites is provided for each structure. The checkered circles are non-interacting residues. In this study, there are no geometrical differences between residues, and all are treated as identical beads represented with circles. In d) and e) $d_{ij}^{-1}$ is the corresponding distance reciprocal between designable sites i and j.

Fig 3. Energy tables to represent the pair-wise interactions in our systems and the binary representation of residues.

Energy tables for: a) The SP model; b) The MR model. c) Binary representation of residues in the energy table. d) Energy table and the residue representations for the HP protein model. In a–c) H1 and H2 represent two types of hydrophobic residues, Pol1 and Pol2 represent two types of polar residues, Pos represents a positive residue, Neg represents a negative residue, and two types of “other” residues that do not fit in any of the previous categories are represented by X1 and X2. Note that all energies in our tables have qualitative values. In d) H represents hydrophobic and P represents polar residues.

In the next step of the algorithm, the diffuser increases the probability of answer states, marked by the oracle, in the circuit (Fig 1A). In the final step, the results of the circuit are measured. To increase the probability of finding the answer states, the oracle and the diffuser steps should be repeated for $O(\sqrt{N/M})$ iterations (R), where M is the number of answer states in the circuit [3].
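
The interplay of the oracle and the diffuser can be sketched with a small state-vector simulation in plain Python. This is an idealized, noise-free toy model and not the circuits of this study: the oracle is reduced to a sign flip on a given set of marked indices, the diffuser is written as the standard reflection about the mean amplitude, and the two marked indices are arbitrary.

```python
import math


def grover_probability(n_states, marked, iterations):
    """Simulate Grover's algorithm on a real-valued state vector and
    return the total probability of measuring a marked state."""
    amp = [1.0 / math.sqrt(n_states)] * n_states  # uniform superposition after the H gates
    for _ in range(iterations):
        # Oracle: negate the amplitude of every answer state.
        for k in marked:
            amp[k] = -amp[k]
        # Diffuser: reflect every amplitude about the mean amplitude.
        mean = sum(amp) / n_states
        amp = [2.0 * mean - a for a in amp]
    return sum(amp[k] ** 2 for k in marked)


# N = 64 states with M = 2 answer states (indices chosen arbitrarily).
for r in range(1, 5):
    print(r, round(grover_probability(64, {37, 44}, r), 4))
```

With N = 64 and M = 2 the marked probability grows with each iteration and is close to 1 after four iterations, which is the behaviour the repetition count R is chosen to exploit.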

Three main oracle models are developed in this study, each with a distinct protein structure representation, distance dependencies, and pair-wise interaction energy tables. In the first model, referred to as the “SP” (simplified) model, there are no distance dependencies between the designable sites, and 2D lattice model-like structures represent the protein (Fig 2A, 2B and 2C). Moreover, only integer numbers are used for energies (Fig 3A), and the oracle only uses the summation function. In the second model, i.e., the “MR” (more realistic) model, the pair-wise energy table (Fig 3B) and pre-computed reciprocals of the distances ($d^{-1}$) between the designable sites are introduced in the circuit (Fig 2D and 2E). First, the distance reciprocals are multiplied by the pair-wise energy of residues filling each designable site. Then, the values are summed to find the system’s total energy. In this model, fixed-point decimal numbers are used to calculate the results (Fig 3B). For the SP and MR models, since we are in the noisy intermediate-scale quantum (NISQ) devices era [25], we use quantum computer simulators to study the circuits’ validity and results. However, to test the practicality of our algorithms on real IBM quantum devices, a third model is developed, which is a simplified version of the SP model, referred to as “IBM-SP”. This model uses the hydrophobic-polar energy table in Fig 3D and the protein structure in Fig 2A.

Note that in the MR model, the $d^{-1}$ values mainly act as weights of interactions, providing a weighted impact of each pair-wise interaction in the system. However, considering $E_{Coulomb} = k q_i q_j / d$, where k is the Coulomb constant, q is the electric charge, and i and j are the two interacting particles, one can treat the values in Fig 3B as constants representing $k q_i q_j$ between residues i and j. Thus, the MR model is directly implementing the Coulomb potential for protein design in its current form. Moreover, since we feed the pre-computed values/tables to our algorithms, the MR model can implement all potential energies for protein design in conventional packages, such as the Rosetta biomolecular modelling suite [26]. For example, one can input the cosine value of distances or angles for the bonded or angular interactions.
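
The weighted sum described for the MR model can be written classically in a few lines. The sketch below is illustrative only: the table entries (standing in for $k q_i q_j$) and the reciprocal distances are hypothetical numbers, and the function name `mr_total_energy` is not taken from the paper.

```python
def mr_total_energy(sequence, energy_table, inv_distances):
    """Sum pair-wise energies weighted by pre-computed distance reciprocals."""
    total = 0.0
    for (i, j), inv_d in inv_distances.items():
        # The table is symmetric, so look pairs up in sorted order.
        pair = tuple(sorted((sequence[i], sequence[j])))
        total += energy_table[pair] * inv_d
    return total


# Hypothetical values playing the role of k*q_i*q_j for two residue types.
energy_table = {
    ("Neg", "Neg"): 1.0,
    ("Neg", "Pos"): -1.0,
    ("Pos", "Pos"): 1.0,
}
# Hypothetical reciprocal distances for three designable sites (all pairs interact).
inv_distances = {(0, 1): 1.0, (0, 2): 0.5, (1, 2): 0.25}

energy = mr_total_energy(["Pos", "Neg", "Pos"], energy_table, inv_distances)
print(energy)
```

Because both the table and the reciprocals are supplied as pre-computed inputs, swapping in a different potential only changes the numbers fed to the function, not the summation itself.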

Our results show that, using the quantum simulators, our circuits require the expected $O(\sqrt{N/M})$ queries to find the answer states, which confirms that they exhibit the properties of Grover’s algorithm. However, the results of real quantum computers indicate the need for devices with much lower noise to implement our circuits.

2. Results

2.1. Number of qubits in the circuit

The number of qubits required to represent a residue in a unique binary format is given by $g = \lceil \log_2(A) \rceil$. Thus, g = 3 for the eight amino acids used in both SP and MR models (Fig 3A, 3B and 3C). Even though our energy tables with eight types of residues are simpler than the canonical model with A = 20 (g = 5), they are still more complicated than the widely used HP table in the IBM-SP model (g = 1).

Table 1 shows the number of qubits required by different circuits in our study. The total number of n qubits in the circuits is given by

$$n = g \times s \qquad (1)$$

where g = log2(number of residues). For the SP and MR models, n = 3×s, while this number for the IBM-SP model is n = 1×s.

Table 1. A brief description of each circuit in this study.

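| Circuit (model) | n | N | m | Oracle’s work qubits | Total qubits (q) |
|---|---|---|---|---|---|
| s = 2 (SP) | 6 | 64 | 4 | 9 | 15 |
| s = 3 (SP) | 9 | 512 | 5 | 11 | 20 |
| s = 6, i = 3 (SP) | 18 | 262,144 | 6 | 13 | 31 |
| s = 6, i = 4 (SP) | 18 | 262,144 | 6 | 13 | 31 |
| s = 6, i = 5 (SP) | 18 | 262,144 | 7 | 15 | 33 |
| s = 2 (MR) | 6 | 64 | 9 | 145 | 151 |
| s = 2 (MR-MP) | 6 | 64 | 14 | 225 | 231 |
| s = 3 (MR) | 9 | 512 | 9 | 145 | 154 |
| s = 3 (MR-MP) | 9 | 512 | 14 | 225 | 234 |
| s = 2 (IBM-SP) | 2 | 4 | 2 | 5 | 7 |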

In addition, we require m qubits in the circuit to represent each numeric value, e.g., the energy of each pair-wise interaction, as a part of the work qubits (Fig 1). The required number of qubits to represent a numeric value can be found as

$$m = \log_2\big((|E_{max}| + |E_{min}|) \times i \times 2\big) + p \qquad (2)$$

where i is the number of interactions in the system, p is the number of qubits allocated to represent the values after the decimal point, and Emin and Emax are the minimum and maximum values in the energy table in Fig 3. In the SP model, p = 0 and i is represented as input to the circuit (Fig 2A). However, in the MR model

$$i = i_{max} = \frac{s \times (s - 1)}{2} \qquad (3)$$

Here, since we use fixed-point decimal numbers, setting p = 5 in Eq 2 provides a precision of $2^{-5}$ = 0.03125, which is the default for the MR model. We also study this model with more precision (MR-MP), where p = 10 and the circuit can represent values smaller than ~0.001. More detail on how to choose the minimum required number of m qubits for each system is provided in S1 Appendix.
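
The effect of p on the representable values can be illustrated with a minimal fixed-point helper. This is a sketch under stated assumptions: rounding to the nearest multiple of $2^{-p}$ is assumed here, and the helper name and the sample value 0.7 are hypothetical.

```python
def to_fixed_point(value, p):
    """Round value to the nearest multiple of 2**-p.

    Returns the integer code stored in the register and the decimal value it represents.
    """
    scale = 1 << p
    code = round(value * scale)
    return code, code / scale


resolution_mr = 2.0 ** -5      # 0.03125, the default MR precision
resolution_mr_mp = 2.0 ** -10  # 0.0009765625, the MR-MP precision

code, approx = to_fixed_point(0.7, p=5)
print(resolution_mr, resolution_mr_mp, code, approx)
```

Each additional fractional qubit halves the resolution, which is why p = 10 resolves values below about 0.001 at the price of a larger m.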

As shown in Table 1, the number of work qubits is 2m+1 and 16m+1 for the SP and MR models, respectively. Since we use different calculations in the oracle, the required number of work qubits differs for the two models (discussed in the Methods section).
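
The bookkeeping behind Table 1 can be reproduced with a short helper. This is a minimal sketch: the function name `qubit_budget` is hypothetical, m is taken as an input rather than derived from Eq 2, and the per-residue qubit count is computed with a ceiling so that A = 20 gives g = 5.

```python
import math


def qubit_budget(num_residue_types, s, m, model):
    """Return (n, work, total) qubits for a circuit with s designable sites."""
    g = math.ceil(math.log2(num_residue_types))  # qubits per residue
    n = g * s                                    # Eq 1
    if model == "SP":
        work = 2 * m + 1
    elif model == "MR":
        work = 16 * m + 1
    else:
        raise ValueError("model must be 'SP' or 'MR'")
    return n, work, n + work


print(qubit_budget(8, 2, 4, "SP"))   # s = 2 (SP) row of Table 1
print(qubit_budget(8, 3, 14, "MR"))  # s = 3 (MR-MP) row of Table 1
```

The IBM-SP row of Table 1 (A = 2, s = 2, m = 2) is also consistent with the SP expression, giving 2 + 5 = 7 qubits.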

From Eq 1, Eq 2 and Eq 3, the maximum total number of qubits (q) grows as $\sim(3 \times s + c_1 \log_2(s) + c_2)$, where $c_1$ and $c_2$ are constant values. Trying to calculate the distance reciprocals on the same quantum circuit, as discussed by Bhaskar et al. [27], would add $O(s^2 - c_3 \times s)$ qubits to the circuit, where $c_3$ is a positive constant value.

Moreover, since the number of qubits required for the SP and MR model is large, we use the matrix product state (MPS) simulator [28], the only simulator that currently can simulate circuits with this many qubits. More detail is provided in the Methods section.

2.2. Finding the answer states

Fig 4 shows the probability of finding every individual state for four different systems containing two designable sites in the SP and MR models. Here, the answer states are clearly distinguished with their higher probabilities over the other states in the system. The answer states for all systems in this study are provided in detail in S2 Appendix.

Fig 4. Histogram representations of the probability of finding each state (64 in total) in circuits with two designable sites.

Results for circuits in the SP model with: a) Eth = -3 and R = 1; b) Eth = -3 and R = Rmax = 4; c) Eth = -2 and R = 1; d) Eth = -2 and R = Rmax = 2. Results for circuits in the MR model with: e) Eth = 95%Emin and R = 1; f) Eth = 95%Emin and R = Rmax = 4; g) Eth = 85%Emin and R = 1; h) Eth = 85%Emin and R = Rmax = 3.

In the SP model with two designable sites, by setting the Eth to –3 (using Eq 6 in the Methods section), the algorithm finds the lowest number of answer states. Here, the algorithm finds two distinct results: Pos-Neg (i.e., residue 1 is Pos and residue 2 is Neg) and Neg-Pos, as shown in Fig 4A and 4B. These two results are expected as they are the lowest values in the pair-wise energy table with E = –4 in Fig 3A. Increasing Eth to –2, the circuit finds five answer states, adding three new states to the previous two answer states from the Eth = -3 (Fig 4C and 4D).

Unlike the SP model, in the MR model, since we use decimal numbers, we can choose the Eth value more precisely (Eq 7 in the Methods section). To choose the answer states within the 5% and 15% range of the minimum energy of the system, the Eth is set to 95%Emin and 85%Emin, respectively. Here, the Eth = 95%Emin leads to two answer states, while choosing the Eth = 85%Emin provides three answer states (Fig 4E–4H).

The results in Fig 4 show that even with a single iteration (R = 1), finding each answer state is more probable than finding each non-answer state for s = 2 systems. However, the total probability of finding the answer states is less than the total probability of finding the non-answer states at R = 1, which generally holds except for systems whose maximum number of iterations (Rmax) is 1 or 2. In other words, to have a higher probability of picking the answer states (no matter which one) among all N possible states, the total probability of finding the answer states should be higher than 50%. This probability increases to ~100% by increasing the number of iterations to R = Rmax (Eq 8 in the Methods section). For example, at R = 1 the total probability of answer states in the s = 2,Eth = –3 system in the SP model is only ~26% (~12.9% each state), while the probability of finding non-answer states is ~74% (Fig 4A). Nevertheless, using R = Rmax = 4, the probability of finding the answer states increases to ~100%, and the probability of all other states becomes almost zero (Fig 4B).
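
These percentages can be cross-checked against the textbook closed form for Grover’s algorithm, $P(R) = \sin^2((2R+1)\theta)$ with $\theta = \arcsin\sqrt{M/N}$. The sketch below assumes an ideal, noise-free circuit; the expression is the standard one and is not copied from the Methods section.

```python
import math


def success_probability(n_states, n_answers, iterations):
    """Textbook Grover success probability after a given number of iterations."""
    theta = math.asin(math.sqrt(n_answers / n_states))
    return math.sin((2 * iterations + 1) * theta) ** 2


p1 = success_probability(64, 2, 1)  # compare with the ~26% quoted for R = 1
p4 = success_probability(64, 2, 4)  # compare with the ~100% quoted for R = 4
print(round(p1, 4), round(p1 / 2, 4), round(p4, 4))
```

For N = 64 and M = 2 the closed form gives roughly 0.258 in total (about 0.129 per answer state) at R = 1 and more than 0.99 at R = 4, in line with Fig 4A and 4B.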

It should be noted that the results for the s = 2,Eth = –3 system in the SP model (Fig 4A and 4B) are almost identical to the results of the s = 2,Eth = 95%Emin system in the MR model (Fig 4E and 4F). Regardless of the oracle complexity in each model, since these two systems have the same number of answer states (M = 2) out of the same number of total states (N = 64), the probability of finding each state is the same for both systems. Moreover, since the M and the N are the same for these systems, they have the same behaviour with changing the number of iterations, which will be discussed in more detail later in this section.

In the case of systems with s = 2, the only recognizable difference between the SP and the MR models is the higher accuracy in choosing the Eth. Here, the role of $d^{-1}$ in the MR model is suppressed due to the spatial symmetry in the two designable site systems. However, the s = 3 system with different distances between each site (Fig 2E) breaks the symmetry and illustrates the effect of $d^{-1}$ in the MR model. To show this effect, we compare the results for the structure shown in Fig 2E with a similar system with complete spatial symmetry, i.e., an equilateral triangle, where $d_{ij} = 1$. The same answer states are provided for the s = 3,Eth = 70%Emin system with the symmetrical and asymmetrical configurations. The same holds for the s = 3,Eth = 80%Emin system. Nevertheless, by choosing the Eth = 50%Emin, the system with the symmetrical configuration has 10 answer states while the asymmetrical system produces 19 answer states (results provided in S2 Appendix).

2.3. Role of number of iterations

Fig 5 shows the probability of finding the answer states as a function of the normalized number of iterations, R/Rmax. Here, the probability curves represent a universal pattern for different systems in the SP and the MR models, whereby increasing R increases the probability of finding the answer states, reaching ~100% at R = Rmax. Since the computational cost of simulating larger systems is high, only the first few iterations are simulated for s = 6 systems in the SP model (Fig 5A), as running them for R = Rmax would require years of simulation on CPU and terabytes of RAM (discussed in S3 Appendix in detail). Moreover, the probability curves in Fig 5 follow $\sim\sin^2(\alpha R)$, which is the expected behaviour of Grover’s algorithm [3,29], where α is a constant value (more detail in S4 Appendix).

Fig 5. Probability of finding answer states in different systems as a function of normalized number of iterations (R/Rmax).

Results for a) The SP model; b) The MR model; The inset in a), represents data for the first few R for systems with six designable sites.

The Rmax values obtained from simulations of systems in both SP and MR models, i.e., those that reached R/Rmax = 1 (Fig 5), are plotted against N/M in Fig 6. These results show that the circuits follow the $R_{max} \propto \sqrt{N/M}$ behaviour, the quantum advantage that Grover’s algorithm is expected to provide.
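
The scaling can be illustrated with the textbook estimate $R_{max} \approx \lfloor (\pi/4)\sqrt{N/M} \rfloor$. This is a sketch under an assumption: Eq 8 of the Methods is not reproduced in this section, so the rounding-down convention used here is a guess, although it does reproduce the Rmax values listed in the Fig 4 caption for N = 64.

```python
import math


def grover_rmax(n_states, n_answers):
    """Whole number of Grover iterations before the optimal rotation angle is passed."""
    return math.floor(math.pi / 4 * math.sqrt(n_states / n_answers))


def classical_queries(n_states, n_answers):
    """Order-of-magnitude classical query count, N/M."""
    return n_states / n_answers


for m_answers in (2, 3, 5):
    print(m_answers, grover_rmax(64, m_answers), classical_queries(64, m_answers))
```

For M = 2, 3 and 5 answer states out of N = 64, the estimate gives 4, 3 and 2 iterations, compared with classical counts of order 32, 21 and 13.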

Fig 6. The maximum number of iterations (Rmax) values as a function of normalized number of states (N/M).

The circles represent the data for the SP model and the stars are the data for the MR model. The magenta dashed line shows the Rmax threshold for Grover’s algorithm, while the black dashed line is the threshold of the classic realm.

Moreover, the results for the SP,s = 2,Eth = –3 system and the MR,s = 2,Eth = 95%Emin system show that since the N/M values are the same, the Rmax values are identical (Fig 6). The same holds for the SP,s = 2,Eth = –2 and the MR,s = 2,Eth = 70%Emin systems.

2.4. Number of gates used for classic and quantum algorithms

To compare conventional classical methods with our quantum circuits, assume we have a classical algorithm designed to search through all possible states to find the answers. Furthermore, suppose the same input data we use in quantum circuits are provided for the classic models, i.e., protein structures, interaction patterns, pre-computed pair-wise energy tables and distance reciprocals. Also, assume that the same number of bits and qubits are allocated to represent a value in both classic and quantum algorithms, i.e., m is the same. The latter ensures that the number of computations and the accuracy of the calculated values are similar: longer bit/qubit registers provide higher accuracy but require more computation units (gates). Note that even though the total number of qubits in our quantum circuits is limited to q (Table 1), the total number of bits in the classical approach is not limited.

As discussed earlier in this paper, a classical search algorithm requires $O(N)$ iterations to find the answer states. In contrast, our quantum circuits only require $O(\sqrt{N})$ iterations. Nevertheless, this comparison does not consider the number of computations used in the classic and quantum approaches. In our quantum algorithms, the total number of computations is given by

$$\#\mathrm{of}Q_{tot} = \#\mathrm{of}Q_{init.} + O(\sqrt{N}) \times (\#\mathrm{of}Q_{orcl.} + \#\mathrm{of}Q_{diff.}) \qquad (4)$$

where #ofQinit., #ofQorcl., and #ofQdiff. are the number of computations conducted in the initialization step, the oracle and the diffuser, respectively (Fig 1A). However, in the classic realm, the total number of computations (#ofCtot) required to go through all possible states to find the answers is

$$\#\mathrm{of}C_{tot} = O(N) \times (\#\mathrm{of}C) \qquad (5)$$

where #ofC represents all computations required to find the energy of a single state.

For simplicity, we compare the SP model with a similarly complex classic model, referred to as “SP-classic”. In the SP-classic model, similar to the SP model, the pair-wise energies of designable sites are added together for each N combination of amino acids (sequence) and subtracted from the Eth to find the answer states. In the classic computation, once the addition between two numbers occurs in the SP-classic, the bits get restored and ready for the next number. However, for the SP model, due to the quantum nature of the circuit, the qubits that will be re-used should be “cleaned” by re-applying the gates (i.e., the computation is doubled compared to the SP-classic case). Moreover, the oracle in the quantum circuit should be cleaned, meaning all the gates should be re-applied (Fig 1B).

Details of calculating the number of computations for each step of circuits in the SP and SP-classic models are provided in S5 Appendix. For the SP model, the number of computations in the initialization step, #ofQinit., is $O(\log_2(N))$ gates. The oracle cost (#ofQorcl.) can be broken into the cost of introducing the energies, the cost of the adders, the cost of subtracting the Etot from the Eth, and the negation cost. For introducing the energies, the total cost changes as $O[\log_2(\log_2(N)) \times (\log_2(N))^2]$. Moreover, it is shown that the number of gates required in adder functions is $O(m)$ [30,31], requiring $O[\log_2(\log_2(N)) \times (\log_2(N))^2]$ gates to compute the Etot. Furthermore, the cost of subtracting the Etot from the Eth is $O[\log_2(\log_2(N))]$, and the negation cost is $O(1)$. Thus, #ofQorcl. is $O[\log_2(\log_2(N)) \times \{(\log_2(N))^2 + \log_2(N) + 1\}]$. Finally, in the diffuser, the number of computations is $O(\log_2(N))$. Thus, using Eq 4, the total computation cost, #ofQtot, is $O[\log_2(N) + \sqrt{N} \times \{\log_2(\log_2(N)) \times ((\log_2(N))^2 + \log_2(N) + 1) + \log_2(N)\}]$.

For the SP-classic model, we ignore the cost of introducing the energies to the classic circuits due to the lack of information on this part. Thus, we only consider the cost of adding the energy values to find the Etot and subtracting it from Eth to find the answer states. In this case, the cost of the computation for the SP-classic model from Eq 5, #ofCtot, is $O(N \times [\log_2(\log_2(N)) \times (\log_2(N) + 1)])$.

These estimations show that for N > 56, the number of computations in classic circuits is larger than in the quantum algorithm, despite ignoring the cost of introducing the energy values to the classic circuit. Note that for the smallest case of s = 2, N is 64, indicating that the number of computations in the classic form is higher than in the quantum circuit for all systems. The same holds for the MR model, which is discussed in more detail in S5 Appendix.

2.5. Real devices and the effect of noise in simulators

The IBM-SP model circuit of our algorithms implemented on real quantum devices has four possible answer states, i.e., HH, HP, PH and PP. As expected from the energy table (Fig 3D), by setting the Eth = 0 and using the ideal noise-free QASM [32] simulator, the circuit finds the HH state as the answer (Fig 7A).

Fig 7. Histogram representations of probabilities for each state in the IBM-SP model circuit.

a) Results of the circuit, using the ideal MPS quantum computer simulator. b) Results of the circuit using: I) gate fidelity, measurement fidelity, initialization fidelity and qubit mapping of ibmq_toronto in the simulator; II) Real ibmq_toronto device averaged over 20 different runs. c) Results of the circuit using: I) gate fidelity, measurement fidelity, initialization fidelity and qubit mapping of ibmq_montreal in the simulator; II) Real ibmq_montreal device averaged over 30 different runs. In b) and c), the data bars with different colours show results for five separate runs. The error bars represent the standard deviation from the mean value. The numbers on each set of bars show the average probability of the state. The number of samplings for all plots is set to 8,192 shots.

In addition to the ideal simulations, we implement selected noise properties of the ibmq_toronto (Fig 7B-I) and the ibmq_montreal (Fig 7C-I) devices in the QASM simulator. Details are provided in the Methods section. Implementing the gate, measurement and initialization fidelities, as well as the qubit connection mapping properties of the ibmq_toronto and the ibmq_montreal devices, shows that the simulations predict the expected answer state with average probabilities of ~53.7% and ~33.5%, respectively. These results show that by only considering the selected noise sources, the real quantum devices are expected to distinguish the answer state from the others. Moreover, since the ibmq_toronto and the ibmq_montreal have the same qubit connection mappings, the large difference between the predicted results in Fig 7B-I and 7C-I implies higher gate, measurement and initialization fidelities for the ibmq_toronto compared to the ibmq_montreal quantum computer. These results suggest that despite having lower quantum volume (QV) [33], the ibmq_toronto device is more likely to distinguish the answer state clearly.

As expected from the results of the noise-included simulations, running the circuit on the ibmq_toronto device provides the answer state with the highest average probability of ~28.8%, while the ibmq_montreal device provides almost the same probabilities for all four states (Fig 7B-II and 7C-II). The probability of finding the HH state using the ibmq_toronto device is 28.77±2.39%, while for the second most probable state, i.e., HP, this value is 24.94±1.26%. These results indicate that the HH state has a higher probability even when the standard deviations are considered. Note that although the answer state is the most probable single state, the probability of finding it is only ~28.8%, while the probability of finding non-answer states is ~71.2%.

The difference between the results of the simulations considering the gate fidelities and the qubit mappings (Fig 7B-I and 7C-I) and the real quantum devices (Fig 7B-II and 7C-II) shows the role of performance parameters such as coherence, crosstalk and spectator errors in the probability of the states. In the case of the ibmq_toronto, these performance parameters cause a significant loss in the probability of finding the answer state. These results are in agreement with the expected results of quantum devices in the NISQ era. As Chia et al. [34] and Li et al. [35] have discussed, circuits that could run on currently available real quantum devices are mostly limited to a circuit depth of only a few tens of gates, with a maximum of ~55. These studies suggest that the depth of our circuit transpiled on real quantum devices (i.e., ~160) is much larger than the current limit of the NISQ era quantum computers. Thus, even though the noise-included simulators predict the answer states with a high probability (Fig 7B-I and 7C-I), the “large” circuit depth obscures the answer state on real quantum devices (Fig 7B-II and 7C-II).

3. Discussions

This work develops gate-based circuits to address protein design problems by implementing a pure quantum computing algorithm, i.e., Grover’s algorithm. Simulations on ideal quantum computer simulators show that our quantum circuits can find the M desired answer states among N total states for systems with different complexities. Furthermore, the results confirm that using the maximum number of iterations ($O(\sqrt{N/M})$) provides the maximum probability of finding the answer states (~100%), indicating the quadratic advantage of our approach over classical methods on conventional computers in the search for answer states.

Moreover, the maximum number of computations required in our quantum circuits is smaller than the least number of computations required for classical models, confirming the quantum advantage of our circuits over conventional methods. Furthermore, our results show that the MPS simulator can correctly simulate highly entangled algorithms such as Grover’s algorithm, with a large number of qubits in the circuit (up to 234 qubits).

In this work, the simplified model, i.e., the SP model, implements protein configurations similar to the 2D square lattice model, using integer numbers for calculating the energies. However, due to limited CPU time and computational resources, the largest circuits are limited to systems with six designable sites, simulating a simple hairpin protein structure (Fig 2C-I) or complex and compact intra-protein configurations (Fig 2C-II and 2C-III). Adding more complexity to the system in the MR model, i.e., introducing the distance reciprocals and decimal numbers, limits the number of designable sites that can be simulated to three. However, this model enables us to use more realistic energy terms in the Hamiltonian of a protein design problem on quantum computers, i.e., mimicking the Coulomb energies. Nonetheless, all the different types of potentials used in conventional protein design methods could similarly be implemented in our algorithm. Moreover, our approach of implementing pre-calculated distance reciprocals and decimal numbers and using multiplication functions could also be employed in protein folding studies with quantum computers, which are currently limited to uniform lattice models [11–14].

At the current NISQ stage, the number of available qubits, their connectivity and the noise associated with each gate limit the study of pure quantum algorithms on real quantum devices. Since the depth of circuits on these devices is restricted to a few tens of gates, even running a small protein design circuit (i.e., IBM-SP) does not provide a definitive result due to the noise, and further improvements in quantum computers are required. Moreover, for the SP,s = 2 system, the depth of the circuit in a fully connected simulator is ~400 gates, while after transpilation, the depth of the circuit on real quantum devices could reach ~17,000 gates. As an alternative approach, it would be of interest to use popular hybrid approaches of the NISQ era, e.g., the QAOA method, to study the protein design problem and compare the results with those provided by circuits using Grover’s algorithm in the simulations. Nevertheless, the current pace of improvement in quantum devices encourages us to expect that, in the near future, our circuits can be implemented on real quantum computers and show the advantage of quantum computers in protein design problems.

4. Methods

System setup & simulations

Following Grover’s algorithm in Fig 1, after the initialization, the oracle is programmed to implement the energies and conduct the calculations required to find and mark the answer states. The oracle consists of several sub-steps. First, the classically calculated values in the energy tables (Fig 3) are introduced to the oracle, based on the protein structure and the interaction pattern (Fig 2). Implementing the pair-wise energies in the oracle is described in the S6 Appendix. Next, the energy values for every pair-wise interaction in the structure are summed using quantum adders to find the total energy, Etot, for each sequence of residues (i.e., each state). This is the only part of the algorithm with different setups for the SP and the MR models, as discussed in detail later in this section. Then, the Etot is subtracted from the threshold energy value of the circuit, Eth, for each state, and the states with a negative result are the answer states. The Eth is explicitly set for each circuit, and changing its value changes the answer states. Details on how the Etot and Eth are calculated for the SP and MR models are given later in this section. Finally, the oracle negates the amplitude of the answer states. After marking the answer states (i.e., the negation), the oracle un-computes all previous steps (except for the negation step) to clear the work qubits and prevent them from affecting the final results [36].

In the SP model, the oracle calculates the total energy of the state k (out of N states) using:

$$E_{tot}^{SP}(k) = \sum_{a>b} E_{a,b}(k),$$

where Ea,b(k) is the energy value of the interaction between designable sites a and b in the structure (Fig 2A–2C), while specific residues in the set k fill these sites (Fig 3A). Here, since there are no distance dependencies, all pair-wise interactions contribute equally to the total energy of the system. Moreover, the lowest Eth value in the SP model is defined as:

$$E_{th}^{SP}(k) = (E_{min} \times i) + 1 \qquad (6)$$

This Eth−SP provides the lowest number of answer states for each circuit in the SP model (and similarly in the IBM-SP model).
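As a purely classical illustration of what the oracle evaluates in the SP model, the following sketch enumerates every sequence, sums the pair-wise energies and applies the threshold of Eq 6. The two-letter alphabet, the integer energy table and the helper names are hypothetical and not taken from the paper. The sketch also assumes that i is the number of pair-wise interactions and that the answer states are those with Etot − Eth < 0.

```python
from itertools import product

# Hypothetical two-letter alphabet and integer pair-wise energy table.
# These values are illustrative only and are not taken from the paper.
RESIDUES = ("H", "P")
PAIR_ENERGY = {("H", "H"): -1, ("H", "P"): 0, ("P", "H"): 0, ("P", "P"): 0}


def total_energy_sp(sequence, interactions):
    """Sum the pair-wise energies over all interacting site pairs (a, b)."""
    return sum(PAIR_ENERGY[(sequence[a], sequence[b])] for a, b in interactions)


def threshold_sp(interactions):
    """E_th = (E_min x i) + 1, assuming i is the number of interactions."""
    e_min = min(PAIR_ENERGY.values())
    return e_min * len(interactions) + 1


def answer_states_sp(n_sites, interactions):
    """Keep the sequences with E_tot - E_th < 0 (assumed sign convention)."""
    e_th = threshold_sp(interactions)
    answers = []
    for seq in product(RESIDUES, repeat=n_sites):
        if total_energy_sp(seq, interactions) - e_th < 0:
            answers.append("".join(seq))
    return answers


# Two designable sites with a single interaction between sites 1 and 0.
print(threshold_sp([(1, 0)]))         # 0
print(answer_states_sp(2, [(1, 0)]))  # ['HH']
```

With this toy table, the threshold for two sites is 0 and only the HH sequence is marked. The quantum oracle performs the same comparison on a superposition of all states rather than by looping over them.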

In the MR model, the total energy of the state k in the oracle is calculated using:

$$E_{tot}^{MR}(k) = \sum_{a>b} E_{a,b}(k) \times d_{a,b}^{-1},$$

where $d_{a,b}^{-1}$ is a dimensionless matrix representing the distance reciprocal between designable sites a and b. Thus, nearer designable sites contribute more to the Etot−MR. In the MR model, the Eth is defined as:

$$E_{th}^{MR}(k) = B \times \left(E_{min} \times \sum_{a>b} d_{a,b}^{-1}\right) \qquad (7)$$

where B is a unitless constant decimal number, less than 1. For simplicity, if B is set to 0.95 the Eth−MR is referred to as Eth−MR = 95%Emin in this paper. Note that in this work, the Eth−SP and Eth−MR can be distinguished by their values, i.e., being equal to an integer number and being a percentage of the Emin, respectively. Thus, in the paper, we refer to both as Eth, removing the “SP” and “MR” subscripts.

Moreover, it should be noted that the Eth value calculated for the SP model (Eq 6) captures the lowest-energy structures for different numbers of designable sites (see results in S2 Appendix). However, the Eth value calculated for the MR model (Eq 7) only shows the limit below which there are no answer states (for B = 1). Due to the presence of distance reciprocals and the fact that all designable sites interact in the MR model circuits, one should carefully set the B value to attain the desired results. As the number of designable sites increases, the B value and, equivalently, the Eth value needed to find the state with the lowest energy decreases (see results in S2 Appendix for the MR,s = 2 and MR,s = 3 systems). Unlike in the SP model, the Eth in the MR model does not define the lowest-energy states; it is a heuristic threshold, chosen empirically, for finding low-energy states.
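The effect of the B value can be illustrated with a small classical sketch of the MR model energies and the threshold of Eq 7. The energy table, the inter-site distances and the function names below are hypothetical, and the sketch assumes that a state is an answer state when its Etot lies below Eth.

```python
from itertools import product

RESIDUES = ("H", "P")
# Hypothetical pair-wise energies and inter-site distances (illustrative only).
PAIR_ENERGY = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.0}
DISTANCE = {(1, 0): 1.0, (2, 1): 1.0, (2, 0): 2.0}  # every pair of sites interacts

# Distance reciprocals are pre-calculated classically, as in the MR model.
RECIPROCAL = {pair: 1.0 / d for pair, d in DISTANCE.items()}


def total_energy_mr(sequence):
    """E_tot = sum over pairs of E_ab multiplied by the distance reciprocal."""
    return sum(
        PAIR_ENERGY[(sequence[a], sequence[b])] * r for (a, b), r in RECIPROCAL.items()
    )


def threshold_mr(b_factor):
    """E_th = B x (E_min x sum of the distance reciprocals)."""
    e_min = min(PAIR_ENERGY.values())
    return b_factor * (e_min * sum(RECIPROCAL.values()))


def answer_states_mr(n_sites, b_factor):
    """Keep the sequences with E_tot below E_th (assumed sign convention)."""
    e_th = threshold_mr(b_factor)
    return [
        "".join(seq)
        for seq in product(RESIDUES, repeat=n_sites)
        if total_energy_mr(seq) < e_th
    ]


print(answer_states_mr(3, 1.0))   # [] : no state lies below the B = 1 limit
print(answer_states_mr(3, 0.95))  # ['HHH']
```

In this toy system, B = 1 yields no answer states, while B = 0.95 marks only the lowest-energy sequence. Lowering B further admits more states, which is why the value has to be tuned for each system.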

In the oracle, a version of the quantum ripple-carry adder introduced by Cuccaro et al. [30] is used for adding (and subtracting) the values in both the SP and MR models, which requires 2m+1 qubits to add two numbers, each represented with m qubits. This adder is also implemented as a part of the multiplication function employed in the MR model, using 16m+1 qubits to multiply the two numbers.

The role of Grover’s diffuser (Fig 1) is to act on the n qubits and increase the probability of answer states over all the other states in the circuit. To accomplish this, the diffuser changes the negative amplitude of the answer states (marked in the oracle) to positive and then increases the amplitude of these flipped states [3]. Note that since the total probability of all states is one, increasing the amplitude of the answer states (and thus the probability of finding them) decreases the amplitude of the non-answer states.
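The interplay between the oracle's negation and the diffuser can be sketched classically by tracking the amplitudes directly. The snippet below is a minimal illustration rather than the circuit used in this work: it applies the standard inversion-about-the-mean form of the diffuser to a list of real amplitudes, and the function names and the choice of index 0 as the answer state are assumptions made for the example.

```python
def oracle(amplitudes, answer_indices):
    """Negate the amplitude of every answer state."""
    return [-a if i in answer_indices else a for i, a in enumerate(amplitudes)]


def diffuser(amplitudes):
    """Reflect every amplitude about the mean of all amplitudes."""
    mean = sum(amplitudes) / len(amplitudes)
    return [2.0 * mean - a for a in amplitudes]


def grover_probabilities(n_states, answer_indices, iterations):
    """Start from a uniform superposition and apply oracle + diffuser R times."""
    amplitudes = [n_states ** -0.5] * n_states
    for _ in range(iterations):
        amplitudes = diffuser(oracle(amplitudes, answer_indices))
    return [a * a for a in amplitudes]


# Four states (n = 2 qubits), one answer state, one iteration.
probs = grover_probabilities(4, {0}, 1)
print([round(p, 3) for p in probs])  # [1.0, 0.0, 0.0, 0.0]
```

For four states and one answer state, a single iteration moves all of the probability onto the answer state, and the probabilities of the remaining states drop to zero because the total stays equal to one.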

The final step of the algorithm is the measurement (Fig 1), which is done on all n qubits to find the M answer states among all N possible states. Note that the work qubits are not measured and are discarded.

In Grover’s algorithm, the upper bound of the number of iterations required to get the answer states with the highest probability is:

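$$R_{max} \approx \frac{1}{2}\,\frac{\pi}{2\times\arcsin\left(\sqrt{M/N}\right)} + O\left(\sqrt{M/N}\right),$$

while, in the $N \gg M$ limit, the equation changes to:

$$R_{max} \approx \frac{\pi}{4}\sqrt{\frac{N}{M}} \qquad (8)$$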
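To make the scaling in Eq 8 concrete, the sketch below evaluates the iteration count and the textbook success probability of Grover's algorithm, sin²((2R+1)θ) with θ = arcsin(√(M/N)). Rounding the iteration count down to an integer is an assumption made here for illustration, and the function names are hypothetical.

```python
import math


def optimal_iterations(n_states, n_answers):
    """Integer iteration count from (pi/4) * sqrt(N/M), rounded down (assumed)."""
    return max(1, math.floor(math.pi / 4.0 * math.sqrt(n_states / n_answers)))


def success_probability(n_states, n_answers, iterations):
    """P(answer) = sin^2((2R + 1) * theta), with theta = arcsin(sqrt(M/N))."""
    theta = math.asin(math.sqrt(n_answers / n_states))
    return math.sin((2 * iterations + 1) * theta) ** 2


# N = 4 states with M = 1 answer state.
r_small = optimal_iterations(4, 1)
print(r_small)  # 1

# N = 20**2 = 400 sequences (two sites, 20 amino acids) with M = 1.
r_large = optimal_iterations(400, 1)
print(r_large, success_probability(400, 1, r_large))
```

For N = 400 and a single answer state, about 15 iterations already give a success probability above 99%, compared with an average of roughly N/2 = 200 evaluations for an unstructured classical search.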

We use IBM’s Qiskit package [37] to generate the circuits and simulate them in this study. Better-known simulators such as the QASM simulator require large amounts of RAM, scaling as $16\times 2^{q}$ bytes, and thus need at least 128 GB of RAM for the largest SP model circuit with q = 33 (Table 1). For circuits using more than 150 qubits in the MR model, this number reaches $\sim 2\times 10^{34}$ TB of RAM. Thus, due to the size of our systems, we use the MPS simulator as the only feasible choice to simulate all circuits. We use the computational resources provided by the Cedar cluster [38] to run our circuits on the quantum computer simulator.
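The memory figures quoted above follow from simple arithmetic, which the short sketch below reproduces. It assumes that a full state vector stores 2^q complex amplitudes at 16 bytes each (double-precision real and imaginary parts); the function name is hypothetical.

```python
def statevector_bytes(n_qubits):
    """Memory for a full state vector: 2**q complex amplitudes, 16 bytes each."""
    return 16 * 2 ** n_qubits


# Largest SP model circuit (q = 33), expressed in units of 2**30 bytes.
print(statevector_bytes(33) / 2 ** 30)  # 128.0

# A 150-qubit circuit, expressed in terabytes (10**12 bytes).
print(statevector_bytes(150) / 1e12)    # roughly 2.3e34
```

Each additional qubit doubles the requirement, which is why a full state-vector simulation is out of reach for the larger MR model circuits.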

Real quantum devices and noise-containing simulators

The quantum circuits in this work are composed of several single-qubit and multi-qubit (including two or more qubits) gates. Generally, several single-qubit and multi-qubit gates with low complexities are predefined in quantum computer simulators (details vary by simulator packages). However, in the case of more complex gates such as the CC–NOT (Toffoli) gate, the simulators decompose them into the simpler predefined gates.

Unlike simulators, real quantum devices only support a handful of native gates. Therefore, all gates are decomposed into the native gates when running our circuits on real quantum computers. However, depending on the type of the device and its manufacturer, these native gates may vary.

In addition to supporting only a few native gates, real quantum devices have further limitations. Since we are currently in the NISQ era, the qubits and gates introduce noise while transferring data. Currently, IBM quantum devices have error rates of $O(10^{-2})$ and $O(10^{-4})$ for the C-NOT and single-qubit gates, respectively [39]. Moreover, IBM quantum devices have limited connectivity between qubits [39]. Thus, swap gates are required to perform two-qubit gates between qubits that are not directly connected, increasing the number of gates and the quantum computational cost of the circuit.

We use the circuit depth metric to measure and compare the complexity of circuits in this study. The circuit depth represents the number of gates in the longest path in a circuit [40], i.e., from the initialization to the measurement in our systems (Fig 1). For the circuit of two designable sites (Fig 1A) at R = 1 in the SP model, the simplest system studied on quantum computer simulators in our study, the circuit depth in the decomposed stage on an ideal device with fully connected qubits is ~17,000 (more discussion is provided in S7 Appendix). In the current NISQ era, it is impossible to run a circuit with such a “large” depth on a real device due to the noise introduced to the results, owing to the gate fidelities and the system decoherence for deeper circuits [25]. Note that the depth of the circuit will increase significantly if the qubits are not fully connected on a quantum computer.
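The depth metric and the effect of limited connectivity can be illustrated with a small sketch. It represents a circuit as a list of gates, each given by the tuple of qubits it acts on, and computes the depth as the longest chain of gates that share qubits. The representation, the function name and the two example circuits are hypothetical and are not taken from Qiskit or from the circuits in this work.

```python
def circuit_depth(gates):
    """Depth = length of the longest chain of gates that share qubits."""
    layer = {}  # qubit index -> number of layers already used on that qubit
    for qubits in gates:
        # A gate starts after the busiest of the qubits it acts on is free.
        start = max(layer.get(q, 0) for q in qubits)
        for q in qubits:
            layer[q] = start + 1
    return max(layer.values(), default=0)


# A two-qubit gate between qubits 0 and 2 on a fully connected device.
direct = [(0,), (2,), (0, 2)]

# The same operation on a line 0-1-2: a swap on (1, 2) first moves the state
# of qubit 2 next to qubit 0. The swap is counted as one gate here; on
# hardware it is typically decomposed into three C-NOT gates.
routed = [(0,), (2,), (1, 2), (0, 1)]

print(circuit_depth(direct), circuit_depth(routed))  # 2 3
```

Even in this toy case, routing through an intermediate qubit increases the depth, and the increase compounds over the many two-qubit gates in an adder-based oracle.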

The IBM-SP model, executed on real quantum devices, requires seven qubits in total with n = 2 (Table 1). By setting the Eth to 0, the circuit has one answer state (Fig 3D) and Rmax = 1. We employ two IBM quantum computers, the ibmq_toronto (v1.6.1 and v1.6.2) and the ibmq_montreal (v1.10.11), both with IBM Quantum Falcon r4 processors, 27 qubits and the same pattern of connectivity. To run the IBM-SP model on these devices, we transpile the ideal circuit using optimization level 2 and a selected pattern of qubits. Moreover, we apply the transpilation 500 times for each run and select the circuit with the lowest depth, ranging between 158 and 167, as the input for the real quantum devices. To compare these quantum computers, we use the quantum volume, a unitless number quantifying the largest random circuit that a quantum computer can implement successfully [33]; a higher value indicates a more capable device.

Supporting information

S1 Appendix. Number of required qubits in the circuit and the general algorithm.

(PDF)

S2 Appendix. Explicit results of each system.

(PDF)

S3 Appendix. Computational costs of the simulations on conventional computers.

(PDF)

S4 Appendix. Role of number of iterations.

(PDF)

S5 Appendix. Computational costs in the circuits.

(PDF)

S6 Appendix. Implementing energies in the circuit.

(PDF)

S7 Appendix. Computational Cost of Simulations and Circuit Depth.

(PDF)

Acknowledgments

The authors would like to thank WestGrid (www.westgrid.ca) and Compute Canada (www.computecanada.ca) for providing computational resources for this project.

We acknowledge the use of IBM Quantum services for this work. The views expressed are those of the authors, and do not reflect the official policy or position of IBM or the IBM Quantum team.

We also would like to acknowledge CMC Microsystems for facilitating this research, specifically through their member access to the IBM Quantum Hub at Institut quantique.

Data Availability

All relevant data are within the paper and its Supporting Information files. The codes to create the circuits in the MR and SP models in this study are available on Zenodo at https://zenodo.org/record/7344649 and on GitHub at https://github.com/Mohammad-Khatami/grover-protein-desing.

Funding Statement

PMK received a Canadian Institutes of Health Research (CIHR) grant PJT-159750, which supported this project. (https://cihr-irsc.gc.ca) MHK's salary was partially funded by Canadian Institutes of Health Research (CIHR) grant PJT-159750. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Pierce NA, Winfree E. Protein design is NP-hard. Protein Eng. 2002;15: 779–782. doi: 10.1093/protein/15.10.779
  • 2. Knuth DE. Postscript about NP-hard problems. ACM SIGACT News. 1974;6: 15–16.
  • 3. Nielsen MA, Chuang I. Quantum computation and quantum information. Cambridge: University Press; 2010.
  • 4. Dill KA. Theory for the folding and stability of globular proteins. Biochemistry. 1985;24: 1501–1509. doi: 10.1021/bi00327a032
  • 5. Lathrop RH. The protein threading problem with sequence amino acid interaction preferences is NP-complete. Protein Engineering, Design and Selection. 1994;7: 1059–1068. doi: 10.1093/protein/7.9.1059
  • 6. Gilks WR, Richardson S, Spiegelhalter D. Markov chain Monte Carlo in practice. CRC Press; 1995.
  • 7. Gamerman D, Lopes HF. Markov chain Monte Carlo: stochastic simulation for Bayesian inference. CRC Press; 2006.
  • 8. Thomas J, Ramakrishnan N, Bailey-Kellogg C. Protein design by sampling an undirected graphical model of residue constraints. IEEE/ACM Trans Comput Biol Bioinform. 2008;6: 506–516.
  • 9. Sun MGF, Kim PM. Data driven flexible backbone protein design. PLoS Comput Biol. 2017;13: e1005722. doi: 10.1371/journal.pcbi.1005722
  • 10. Anishchenko I, Chidyausiku TM, Ovchinnikov S, Pellock SJ, Baker D. De novo protein design by deep network hallucination. bioRxiv. 2020.
  • 11. Perdomo-Ortiz A, Dickson N, Drew-Brook M, Rose G, Aspuru-Guzik A. Finding low-energy conformations of lattice protein models by quantum annealing. Sci Rep. 2012;2: 1–7. doi: 10.1038/srep00571
  • 12. Babej T, Fingerhuth M, others. Coarse-grained lattice protein folding on a quantum annealer. arXiv preprint arXiv:1811.00713. 2018.
  • 13. Robert A, Barkoutsos PK, Woerner S, Tavernelli I. Resource-efficient quantum algorithm for protein folding. npj Quantum Inf. 2021;7: 1–5.
  • 14. Fingerhuth M, Babej T, others. A quantum alternating operator ansatz with hard and soft constraints for lattice protein folding. arXiv preprint arXiv:1810.13411. 2018.
  • 15. Farhi E, Goldstone J, Gutmann S. A quantum approximate optimization algorithm. arXiv preprint arXiv:1411.4028. 2014.
  • 16. Peruzzo A, McClean J, Shadbolt P, Yung M-H, Zhou X-Q, Love PJ, et al. A variational eigenvalue solver on a photonic quantum processor. Nat Commun. 2014;5: 1–7. doi: 10.1038/ncomms5213
  • 17. Ruiz A de la F. Quantum annealing. arXiv; 2014. doi: 10.48550/ARXIV.1404.2465
  • 18. Mulligan VK, Melo H, Merritt HI, Slocum S, Weitzner BD, Watkins AM, et al. Designing peptides on a quantum computer. bioRxiv. 2020; 752485.
  • 19. Renfrew PD, Choi EJ, Bonneau R, Kuhlman B. Incorporation of noncanonical amino acids into Rosetta and use in computational protein-peptide interface design. PLoS One. 2012;7: e32637. doi: 10.1371/journal.pone.0032637
  • 20. Grover LK. A fast quantum mechanical algorithm for database search. Proceedings of the twenty-eighth annual ACM symposium on Theory of computing. 1996; 212–219.
  • 21. Walther P, Resch KJ, Rudolph T, Schenck E, Weinfurter H, Vedral V, et al. Experimental one-way quantum computing. Nature. 2005;434: 169–176. doi: 10.1038/nature03347
  • 22. Durr C, Hoyer P. A quantum algorithm for finding the minimum. arXiv preprint quant-ph/9607014. 1996.
  • 23. Shor PW. Algorithms for quantum computation: discrete logarithms and factoring. Proceedings 35th annual symposium on foundations of computer science. 1994. pp. 124–134.
  • 24. Deutsch D, Jozsa R. Rapid solution of problems by quantum computation. Proc R Soc Lond A Math Phys Sci. 1992;439: 553–558.
  • 25. Saki AA, Alam M, Ghosh S. Study of Decoherence in Quantum Computers: A Circuit-Design Perspective. arXiv preprint arXiv:1904.04323. 2019.
  • 26. Alford RF, Leaver-Fay A, Jeliazkov JR, O’Meara MJ, DiMaio FP, Park H, et al. The Rosetta all-atom energy function for macromolecular modeling and design. J Chem Theory Comput. 2017;13: 3031–3048. doi: 10.1021/acs.jctc.7b00125
  • 27. Bhaskar MK, Hadfield S, Papageorgiou A, Petras I. Quantum algorithms and circuits for scientific computing. arXiv preprint arXiv:1511.08253. 2015.
  • 28. Vidal G. Efficient classical simulation of slightly entangled quantum computations. Phys Rev Lett. 2003;91: 147902. doi: 10.1103/PhysRevLett.91.147902
  • 29. Botsinis P, Babar Z, Alanis D, Chandra D, Nguyen H, Ng SX, et al. Quantum error correction protects quantum search algorithms against decoherence. Sci Rep. 2016;6: 1–13.
  • 30. Cuccaro SA, Draper TG, Kutin SA, Moulton DP. A new quantum ripple-carry addition circuit. arXiv preprint quant-ph/0410184. 2004.
  • 31. Gidney C. Halving the cost of quantum addition. Quantum. 2018;2: 74. doi: 10.22331/q-2018-06-18-74
  • 32. Svore KM, Aho AV, Cross AW, Chuang I, Markov IL. A layered software architecture for quantum computing design tools. Computer (Long Beach Calif). 2006;39: 74–83.
  • 33. Cross AW, Bishop LS, Sheldon S, Nation PD, Gambetta JM. Validating quantum computers using randomized model circuits. Phys Rev A (Coll Park). 2019;100: 032328.
  • 34. Chia N-H, Chung K-M, Lai C-Y. On the need for large quantum depth. Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing. 2020. pp. 902–915.
  • 35. Li R, Wu B, Ying M, Sun X, Yang G. Quantum Supremacy Circuit Simulation on Sunway TaihuLight. 2018.
  • 36. Aaronson S. Quantum lower bound for recursive fourier sampling. arXiv preprint quant-ph/0209060. 2002.
  • 37. Abraham H, AduOffei, Agarwal R, Akhalwaya IY, Aleksandrowicz G, Alexander T, et al. Qiskit: An Open-source Framework for Quantum Computing. 2019.
  • 38. Cedar cluster: https://docs.computecanada.ca/wiki/Cedar (accessed: 06.02.2022).
  • 39. IBM real quantum devices: https://quantum-computing.ibm.com/services?services=systems (last accessed: 06.02.2022).
  • 40. Sipser M. Introduction to the Theory of Computation. Cengage Learning; 2012.
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1011033.r001

Decision Letter 0

Nir Ben-Tal, Roman Krems

8 Mar 2023

Dear Dr. Kim and Co-authors,

Thank you very much for submitting your manuscript "Gate-based quantum computing for protein design" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Please revise the manuscript to address the comments of both referees and resubmit.

We request that you make your code for building the quantum circuits for protein design in this paper publicly available. Please also make publicly available any other code or information that will allow other researchers to reproduce your results.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Roman Krems

Guest Editor

PLOS Computational Biology

Nir Ben-Tal

Section Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Comment by Section Editor:

My apology for the very long time that it took us to handle your manuscript. The topic is outside my own expertise and also far from the rest of our editorial board. Thus, I had to look for a guest editor, which took long, presumably because I was outside my comfort zone. 

Reviewer #1: See PDF attachment. (The text below duplicates the PDF.)

I have read the submission by Khatami, Mendes, Wiebe, and Kim titled “Gate-based quantum computing for protein design”. In this manuscript, the authors examine the optimization problem of finding an amino acid sequence corresponding to the minimum value of an energy or scoring function, a problem that is solved during the protein design process. This problem scales poorly with the number of amino acid positions being designed in the protein (which the authors represent as s), or with the number of amino acid types considered at each position (which the authors represent as A), and this represents a serious challenge for design algorithms on classical computers. The authors show that this problem can be solved on gate-based quantum computers using Grover’s algorithm, a method for searching a large solution space using quadratically fewer samples than would be required classically. Using quantum simulators run on classical hardware, they show that they are able to produce solution states with high probability (quantum computing is probabilistic computing) for small problems, and that noise hinders but does not eliminate the functionality of their approach. They also show enrichment of the solution state for a very small problem on an actual, current-generation IBM gate-based quantum computer.

The authors have identified a sensible problem for which quantum computers could plausibly offer an advantage over classical computers one day. Their approach appears to be sound. The manuscript reads well, like one that has had quite a bit of thought go into revisions already. I am largely satisfied that this manuscript is suitable for publication. Below, I list one concern that I hope the authors can address or rebut, as well as a small number of minor points related to presentation which I think the authors will be able to address easily.

Major concern:

1. The PLoS author guidelines state: “PLOS expects researchers to share software and scripts needed for the work. If this cannot be made publicly available (e.g. due to licenses), the simulation method should be provided in sufficient detail so the results can, in principle, be reproduced using publicly available software.” I think that the authors have made a good faith effort to describe their methods in a good deal of detail. I am moderately confident that a person with sufficient domain expertise could reproduce what was done here, though the complexity of the approach does mean that it would be hard to say for certain whether any differences that that person saw between their results and the authors’ results was due to errors in reimplementing the method versus actual failure of the method to reproduce the result. I would be more comfortable if the authors were willing to make their Qiskit code for simulating these circuits (or, at least, for simulating a particular problem) publicly available. I won’t insist on this, since the description in the supplement is quite detailed, but it would make me more comfortable for the sake of full reproducibility.

Minor points:

1. On lines 70-72, the statement of the main characteristic of NP-complete problems could be a bit more precise and a bit clearer. The main characteristic of these problems is that the time or resources needed to find a solution scales poorly, not that they are intrinsically costly across all problem sizes. (There are NP-complete problems, like the travelling salesman problem, that are solved routinely for small problem instances, such as getting delivery vehicles to customers.) Similarly, on line 73, the proposed answer need not be scored easily, but in polynomial time. (There are polynomial-time tasks, such as the two-body molecular docking problem, that are extremely costly.) I mention this mainly because this is often a misconception that I have to dispel with students, that NP-completeness intrinsically means computational intractability or that polynomial-time scaling intrinsically means computational tractability for all problem sizes. It’s actually something that’s kind of interesting about the protein design problem: it’s an NP-complete problem with interesting real-world applications across a range of scales, ranging from extremely tractable (allowing classical and quantum solvers to be compared on these problems) to challenging (meaning that there may be problems for which there’s a slight quantum advantage) to utterly intractable (meaning that a good quantum solver would have problems that it was uniquely suited to do).

2. In the caption to Figure 1, there’s a sentence that seems to be incomplete. (As an aside, I like the detail in figure 1 – particularly panel B. Explanations like this in plain language make these algorithms much easier to understand.)

3. On lines 137-138, “is not advantageous” is a little bit confusing. Maybe it could be made clearer that the sign flip does not, at this point, alter the relative probabilities of states (something that will be altered by the subsequent diffuser step)?

4. Figure 2 is pretty clear. A suggestion that the authors might consider is adding a dashed line for each pairwise interaction, maybe in the same dark red colour as the listed interactions. (The lists currently refer to a feature that the reader can infer, but which is invisible. I wonder if it might be a little easier on the reader just to make it more visible.)

5. On lines 194 and 201, “number of residues” is a bit ambiguous: it could be the number of designable positions, s, or the number of amino acid possibilities at each position, A. I think the latter is meant. Perhaps using the nomenclature established earlier (A) or indicating that this is number of residue types might make this clearer.

6. A small singular/plural typo on line 228: “simulate circuits with this many qubits.”

7. Figures 5 and 6 show good results. The one minor criticism I have is of the x-axis label in figure 5 and the x- and y-axis labels in figure 6: descriptive labels such as “Max number of iterations (Rmax)” are easier to follow than the symbol alone (especially when the axis label is otherwise an expression of several symbols, such as R/Rmax. It’s not always intuitive to a reader why a particular expression is on an axis or what the expected relationship between an abstract expression on one axis and an abstract expression on another should be.)

8. It’s worth being aware that a circuit depth of 20 (line 415) sounds a little bit low as an upper limit. My experience has been that circuit depth limits of 40 or 50 seem a little more realistic, currently. This is very minor, though: either way, the authors’ point that only very short circuits are currently possible is a valid one.

9. On line 444, “Coulomb” should be capitalized.

10. I’d be a little bit careful with the claim on lines 446-448, that the distance reciprocal and decimal number approach used here could be used in lattice model folding studies. If the claim is that one could keep the lattice model, but use pairwise interaction energies represented as decimal numbers, possibly taken from classically precomputed lookup tables, this sounds reasonable. If the claim is that this provides a route to non-integer positions for the beads on the chain (i.e. a means of moving past lattice models to continuous-space models), this is harder to envisage, and would require means of dealing with very non-pairwise effects (such as the fact that the position of residue 3 depends on both residue 1 and residue 2). Maybe rephrasing this to make it clear that the former is meant would be a good idea.

11. In the first paragraph of the methods, it might be good to make it even more explicit what’s done classically and what’s done in the quantum circuit. For instance, “First, the classically-computed values in the energy tables are introduced to the oracle… Next the energy values for every pair-wise interaction in the structure are summed in the quantum circuit to find the total energy…”

12. The point made on lines 549-555, about the circuit depth needed for a real problem compared to what’s currently possible, is an important one, and one that many readers will be looking for in the conclusions. It’s likely worth including in the discussion in more detail, not just in the methods. I suspect that many readers will be looking for this as a bottom line to the story: they’ll be wondering, “at what point is the quantum hardware likely to be useful for real design problems”? (Note: I do NOT consider the fact that real problems will require larger, more robust hardware than is currently available to be a shortcoming of this work.)
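
As a purely classical illustration of the split raised in points 10 and 11, the sketch below takes a precomputed pair-wise energy table as input and sums it over the contacts of a fixed structure, scaling each term by the reciprocal of the distance. The two-letter alphabet, the table values, and the contact list are hypothetical and are not taken from the manuscript; only the summation step corresponds to what the oracle would carry out with adders.

```python
from itertools import product

# Hypothetical two-letter alphabet with a classically precomputed
# pair-wise energy table (arbitrary units); not the manuscript's values.
ENERGY = {
    ("H", "H"): -2.0,
    ("H", "P"): -0.5,
    ("P", "H"): -0.5,
    ("P", "P"): 0.0,
}

# Hypothetical contacts for a fixed backbone: (site_i, site_j, distance).
CONTACTS = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.0)]


def total_energy(sequence, contacts=CONTACTS, table=ENERGY):
    """Sum the table energy of every contact, scaled by 1/distance."""
    return sum(table[(sequence[i], sequence[j])] / d for i, j, d in contacts)


def enumerate_energies(alphabet="HP", n_sites=3):
    """Brute-force energy of every sequence (the classical O(N) search)."""
    return {
        "".join(seq): total_energy(seq)
        for seq in product(alphabet, repeat=n_sites)
    }


if __name__ == "__main__":
    for seq, energy in sorted(enumerate_energies().items(), key=lambda kv: kv[1]):
        print(seq, energy)
```

Because the positions stay fixed and only the pair-wise terms change, this matches the first reading in point 10 (a lattice-like model with decimal interaction energies), not a continuous-space model.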

Reviewer #2: Summary:

This paper proposes to use Grover’s algorithm to solve the protein design problem, which asks to decide the amino acid type on each designable site to achieve the lowest energy. In contrast to prior gate-based hybrid quantum-classical computation and quantum annealing, this work presents a gate-based pure quantum algorithm solution. The main task in this work is to construct the oracle in Grover’s algorithm. With the oracle, Grover’s algorithm can be applied to find the desired solution. It shows that the expected quadratic speed-up can be achieved compared to classical brute-force search. Experimental justification is performed on simulators and real quantum devices to show the correctness of the method.

Strengths:

- The protein design problem is important and its effective solutions will be widely useful.

- The first gate-based pure quantum algorithm for the protein design problem is proposed.

- The method is flexible in customizing for different energy tables and protein structure models.

- Related work is adequately cited.

Weaknesses:

- The design of the oracle is somewhat brute-forced, leading to extremely high costs.

- The oracle construction, which is the main technicality of the paper, is not well described.

- There is no experimental comparison with prior methods.

Specific comments:

Grover’s algorithm is a well-known quantum algorithm. Therefore, the main contribution of this work is designing the oracle to make the algorithm applicable to the considered problem of protein design. However, the article does not focus on explaining the oracle construction, but rather spends too much effort introducing Grover’s algorithm. Moreover, the oracle seems to directly use existing structures of adders and multipliers, trying to do numerical operations directly on quantum circuits. This direct adoption without optimization leads to extremely high costs, limiting the problem to small examples (e.g., 6 designable sites) even for simulation-based justification. It would be helpful if optimization can be performed to minimize the oracle circuit.
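
For context on the speed-up under discussion, the following minimal sketch evaluates the standard textbook expressions for Grover's algorithm: the near-optimal number of iterations for M marked states among N, and the probability of measuring a marked state after a given number of iterations. It says nothing about the cost of the oracle itself, which is the reviewer's concern; the function names are illustrative.

```python
import math


def grover_iterations(n_states, n_marked=1):
    """Near-optimal iteration count, floor(pi / (4 * theta)),
    where theta = asin(sqrt(M / N))."""
    if not 0 < n_marked <= n_states:
        raise ValueError("need 0 < n_marked <= n_states")
    theta = math.asin(math.sqrt(n_marked / n_states))
    return math.floor(math.pi / (4 * theta))


def success_probability(n_states, n_marked, iterations):
    """Probability of measuring a marked state after `iterations` rounds:
    sin^2((2R + 1) * theta)."""
    if not 0 < n_marked <= n_states:
        raise ValueError("need 0 < n_marked <= n_states")
    theta = math.asin(math.sqrt(n_marked / n_states))
    return math.sin((2 * iterations + 1) * theta) ** 2


if __name__ == "__main__":
    n, m = 64, 1
    r = grover_iterations(n, m)
    print(r, success_probability(n, m, r))
```

For small M/N the iteration count grows as roughly (π/4)·sqrt(N/M), which is the quadratic advantage over the O(N) classical search.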

The value of E_th appears to be a heuristic derived from empirical experience. Although Eq. (6) and Eq. (7) may perform well (yielding few answer states) in the tested benchmarks, their applicability and generality are questionable. For example, with more designable sites, 95% of E_min may be unattainable for any candidate state, leaving the solution space empty. It would be good to discuss how to set E_th. Moreover, it seems that this method only finds low-energy solutions rather than the lowest-energy solution.
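
The concern about E_th can be made concrete with a small classical sketch. It assumes, purely for illustration, that a state is marked when its energy is at or below a fixed fraction of a negative reference energy; the manuscript's Eq. (6) and Eq. (7) are not reproduced here, and the energies below are made up.

```python
def marked_states(energies, e_ref, fraction=0.95):
    """Return the sequences whose energy is at or below fraction * e_ref.

    `e_ref` is a negative reference energy (an assumed stand-in for E_min),
    so a larger fraction means a stricter threshold.
    """
    e_th = fraction * e_ref
    return sorted(seq for seq, e in energies.items() if e <= e_th)


# Hypothetical energies for four candidate sequences (arbitrary units).
energies = {"AAA": -5.0, "AAB": -4.8, "ABA": -3.0, "BBB": 0.0}

# Reference equal to the true minimum: two states pass the threshold,
# so the search returns a low-energy state, not necessarily the lowest.
print(marked_states(energies, e_ref=-5.0))

# Reference that no candidate can reach: the marked set is empty.
print(marked_states(energies, e_ref=-6.0))
```

The second call shows the failure mode described above: if the reference energy is an optimistic bound, no state satisfies the threshold and Grover's search has nothing to amplify.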

Although related work is introduced, a more in-depth comparison should be performed to reveal the strengths and weaknesses of different approaches.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No: See my first point in the PDF of my review. The authors have done a pretty good job of describing their methods, particularly in the supplement. Reproducing these methods based on the description would be possible, but it would be easy to make errors in the re-implementation. This makes it difficult to compare one's own results, obtained with a re-implemented method, to those of the original method described here. I'd be more comfortable if Qiskit code implementing these quantum circuits were made publicly available.

Reviewer #2: No: It would be nice if the authors could make the quantum circuits for protein design in this paper publicly available.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Vikram K. Mulligan

Reviewer #2: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Attachment

Submitted filename: Review_2022_02_21.pdf

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1011033.r003

Decision Letter 1

Nir Ben-Tal, Roman Krems

17 Mar 2023

Dear Professor Kim,

We are pleased to inform you that your manuscript 'Gate-based quantum computing for protein design' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Roman Krems

Guest Editor

PLOS Computational Biology

Nir Ben-Tal

Section Editor

PLOS Computational Biology

***********************************************************

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1011033.r004

Acceptance letter

Nir Ben-Tal, Roman Krems

5 Apr 2023

PCOMPBIOL-D-22-01706R1

Gate-based quantum computing for protein design

Dear Dr Kim,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Anita Estes

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix. Number of required qubits in the circuit and the general algorithm.

    (PDF)

    S2 Appendix. Explicit results of each system.

    (PDF)

    S3 Appendix. Computational costs of the simulations on conventional computers.

    (PDF)

    S4 Appendix. Role of number of iterations.

    (PDF)

    S5 Appendix. Computational costs in the circuits.

    (PDF)

    S6 Appendix. Implementing energies in the circuit.

    (PDF)

    S7 Appendix. Computational Cost of Simulations and Circuit Depth.

    (PDF)

    Attachment

    Submitted filename: Review_2022_02_21.pdf

    Attachment

    Submitted filename: QC_Paper_answer.docx

    Data Availability Statement

    All relevant data are within the paper and its Supporting Information files. The code to create the circuits for the MR and SP models in this study is available on Zenodo at https://zenodo.org/record/7344649 and on GitHub at https://github.com/Mohammad-Khatami/grover-protein-desing.


    Articles from PLOS Computational Biology are provided here courtesy of PLOS
