Scientific Reports. 2021 Jun 10;11:12241. doi: 10.1038/s41598-021-90534-7

Experimental semi-autonomous eigensolver using reinforcement learning

C-Y Pan 1, M Hao 1, N Barraza 1, E Solano 1,2,3,4, F Albarrán-Arriagada 1
PMCID: PMC8192530  PMID: 34112819

Abstract

The characterization of observables, expressed via Hermitian operators, is a crucial task in quantum mechanics. For this reason, an eigensolver is a fundamental algorithm for any quantum technology. In this work, we implement a semi-autonomous algorithm to obtain an approximation of the eigenvectors of an arbitrary Hermitian operator using the IBM quantum computer. To this end, we use only single-shot measurements and pseudo-random changes handled by a feedback loop, reducing the number of measurements on the system. Due to the classical feedback loop, this algorithm can be cast into the reinforcement learning paradigm. Using this algorithm, for a single-qubit observable, we obtain both eigenvectors with fidelities over 0.97 with around 200 single-shot measurements. For two-qubit observables, we get fidelities over 0.91 with around 1500 single-shot measurements for the four eigenvectors, which is a comparatively low resource demand, suitable for current devices. This work contributes to the development of quantum devices able to decide with partial information, which helps to implement future technologies in quantum artificial intelligence.

Subject terms: Quantum information, Quantum simulation

Introduction

Increasing the computational capabilities of machines is an essential goal in artificial intelligence. In this context, machine learning algorithms have emerged with great force in the last decades^{1,2}. This class of algorithms can be divided into two families: learning from big data and learning from interactions. Learning from big data can be classified into two categories, supervised and unsupervised learning. In the supervised learning paradigm, we have a set of labeled data, named training data, from which we want to infer some classification function to sort new unlabeled data. Unsupervised learning algorithms do not use training data. In this paradigm, the goal is to extract the statistical structure of an unsorted data set and divide it into different groups according to some criteria (the clustering problem)^{3–8}.

In the category of learning from interactions we have the Reinforcement Learning (RL) algorithms^{9–18}. The idea in this paradigm is that a known and manipulable system called the agent (A) interacts with a non-manipulable system called the environment (E). Here, the goal is to optimize a task G(A,E), which depends on the state of A and E. For this, we use feedback loops to change the state of A using the information extracted from the interaction with E. Some impressive recent examples of RL are the AI players for strategy games like Go^{19}, Chess^{20}, or StarCraft II^{21}.

On the other hand, it has been shown that quantum computing^{22} can overcome some fundamental limits of classical computing, e.g., in searching problems^{23}, factorization algorithms^{24}, solving linear systems of equations^{25,26}, and linear differential equations^{27}. Therefore, it was natural to merge machine learning techniques with the advantages of quantum computing in the topic known as Quantum Machine Learning (QML)^{28–35}.

With the development of Noisy Intermediate-Scale Quantum (NISQ) devices^{36}, research on simple quantum information protocols suitable for NISQ quantum computers, and on QML, has grown in recent years. The IBM quantum computer is one of the most famous open NISQ devices; it can be programmed using Qiskit^{37}, an open-source Python package, to create and run quantum programs through the IBM quantum cloud service^{38}.

Among the most useful algorithms for linear algebra, and hence for quantum mechanics, are the quantum eigensolvers. Hybrid quantum-classical algorithms like the variational quantum eigensolver (VQE)^{39–41} are attractive due to their easy implementation in NISQ devices. The main idea of this class of algorithms is to calculate some expectation value (such as an energy) with a quantum processor, and then use a classical optimizer (such as a variational one) to reach the solution^{42}. Nevertheless, an algorithm that uses a quantum optimizer has recently been proposed^{43}. Each iteration of the classical optimizer involves many single-shot measurements of the quantum system, which are required to calculate an expectation value. The development of an algorithm with more quantum features will involve the use of a more primitive classical subroutine.

In this paper, we implement the semi-autonomous eigensolver proposed in Ref.^{44}. The protocol obtains an approximation of all eigenvectors of an arbitrary observable using single-shot measurements instead of expectation values. Here, we use the most basic classical subroutine, which involves only pseudo-random changes handled by the outcome of the single-shot measurement and a feedback loop. Due to this feedback loop, this algorithm can be classified within the RL paradigm. Using our protocol, we can obtain a high-fidelity approximation of all eigenvectors. In the single-qubit case, we get fidelities larger than 0.97, and larger than 0.91 for a two-qubit observable, in around 200 and 1500 single-shot measurements, respectively. This work opens the door to exploring alternative paradigms in hybrid classical-quantum algorithms, which is useful for developing semi-autonomous quantum devices that decide with incomplete information.

Methods

Basics on RL paradigm

We briefly describe the basic components of the RL paradigm. As mentioned above, in an RL algorithm we define two systems: the agent A and the environment E. The interaction between these systems can be divided into three basic ingredients: the policy, the reward function (RF), and the value function (VF). The policy refers to the general rules of the algorithm and can be subdivided into three stages: first, the interaction, where we specify how A and E interact; second, the action, which refers to how A changes its perception of E by modifying some internal parameters; and third, the information extraction, which defines the process used by A to infer information from E. The information extraction can be done directly by A or using an auxiliary system, named the register, if A cannot read the response of the environment.

The RF is the criterion to reward or punish A in each iteration using the information collected from E. This step is the most important in any RL algorithm, because the right choice of RF ensures the optimization of the desired task G(A,E). Finally, the VF evaluates a figure of merit related to the task G(A,E), which quantifies the usefulness of the algorithm. The main difference between the RF and the VF is that the former evaluates each iteration, improving the performance locally in time without considering the history of the algorithm, whereas the VF depends on the history of the algorithm, taking a large number of iterations into account to give its global performance.
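The loop structure described above can be sketched in a few lines of Python (our own illustrative skeleton, not code from the paper; the placeholder policy and reward rule below are hypothetical):

```python
import random

def run_rl_loop(policy_step, reward_fn, n_iters, w0=1.0):
    """Generic agent-environment loop: each iteration interacts with the
    environment, extracts one outcome, and applies the reward function to
    the range amplitude w; the final w plays the role of the value function."""
    w = w0
    history = []
    for _ in range(n_iters):
        outcome = policy_step(w)   # interaction + information extraction
        w = reward_fn(w, outcome)  # reward or punish the agent
        history.append(outcome)
    return w, history

# Toy run: outcome 0 stands for "the agent looks like an eigenvector".
random.seed(1)
step = lambda w: 0 if random.random() < 0.9 else 1
reward = lambda w, m: w * (0.9 if m == 0 else 1.0 / 0.9)
w_final, hist = run_rl_loop(step, reward, 100)
```

In this toy run, the final value of w is small exactly when the simulated agent is rewarded much more often than it is punished, which is how the value function signals success in the protocol below.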

RL protocol

We define the basic parts of our protocol as an RL algorithm. The state of the agent is denoted by

$|A_k^{(j)}\rangle = \hat{D}_k |j\rangle$,  (1)

where $\hat{D}_k$ is a unitary transformation that prepares the desired agent state, $|j\rangle$ is the initial state provided by the quantum processor in the computational basis, and the subindex k denotes the iteration of the algorithm. The environment is expressed as an unknown Hermitian operator $\hat{O}$ written as

$\hat{O} = \sum_j \alpha^{(j)} |E^{(j)}\rangle\langle E^{(j)}|$,  (2)

with $\alpha^{(j)}$ and $|E^{(j)}\rangle$ the jth eigenvalue and eigenvector of $\hat{O}$, respectively. The task G is set to maximize the fidelity between the state of the agent, $|A_N^{(j)}\rangle$, after N iterations, and the eigenvectors $|E^{(j)}\rangle$; in other words, we want to find the matrix $\hat{D}_k$ that diagonalizes the observable $\hat{O}$.

Now, the policy is as follows:

Interaction: The observable $\hat{O}$ generates an evolution given by the unitary transformation

$\hat{E} = e^{-i\hat{O}\tau} = \sum_j e^{-i\alpha^{(j)}\tau} |E^{(j)}\rangle\langle E^{(j)}|$,  (3)

where τ is a constant related to the elapsed time of the interaction. The agent state after this evolution is

$\hat{E}|A_k^{(j)}\rangle = |\bar{A}_k^{(j)}\rangle = \sum_\ell c^{(\ell)} |A_k^{(\ell)}\rangle$.  (4)

Information extraction: We measure the state $|\bar{A}_k^{(j)}\rangle$ in the basis $\{|A_k^{(\ell)}\rangle\}$. For this purpose we apply the transformation $\hat{D}_k^\dagger$, obtaining

$\hat{D}_k^\dagger |\bar{A}_k^{(j)}\rangle = \sum_\ell c^{(\ell)} |\ell\rangle$,  (5)

followed by a single-shot measurement in the computational basis $\{|\ell\rangle\}$, obtaining the outcome value m with probability $|c^{(m)}|^2$. This outcome refers to the resulting state $|A_k^{(m)}\rangle$ after the measurement process.
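As an illustration (our own numpy sketch, not the experimental Qiskit code), the information-extraction step can be simulated classically by computing the amplitudes $c^{(\ell)}$ of Eq. (5) and drawing a single outcome:

```python
import numpy as np

def single_shot(D, E, j, rng):
    """One information-extraction step: prepare D|j>, evolve with E,
    rotate back with D^dagger, and draw one computational-basis
    outcome m with probability |c(m)|^2, as in Eq. (5)."""
    dim = D.shape[0]
    ket_j = np.zeros(dim, dtype=complex)
    ket_j[j] = 1.0
    c = D.conj().T @ E @ D @ ket_j          # amplitudes c(l)
    probs = np.abs(c) ** 2
    return int(rng.choice(dim, p=probs / probs.sum()))

# Sanity check: if D already diagonalizes the observable, the outcome is always j.
rng = np.random.default_rng(0)
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
evals, V = np.linalg.eigh(sx)
E = V @ np.diag(np.exp(-1j * evals)) @ V.conj().T   # E = exp(-i O tau), tau = 1
outcomes = [single_shot(V, E, 0, rng) for _ in range(20)]
```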

Action: According to Eq. (3), if $|A_k^{(j)}\rangle$ is equal to some eigenvector of $\hat{O}$, we obtain $c^{(j)}=1$ in Eq. (4). Using this condition, we define the following rule for the action. If the outcome is $m \ne j$, implying $|c^{(j)}| \ne 1$, then $|A_k^{(j)}\rangle$ is not an eigenvector of $\hat{O}$. In this case ($m \ne j$), we modify the agent for the next iteration, defining the operator $\hat{D}_{k+1}$ as

$\hat{D}_{k+1} = \hat{D}_k\,\hat{u}_{j,m}(\theta,\phi,\lambda)$,  (6)

with

$\hat{u}_{j,m}(\theta,\phi,\lambda) = e^{-i\lambda \hat{S}_{j,m}^{(z)}}\, e^{-i\theta \hat{S}_{j,m}^{(y)}}\, e^{-i\phi \hat{S}_{j,m}^{(z)}}$,  (7)

where,

$\hat{S}_{j,m}^{(z)} = \tfrac{1}{2}\left(|j\rangle\langle j| - |m\rangle\langle m|\right), \qquad \hat{S}_{j,m}^{(y)} = -\tfrac{i}{2}\left(|j\rangle\langle m| - |m\rangle\langle j|\right)$.  (8)

Then,

$\hat{u}_{j,m}(\theta,\phi,\lambda) = \cos\frac{\theta}{2}\left(|j\rangle\langle j| + e^{i(\lambda+\phi)}|m\rangle\langle m|\right) + \sin\frac{\theta}{2}\left(-e^{i\phi}|j\rangle\langle m| + e^{i\lambda}|m\rangle\langle j|\right)$  (9)

up to a global phase. Therefore, $\hat{u}_{j,m}(\theta,\phi,\lambda)$ is a general rotation in the $\{|j\rangle,|m\rangle\}$ subspace. The angles are random numbers given by

$\{\theta,\lambda,\phi\} \in w_k \cdot [-\pi,\pi]$,  (10)

where the range amplitude $w_k$ will be updated in each iteration according to the RF, which will be specified later. Now, for the case m=j, the state $|A_k^{(j)}\rangle$ could be an eigenvector of $\hat{O}$, so we define

$\hat{D}_{k+1} = \hat{D}_k$.  (11)

We can summarize Eqs. (6) and (11) as

$\hat{D}_{k+1} = \hat{D}_k \left( \sum_{l \ne j} \hat{u}_{j,l}(\theta,\phi,\lambda)\,\delta_{l,m} + \hat{I}\,\delta_{j,m} \right)$.  (12)
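A minimal numpy sketch of this action step (our own illustration; the function names are ours) builds the subspace rotation of Eq. (9) and applies the update of Eqs. (6) and (11):

```python
import numpy as np

def u_rot(dim, j, m, theta, phi, lam):
    """Rotation of Eq. (9): a general rotation inside the {|j>, |m>}
    subspace, acting as the identity on the rest of the space."""
    u = np.eye(dim, dtype=complex)
    u[j, j] = np.cos(theta / 2)
    u[m, m] = np.exp(1j * (lam + phi)) * np.cos(theta / 2)
    u[j, m] = -np.exp(1j * phi) * np.sin(theta / 2)
    u[m, j] = np.exp(1j * lam) * np.sin(theta / 2)
    return u

def next_D(D, j, m, w, rng):
    """Action step: apply Eq. (6) when the outcome m misses j,
    and Eq. (11) (no change) otherwise."""
    if m == j:
        return D
    theta, phi, lam = w * rng.uniform(-np.pi, np.pi, size=3)
    return D @ u_rot(D.shape[0], j, m, theta, phi, lam)

u = u_rot(4, 0, 2, 0.7, 0.3, -0.5)
D1 = next_D(np.eye(4, dtype=complex), 0, 2, 1.0, np.random.default_rng(0))
```

Note that u_rot leaves every basis state outside {|j>, |m>} untouched, so the eigenvector approximations fixed in earlier stages of the protocol are preserved.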

Now, we define the reward function as

$w_{k+1} = w_k\left(p \sum_{l \ne j} \delta_{l,m} + r\,\delta_{j,m}\right)$,  (13)

where p>1 is the punishment ratio and 0<r<1 is the reward ratio. This means that each time we obtain the outcome $m \ne j$, we increase the range amplitude $w_{k+1}$, because $m \ne j$ means that we are further away from an eigenvector and larger corrections are required. In the other case, m=j means that we are closer to an eigenvector, so we reduce the value of $w_{k+1}$, producing smaller changes in future iterations.
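The reward function of Eq. (13) is a one-line update; the sketch below (our own, using the p and r values employed later in the single-qubit experiments) shows its effect:

```python
def update_w(w, m, j, p=1.5 / 0.9, r=0.9):
    """Reward function of Eq. (13): widen the search range (punish,
    p > 1) when the outcome misses j, narrow it (reward, 0 < r < 1)
    when m == j."""
    return w * (p if m != j else r)

w = 1.0
for m in [1, 0, 0, 0]:   # one miss followed by three hits, with j = 0
    w = update_w(w, m, j=0)
```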

Finally, the value function will be the last value of the range amplitude, $w_N$, after N iterations. If $w_N \approx 0$, it signifies that we have measured m=j several times, so $c^{(j)} \approx 1$, which implies that we have obtained a good approximation of an eigenvector.

Results

Single-qubit case

We implement the algorithm described above in the IBM quantum computer. We start with the simplest case: finding the eigenvectors of a single-qubit observable. Since there are only two eigenvectors, we only need to obtain one of them, because the orthogonality property determines the second one. Figure 1 shows the circuit diagram for this case. As we can see in Fig. 1, the agent in each iteration is given by

$|A_k^{(0)}\rangle = \hat{D}_k|0\rangle$.  (14)

In this case, we have only one rotation ($\hat{u}_{1,0}$) of the form of Eq. (7); then, for simplicity, we redefine the operator $\hat{D}_k = \hat{D}(\theta_k,\phi_k,\lambda_k)$ as

$\hat{D}(\theta_k,\phi_k,\lambda_k) = e^{-i\frac{\lambda_k}{2}\hat{\sigma}^{(z)}}\, e^{-i\frac{\theta_k}{2}\hat{\sigma}^{(y)}}\, e^{-i\frac{\phi_k}{2}\hat{\sigma}^{(z)}}$,  (15)

where $\hat{\sigma}^{(a)}$ is the a-Pauli matrix and

$\theta_{k+1} = \theta_k + \Delta\theta\cdot\delta_{1,m}, \quad \phi_{k+1} = \phi_k + \Delta\phi\cdot\delta_{1,m}, \quad \lambda_{k+1} = \lambda_k + \Delta\lambda\cdot\delta_{1,m}$,  (16)

with $\{\Delta\theta,\Delta\phi,\Delta\lambda\} \in w_k \cdot [-\pi,\pi]$ and $w_k$ given by Eq. (13), considering only two outcomes ($m \in \{0,1\}$) and j=0 for the whole algorithm. The gate in Eq. (15) has the form of the general single-qubit rotation provided by Qiskit; therefore, it can be efficiently implemented in the IBM quantum computer. We denote by F the maximum fidelity between the agent state $|A_N^{(0)}\rangle$ and one of the eigenvectors at the end of the algorithm. We find that F is related to the probability $P_0$ of obtaining the outcome m=0 by (see Appendix A)

$P_0 = 1 + \frac{1-\cos\Delta}{2}\left[(2F-1)^2 - 1\right], \qquad F = \frac{1}{2}\left[1 + \sqrt{1 + \frac{2(P_0-1)}{1-\cos\Delta}}\right]$,  (17)

where $\Delta = \tau|\alpha^{(0)} - \alpha^{(1)}|$ is the gap between the eigenvalues of $\tau\hat{O}$ [see Eqs. (2) and (3)]. Figure 2 shows $P_0$ as a function of the fidelity F for different values of $\Delta$.
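Equation (17) can be checked numerically; the sketch below (our own helper functions, not code from the paper) implements both directions of the relation:

```python
import numpy as np

def p0_from_fidelity(F, delta):
    """P0 as a function of the fidelity F and the eigenvalue gap
    delta = tau * |alpha0 - alpha1|, first relation of Eq. (17)."""
    return 1.0 + 0.5 * (1.0 - np.cos(delta)) * ((2.0 * F - 1.0) ** 2 - 1.0)

def fidelity_from_p0(p0, delta):
    """Inverse relation of Eq. (17): the fidelity inferred from the
    measured probability of the outcome m = 0."""
    return 0.5 * (1.0 + np.sqrt(1.0 + 2.0 * (p0 - 1.0) / (1.0 - np.cos(delta))))
```

For Δ = π the inverse relation reduces to F = (1 + √P₀)/2, the expression used in the first experiment below.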

Figure 1. Diagram of the single-qubit protocol. The subindex k refers to the kth iteration. Blue lines represent the classical communication to the central processing unit. The gray arrows show feedback loops, where $\hat{D}_k$ and $\hat{D}_k^\dagger$ are updated according to the measurement outcome.

Figure 2. $P_0$ as a function of F for different values of $\Delta$.

For the implementation, we use the initial values $\theta_1=\phi_1=\lambda_1=0$, $w_1=1$, and the quantum processor “ibmqx2”. The algorithm runs until $w_N<0.1$. Since the algorithm converges stochastically to the eigenvectors, we perform 40 experiments in order to characterize the performance of the algorithm by the central values of the data set. We also compare the performance of our algorithm with the VQE algorithm for the same environments using the same quantum processor. To test the algorithm, we use three different environment Hermitian operators:

  1. $\tau\hat{O} = \frac{\pi}{2}\sigma_x$, so that $\Delta = \pi$ and $F = \frac{1}{2}(1+\sqrt{P_0})$.

Here, we choose the reward ratio r=0.9 and the punishment ratio p=1/r. The results of the 40 experiments are collected in Appendix Table 1 (see supplemental material) and summarized in the histograms of Fig. 3. From Fig. 3a, we can see that the probability $P_0$ is larger than 0.85 in 36 cases, which implies, as shown in Fig. 3b, that most cases give fidelities larger than 0.94. Also, we have 36 experiments with F>0.96; the average fidelity is $\bar{F}=0.98$ and the standard deviation is σ=0.019, which represents 2% of the average fidelity $\bar{F}$. The average number of iterations over the 40 experiments is $\bar{N}=103$, the minimum is $N_{min}=25$, and the maximum is $N_{max}=528$. This number may look large, but we remark that we use only one single-shot measurement per iteration. In comparison, calculating a given expectation value requires at least 1000 single-shot measurements for a single qubit. Therefore, in this case, our algorithm requires fewer resources than any other classical-quantum algorithm that utilizes expectation values. For the VQE algorithm, we first choose 500 single-shot measurements per step and COBYLA as the classical optimization method. VQE needs 33 COBYLA iterations to converge, which means 16500 single-shot measurements in total, i.e., 100 times the resources needed by our algorithm, obtaining a fidelity of 0.997. If we change the number of single-shot measurements to 8192 per step (the maximum number of shots allowed by IBM), we need 35 COBYLA iterations to converge, which means 286720 single-shot measurements, 1000 times more resources than our algorithm; nevertheless, the fidelity is 0.999.

  2. $\tau\hat{O} = \frac{\pi}{4}\sigma_x$, so that $\Delta = \frac{\pi}{2}$ and $F = \frac{1}{2}(1+\sqrt{2P_0-1})$.

Figure 3. Histograms for the results of 40 independent experiments with $\tau\hat{O} = \frac{\pi}{2}\sigma_x$, r=0.9, and p=1/r. (a) Histogram of the probability to obtain m=0. (b) Histogram of the fidelity between the agent and the nearest eigenvector using Eq. (17).

Now, we choose the reward ratio r=0.9 and the punishment ratio p=1.5/r. The results of the 40 experiments are collected in Appendix Table 2 (see supplemental material) and summarized in the histograms of Fig. 4. From Fig. 4a, we can see that the probability $P_0$ is larger than 0.9 in 35 cases, which implies, as shown in Fig. 4b, that most cases give fidelities larger than 0.94. Also, we have 30 experiments with F>0.96; the average fidelity is $\bar{F}=0.97$ and the standard deviation is σ=0.022, which represents 2.3% of the average fidelity $\bar{F}$. The average number of iterations over the 40 experiments is $\bar{N}=116$, the minimum is $N_{min}=25$, and the maximum is $N_{max}=572$; again, in this case our algorithm uses fewer resources than algorithms that use expectation values. As in the previous case, we compare the results with the VQE algorithm. For 500 shots per step, we get a fidelity of 0.883 with 23 COBYLA iterations, which means 11500 single-shot measurements, i.e., 100 times more resources than our algorithm. For 8192 shots per step, the fidelity is 0.891 with 23 COBYLA iterations; the total is 188416 single-shot measurements, i.e., 1000 times more resources than our algorithm.

  3. $\tau\hat{O} = \cos\frac{1}{10}\,\sigma_x + \sin\frac{1}{10}\,\sigma_y$, so that $\Delta = 2$ and $F = \frac{1}{2}\left[1+\sqrt{1+\frac{2(P_0-1)}{1-\cos 2}}\right]$.

We choose the reward ratio r=0.9 and the punishment ratio p=1.5/r, as in the previous case. The results of the 40 experiments are collected in Appendix Table 3 (see supplemental material) and summarized in the histograms of Fig. 5. From Fig. 5a, we can see that the probability $P_0$ is larger than 0.85 in 39 cases, which implies, as shown in Fig. 5b, that most cases give fidelities larger than 0.94. Also, we have 30 experiments with F>0.98; the average fidelity is $\bar{F}=0.98$ and the standard deviation is σ=0.015, which represents 1.6% of the average fidelity $\bar{F}$. The average number of iterations over the 40 experiments was $\bar{N}=227$, the minimum is $N_{min}=26$, and the maximum is $N_{max}=782$. In this case, as $N_{max}$ is around 800, we first compare with the VQE algorithm at 800 shots per step, obtaining a fidelity of 0.911 using 14 COBYLA iterations, which means a total of 11200 single-shot measurements, i.e., 50 times more resources than our algorithm. When we use 8192 shots per step, the fidelity is 0.999 with 14 COBYLA iterations, giving a total of 114688 single-shot measurements, i.e., 500 times more resources than our algorithm.

Figure 4. Histograms for the results of 40 independent experiments with $\tau\hat{O} = \frac{\pi}{4}\sigma_x$, r=0.9, and p=1.5/r. (a) Histogram of the probability to obtain m=0. (b) Histogram of the fidelity between the agent and the nearest eigenvector using Eq. (17).

Figure 5. Histograms for the results of 40 independent experiments with $\tau\hat{O} = \cos\frac{1}{10}\,\sigma_x + \sin\frac{1}{10}\,\sigma_y$, r=0.9, and p=1.5/r. (a) Histogram of the probability to obtain m=0. (b) Histogram of the fidelity between the agent and the nearest eigenvector using Eq. (17).

Even if VQE allows us to reach fidelities larger than 0.98 (the mean fidelity of our algorithm), it needs substantially more resources, more than 100 times those used by our algorithm, which implies a great advantage for our proposal.

Two-qubit case

In this case, we have three different agent states given by

$|A_k^{(0)}\rangle = \hat{D}_k|00\rangle, \quad |A_k^{(1)}\rangle = \hat{D}_k|01\rangle, \quad |A_k^{(2)}\rangle = \hat{D}_k|10\rangle$.  (18)

We update the matrix $\hat{D}_k$ according to Eq. (12). To decompose the matrix $\hat{D}_k$ into a set of one- and two-qubit gates, we use the method already implemented in Qiskit^{45}. To find all the eigenvectors, we divide the protocol into three stages. In the first stage, we consider the agent state $|A_k^{(0)}\rangle = \hat{D}_k|00\rangle$, with $\hat{D}_1=I$ and $w_1=1$. The measurement outcome has four possibilities, $m \in \{00,01,10,11\}$, and we run the algorithm until $w_{n_1}<0.1$ ($n_1$ iterations). After this, $|A_{n_1}^{(0)}\rangle = \hat{D}_{n_1}|00\rangle$ is the approximation of one of the eigenvectors of $\hat{O}$.

In the second stage, we consider the agent state $|A_k^{(1)}\rangle = \hat{D}_k|01\rangle$, with $\hat{D}_{n_1+1}=\hat{D}_{n_1}$ and $w_{n_1+1}=1$. Now, we take into account only three outcomes, $m \in \{01,10,11\}$, since we suppose that $|A_{n_1}^{(0)}\rangle$ is a good enough approximation. If we obtain m=00, we consider it an error and define $\hat{D}_{k+1}=\hat{D}_k$ and $w_{k+1}=w_k$; that is, we do nothing and do not apply the update rule for $\hat{D}_{k+1}$ and $w_{k+1}$. We denote this error by $c_{00}$. We run this stage for $n_2$ iterations, until $w_{n_1+n_2}<0.1$. As we do not perform rotations in the subspace spanned by $\{|00\rangle,|01\rangle\}$ during this stage, we have $|A_{n_1+n_2}^{(0)}\rangle = |A_{n_1}^{(0)}\rangle$. Now, we have the approximation of two eigenvectors, $|A_{n_1+n_2}^{(1)}\rangle = \hat{D}_{n_1+n_2}|01\rangle$ and $|A_{n_1+n_2}^{(0)}\rangle = \hat{D}_{n_1+n_2}|00\rangle$.

Finally, in the third stage, we consider the agent state $|A_k^{(2)}\rangle = \hat{D}_k|10\rangle$, with $\hat{D}_{n_1+n_2+1}=\hat{D}_{n_1+n_2}$ and $w_{n_1+n_2+1}=1$. Now, we have only two possibilities for the measurement outcome, $m \in \{10,11\}$. Here, we also suppose that $\hat{D}_{n_1+n_2}|00\rangle$ and $\hat{D}_{n_1+n_2}|01\rangle$ are good enough approximations. If we obtain m=00 or m=01, we again consider them errors and do not apply the update rule, denoting these errors by $c_{00}$ and $c_{01}$, as in the previous stage. We run this stage for $n_3$ iterations, until $w_{n_1+n_2+n_3}<0.1$. In this stage, we only modify the subspace spanned by $\{|10\rangle,|11\rangle\}$; then, we have $|A_{n_1+n_2+n_3}^{(0)}\rangle = |A_{n_1+n_2}^{(0)}\rangle = |A_{n_1}^{(0)}\rangle$ and $|A_{n_1+n_2+n_3}^{(1)}\rangle = |A_{n_1+n_2}^{(1)}\rangle$. After this procedure, we obtain the approximation of all the eigenvectors, $\{|A_{n_T}^{(0)}\rangle = \hat{D}_{n_T}|00\rangle, |A_{n_T}^{(1)}\rangle = \hat{D}_{n_T}|01\rangle, |A_{n_T}^{(2)}\rangle = \hat{D}_{n_T}|10\rangle, |A_{n_T}^{(3)}\rangle = \hat{D}_{n_T}|11\rangle\}$, with $n_T = n_1+n_2+n_3$.
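The bookkeeping of the three stages can be summarized in a small table of allowed outcomes (our own sketch, not the experimental code; the r and p values are illustrative):

```python
# Stage bookkeeping of the two-qubit protocol: each stage fixes the input
# state |j>, treats outcomes from already-fixed subspaces as errors (no
# update), and otherwise rewards or punishes the range amplitude w.
STAGES = [
    {"j": 0b00, "valid": {0b00, 0b01, 0b10, 0b11}},  # first stage
    {"j": 0b01, "valid": {0b01, 0b10, 0b11}},        # second stage: m=00 is the error c00
    {"j": 0b10, "valid": {0b10, 0b11}},              # third stage: m=00, 01 are errors c00, c01
]

def handle_outcome(stage, m, w, r=0.9, p=1.5 / 0.9):
    """Returns (new_w, verdict) for one single-shot outcome m;
    the r and p values here are illustrative."""
    if m not in stage["valid"]:
        return w, "error"    # outcome in an already-fixed subspace: no update
    if m == stage["j"]:
        return w * r, "reward"
    return w * p, "punish"
```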

To test the algorithm, we choose three cases. First, we consider the bi-local operator given by

1. $\tau\hat{O} = \sigma_x \otimes \sigma_x = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}$.  (19)

In this case, the eigenstates and the eigenvalues are

$|E^{(0)}\rangle = \frac{1}{\sqrt{2}}(|00\rangle - |11\rangle),\; \alpha^{(0)}=-1; \quad |E^{(1)}\rangle = \frac{1}{\sqrt{2}}(-|01\rangle + |10\rangle),\; \alpha^{(1)}=-1; \quad |E^{(2)}\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle),\; \alpha^{(2)}=1; \quad |E^{(3)}\rangle = \frac{1}{\sqrt{2}}(|01\rangle + |10\rangle),\; \alpha^{(3)}=1$.  (20)

We note that the ground state is degenerate, so any linear combination of the form $|\phi\rangle = a|E^{(0)}\rangle + b|E^{(1)}\rangle$ will also be a ground state of the operator, and the same holds for the excited states. In this case, we define the fidelity of our algorithm as the probability of measuring the initial state $|j\rangle$,

$F_j = P_j = |\langle j|\hat{D}_{n_T}^\dagger \hat{E} \hat{D}_{n_T}|j\rangle|^2$.  (21)
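As a numerical sanity check of Eq. (21) (our own numpy sketch, not the experimental code): if $\hat{D}_{n_T}$ were the exact eigenvector matrix of $\sigma_x \otimes \sigma_x$, all four fidelities would equal one:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
O = np.kron(sx, sx)                       # tau * O_hat of Eq. (19)
alphas, D = np.linalg.eigh(O)             # columns of D: exact eigenvectors
E = D @ np.diag(np.exp(-1j * alphas)) @ D.conj().T   # E = exp(-i tau O_hat)

def fidelity(D, E, j):
    """F_j = P_j = |<j| D^dagger E D |j>|^2, Eq. (21)."""
    return float(np.abs((D.conj().T @ E @ D)[j, j]) ** 2)

fids = [fidelity(D, E, j) for j in range(4)]
```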

We run this case using the IBM backend “ibmq_vigo”, and the results are shown in Appendix Table 4 (see supplemental material). We run the algorithm ten times; the mean fidelities are $F_{00}=0.931$, $F_{01}=0.933$, $F_{10}=0.932$, and $F_{11}=0.919$, and the mean number of iterations is $\bar{N}=272$. The mean errors are $\bar{c}_{00}=10$ (second stage), and $\bar{c}_{00}=8$ and $\bar{c}_{01}=5$ (third stage). Therefore, the fidelity of our algorithm was higher than 0.91 for each eigenstate in fewer than 300 single-shot measurements. As in the single-qubit case, we compare with the VQE algorithm. At first, we choose 300 shots per step, needing 56 COBYLA iterations, which means 16800 single-shot measurements, and obtaining a fidelity of 0.976 for the ground state. Using 8192 shots per step, VQE needs 54 COBYLA iterations to converge, which means 442368 single-shot measurements, obtaining a fidelity of 0.997 for the ground state. In this case, VQE gets a significantly more accurate result, but only for the ground state, and uses 1000 times more resources than our algorithm, which obtains all the eigenvectors.

The second example is the molecular hydrogen Hamiltonian with a bond length of 0.2 Å^{46}:

$H = g_0 I + g_1 Z_0 + g_2 Z_1 + g_3 Z_0 Z_1 + g_4 Y_0 Y_1 + g_5 X_0 X_1$,  (22)

with $g_0=2.8489$, $g_1=0.5678$, $g_2=-1.4508$, $g_3=0.6799$, $g_4=0.0791$, $g_5=0.0791$. In this case, the environment is given by

2. $\tau\hat{O} = \begin{pmatrix} g_0+g_1+g_2+g_3 & 0 & 0 & g_5-g_4 \\ 0 & g_0+g_1-g_2-g_3 & g_4+g_5 & 0 \\ 0 & g_4+g_5 & g_0-g_1+g_2-g_3 & 0 \\ g_5-g_4 & 0 & 0 & g_0-g_1-g_2+g_3 \end{pmatrix}$,  (23)

with the next eigenvectors and eigenvalues

$|E^{(0)}\rangle = -0.03909568|01\rangle + 0.99923547|10\rangle,\; \alpha^{(0)}=0.14421033; \quad |E^{(1)}\rangle = |00\rangle,\; \alpha^{(1)}=2.6458; \quad |E^{(2)}\rangle = 0.99923547|01\rangle + 0.03909568|10\rangle,\; \alpha^{(2)}=4.19378967; \quad |E^{(3)}\rangle = |11\rangle,\; \alpha^{(3)}=4.4118$.  (24)
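The listed spectrum can be reproduced by diagonalizing the matrix of Eq. (23) with the given coefficients (our own numpy check, not code from the paper):

```python
import numpy as np

g0, g1, g2, g3 = 2.8489, 0.5678, -1.4508, 0.6799
g4 = g5 = 0.0791

# Matrix of Eq. (23) in the computational basis |00>, |01>, |10>, |11>
H = np.array([
    [g0 + g1 + g2 + g3, 0.0,               0.0,               g5 - g4],
    [0.0,               g0 + g1 - g2 - g3, g4 + g5,           0.0],
    [0.0,               g4 + g5,           g0 - g1 + g2 - g3, 0.0],
    [g5 - g4,           0.0,               0.0,               g0 - g1 - g2 + g3],
])
eigenvalues = np.sort(np.linalg.eigvalsh(H))
```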

In this case, we choose the same method as in the previous case to calculate F, we use the IBM backend “ibmq_valencia”, and the results are shown in Appendix Table 5 (see supplemental material). We run the algorithm ten times; the mean fidelities are $F_{00}=0.989$, $F_{01}=0.973$, $F_{10}=0.976$, and $F_{11}=0.979$. The mean errors are $\bar{c}_{00}=7$ (second stage), and $\bar{c}_{00}=4$ and $\bar{c}_{01}=3$ (third stage), and the mean number of iterations is $\bar{N}=111$. In this case, we need fewer than 150 single-shot measurements to obtain fidelities over 0.97. For the VQE algorithm, we first choose 120 shots per step and need 59 COBYLA iterations, which means 7080 single-shot measurements, obtaining a fidelity of 0.994 for the ground state. When we use 8192 shots per step, VQE needs 64 COBYLA iterations to converge, which means 507904 single-shot measurements, obtaining a fidelity of 0.999 for the ground state. In this case, VQE can get better fidelities (larger than 0.99) but again uses far more resources than our proposal, around 1000 times more, to get only one of the eigenvectors.

The third case that we consider to test the algorithm is the non-degenerate two-qubit operator

3. $\tau\hat{O} = \begin{pmatrix} \pi & -\frac{\pi}{2} & -\frac{\pi}{4} & -\frac{\pi}{4} \\ -\frac{\pi}{2} & \pi & -\frac{\pi}{4} & -\frac{\pi}{4} \\ -\frac{\pi}{4} & -\frac{\pi}{4} & \frac{\pi}{2} & 0 \\ -\frac{\pi}{4} & -\frac{\pi}{4} & 0 & \frac{\pi}{2} \end{pmatrix}$,  (25)

with eigenvectors and eigenvalues given by

$|E^{(0)}\rangle = \frac{1}{2}(|00\rangle+|01\rangle+|10\rangle+|11\rangle),\; \alpha^{(0)}=0; \quad |E^{(1)}\rangle = \frac{1}{\sqrt{2}}(|10\rangle-|11\rangle),\; \alpha^{(1)}=\frac{\pi}{2}; \quad |E^{(2)}\rangle = \frac{1}{2}(|00\rangle+|01\rangle-|10\rangle-|11\rangle),\; \alpha^{(2)}=\pi; \quad |E^{(3)}\rangle = \frac{1}{\sqrt{2}}(|00\rangle-|01\rangle),\; \alpha^{(3)}=\frac{3\pi}{2}$.  (26)
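Again, the spectrum $\{0, \pi/2, \pi, 3\pi/2\}$ can be verified by direct diagonalization of Eq. (25) (our own numpy check):

```python
import numpy as np

pi = np.pi
# Matrix of Eq. (25)
O = np.array([
    [pi,      -pi / 2, -pi / 4, -pi / 4],
    [-pi / 2,  pi,     -pi / 4, -pi / 4],
    [-pi / 4, -pi / 4,  pi / 2,  0.0],
    [-pi / 4, -pi / 4,  0.0,     pi / 2],
])
eigenvalues = np.sort(np.linalg.eigvalsh(O))
```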

We run the algorithm in the IBM quantum computer “ibmq_vigo”. In order to reduce the total number of iterations, we run the three stages of the algorithm four times as follows:

  1. We choose $r=0.6$, $p=1/r$, $\hat{D}_1=I$, $w_1=1$. Denote the total number of iterations after the three stages by $N_1=\eta_1$.

  2. We choose $r=0.7$, $p=1/r$, $\hat{D}_{\eta_1+1}=\hat{D}_{\eta_1}$, $w_{\eta_1+1}=1$. Denote the total number of iterations after the three stages by $N_2=\eta_1+\eta_2$.

  3. We choose $r=0.8$, $p=1/r$, $\hat{D}_{N_2+1}=\hat{D}_{N_2}$, $w_{N_2+1}=1$. Denote the total number of iterations after the three stages by $N_3=\eta_1+\eta_2+\eta_3$.

  4. We choose $r=0.9$, $p=1/r$, $\hat{D}_{N_3+1}=\hat{D}_{N_3}$, $w_{N_3+1}=1$. Denote the total number of iterations after the three stages by $N=\eta_1+\eta_2+\eta_3+\eta_4$.

We define the fidelity of each approximation as

$F_m = \max_{k\in\{0,1,2,3\}} |\langle E^{(k)}|\hat{D}_N|m\rangle|^2$.  (27)

To obtain a data set to evaluate the performance of our protocol, we perform ten independent experiments. These data are collected in Appendix Table 6 (see supplemental material). The average fidelities that we obtain are $\bar{F}_{00}=0.941$, $\bar{F}_{01}=0.933$, $\bar{F}_{10}=0.929$, and $\bar{F}_{11}=0.935$; the average number of iterations is $\bar{N}=1396$, and the mean errors are $\bar{c}_{00}=29$ (second stage), and $\bar{c}_{00}=19$ and $\bar{c}_{01}=18$ (third stage). Therefore, in this case we obtain the four eigenvectors with fidelities larger than 0.92 in fewer than 1500 single-shot measurements, which corresponds to at most six expectation-value estimates, not enough for a classical-quantum algorithm based on the optimization of mean values. For the VQE algorithm, we choose 2000 shots per step using 77 COBYLA iterations, which means 154000 single-shot measurements, obtaining a fidelity of 0.918 for the ground state. For 8192 shots per step, VQE needs 88 COBYLA iterations to converge, which means 720896 single-shot measurements, obtaining a fidelity of 0.944. In this case, VQE cannot surpass the performance of our algorithm, and it uses more than 100 times the resources of our proposal, for the ground state only.

For n-qubit observables (n>2), we can use the same protocol, considering more measurement outcomes, which implies more stages in the algorithm.

Conclusions

In this work, we satisfactorily implement the approximate eigensolver^{44} using the IBM quantum computer. For the single-qubit case, we obtain fidelities larger than 0.97 for both eigenvectors using around 200 single-shot measurements. For the two-qubit case, we use around 1500 single-shot measurements to obtain the approximation of the four eigenvectors with fidelities over 0.9. Due to the stochastic nature of this protocol, we cannot ensure that the approximation converges asymptotically to the eigenvectors with the number of iterations. Nevertheless, it is useful for obtaining a fast approximation to use as a guess in another eigensolver that can reach maximal fidelity, like the eigensolver of Ref.^{43}. We also compare the performance of our proposal with the VQE algorithm: VQE, in general, gets better fidelities in the single-qubit case but uses more than 100 times the resources of our algorithm. For two qubits, the maximal fidelity of VQE is slightly better than that of our algorithm, but again VQE needs far more resources, i.e., more than 1000 times those used by our algorithm to obtain all the eigenvectors. Moreover, the performance of the VQE algorithm depends on the variational ansatz used, which is not the case for our algorithm; this dependence allows the performance of VQE to be enhanced with a better ansatz. The main goal of our algorithm is to get a high-fidelity approximation of all the eigenvectors with few resources, and this goal is fully achieved considering the resources needed by VQE. On the other hand, by tightening the convergence criteria of our algorithm, we can reach better fidelities. Finally, this work also paves the way for the development of future quantum devices suited to work with limited resources.


Acknowledgements

We acknowledge financial support from Spanish MCIU/AEI/FEDER (PGC2018-095113-B-I00), Basque Government IT986-16, projects QMiCS (820505) and OpenSuperQ (820363) of EU Flagship on Quantum Technologies, EU FET Open Grant Quromorphic, EPIQUS, and Shanghai STCSM (Grant No. 2019SHZDZX01-ZX04).

Author contributions

E.S. and F.A.-A. supervised the project and contributed to the theoretical analysis. N.B. carried out all calculations and prepared the figures. C.-Y.P. and M.H. wrote the Qiskit program to run on the IBM Quantum Experience. All the authors wrote the manuscript, contributed to the discussion of the results, and revised the manuscript.

Data availability

The qiskit codes of the one-qubit case and the two-qubit case are available in https://github.com/Panchiyue/Qiskit-Code/tree/main.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

E. Solano, Email: enr.solano@gmail.com

F. Albarrán-Arriagada, Email: pancho.albarran@gmail.com

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-021-90534-7.

References

  • 1.Russell S, Norvig P. Artificial Intelligence: A Modern Approach. New Jersey: Prentice Hall; 1995.
  • 2.Mehta P, et al. A high-bias, low-variance introduction to machine learning for physicists. Phys. Rep. 2019;810:1–124. doi: 10.1016/j.physrep.2019.03.001.
  • 3.Ghahramani Z. Advanced Lectures on Machine Learning. Berlin: Springer; 2004.
  • 4.Kotsiantis SB. Supervised machine learning: A review of classification techniques. Informatica. 2007;31:249–268.
  • 5.Wiebe N, Braun D, Lloyd S. Quantum algorithm for data fitting. Phys. Rev. Lett. 2012;109:050505. doi: 10.1103/PhysRevLett.109.050505.
  • 6.Lloyd S, Mohseni M, Rebentrost P. Quantum algorithms for supervised and unsupervised machine learning. arXiv:1307.0411 [quant-ph] (2013).
  • 7.Rebentrost P, Mohseni M, Lloyd S. Quantum support vector machine for big data classification. Phys. Rev. Lett. 2014;113:130503. doi: 10.1103/PhysRevLett.113.130503.
  • 8.Li Z, Liu X, Xu N, Du J. Experimental realization of a quantum support vector machine. Phys. Rev. Lett. 2015;114:140504. doi: 10.1103/PhysRevLett.114.140504.
  • 9.Sutton RS, Barto AG. Reinforcement Learning: An Introduction. Cambridge: MIT Press; 2018.
  • 10.Jaderberg M, et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science. 2019;364:859–865. doi: 10.1126/science.aau6249.
  • 11.Lamata L. Basic protocols in quantum reinforcement learning with superconducting circuits. Sci. Rep. 2017;7:1609. doi: 10.1038/s41598-017-01711-6.
  • 12.Kaelbling LP, Littman ML, Moore AW. Reinforcement learning: A survey. J. Artif. Intell. Res. 1996;4:237–285. doi: 10.1613/jair.301.
  • 13.Dong D, Chen C, Li H, Tarn T-J. Quantum reinforcement learning. IEEE Trans. Syst. Man Cybern. B Cybern. 2008;38:1207–1220. doi: 10.1109/TSMCB.2008.925743.
  • 14.Mnih V, et al. Human-level control through deep reinforcement learning. Nature. 2015;518:529–533. doi: 10.1038/nature14236.
  • 15.Riedmiller M, Gabel T, Hafner R, Lange S. Reinforcement learning for robot soccer. Auton. Robot. 2009;27:55–73. doi: 10.1007/s10514-009-9120-4.
  • 16.Yu S, et al. Reconstruction of a photonic qubit state with reinforcement learning. Adv. Quantum Technol. 2019;2:1800074. doi: 10.1002/qute.201800074.
  • 17.Albarrán-Arriagada F, Retamal JC, Solano E, Lamata L. Measurement-based adaptation protocol with quantum reinforcement learning. Phys. Rev. A. 2018;98:042315. doi: 10.1103/PhysRevA.98.042315.
  • 18.Littman ML. Reinforcement learning improves behaviour from evaluative feedback. Nature. 2015;521:445–451. doi: 10.1038/nature14540.
  • 19.Silver D, et al. Mastering the game of Go without human knowledge. Nature. 2017;550:354–359. doi: 10.1038/nature24270.
  • 20.Silver D, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science. 2018;362:1140–1144. doi: 10.1126/science.aar6404.
  • 21.Vinyals O, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature. 2019;575:350–354. doi: 10.1038/s41586-019-1724-z.
  • 22.Nielsen MA, Chuang IL. Quantum Computation and Quantum Information: 10th Anniversary Edition. New York: Cambridge University Press; 2010.
  • 23.Grover LK. A fast quantum mechanical algorithm for database search. In Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, 212–219 (1996).
  • 24.Shor PW. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM J. Comput. 1997;26:1484–1509. doi: 10.1137/S0097539795293172.
  • 25.Harrow AW, Hassidim A, Lloyd S. Quantum algorithm for linear systems of equations. Phys. Rev. Lett. 2009;103:150502. doi: 10.1103/PhysRevLett.103.150502.
  • 26.Cai X-D, et al. Experimental quantum computing to solve systems of linear equations. Phys. Rev. Lett. 2013;110:230501. doi: 10.1103/PhysRevLett.110.230501.
  • 27.Xin T, et al. Quantum algorithm for solving linear differential equations: Theory and experiment. Phys. Rev. A. 2020;101:032307. doi: 10.1103/PhysRevA.101.032307.
  • 28.Biamonte J, et al. Quantum machine learning. Nature. 2017;549:195–202. doi: 10.1038/nature23474.
  • 29.Schuld M, Sinayskiy I, Petruccione F. An introduction to quantum machine learning. Contemp. Phys. 2015;56:172–185. doi: 10.1080/00107514.2014.964942.
  • 30.Dunjko V, Taylor JM, Briegel HJ. Quantum-enhanced machine learning. Phys. Rev. Lett. 2016;117:130501. doi: 10.1103/PhysRevLett.117.130501.
  • 31.Gao J, et al. Experimental machine learning of quantum states. Phys. Rev. Lett. 2018;120:240501. doi: 10.1103/PhysRevLett.120.240501. [DOI] [PubMed] [Google Scholar]
  • 32.Schuld M, Killoran N. Quantum machine learning in feature Hilbert spaces. Phys. Rev. Lett. 2019;122:040504. doi: 10.1103/PhysRevLett.122.040504. [DOI] [PubMed] [Google Scholar]
  • 33.Lau H-K, Pooser R, Siopsis G, Weedbrook C. Quantum machine learning over infinite dimensions. Phys. Rev. Lett. 2017;118:080501. doi: 10.1103/PhysRevLett.118.080501. [DOI] [PubMed] [Google Scholar]
  • 34.Wittek P. Quantum Machine Learning: What Quantum Computing Means to Data Mining. New York: Academic Press; 2014. [Google Scholar]
  • 35.Lamata L. Quantum machine learning and quantum biomimetics: A perspective. Mach. Learn. Sci. Technol. 2020;1:033002. doi: 10.1088/2632-2153/ab9803. [DOI] [Google Scholar]
  • 36.Preskill J. Quantum computing in the NISQ era and beyond. Quantum. 2018;2:79. doi: 10.22331/q-2018-08-06-79. [DOI] [Google Scholar]
  • 37.Aleksandrowicz, G. et al. Qiskit: An open-source framework for quantum computing (2019).
  • 38.IBM-Q Experience (2019).
  • 39.McClean JR, Romero J, Babbush R, Aspuru-Guzik A. The theory of variational hybrid quantum-classical algorithms. New J. Phys. 2016;18:023023. doi: 10.1088/1367-2630/18/2/023023. [DOI] [Google Scholar]
  • 40.Kandala A, et al. Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets. Nature. 2017;549:242–246. doi: 10.1038/nature23879. [DOI] [PubMed] [Google Scholar]
  • 41.Peruzzo A, et al. A variational eigenvalue solver on a photonic quantum processor. Nat. Commun. 2014;5:4213. doi: 10.1038/ncomms5213. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Lavrijsen, W. et al. Classical optimizers for Noisy Intermediate-Scale Quantum devices, in IEEE International Conference on Quantum Computing & Engineering (QCE20) (2020).
  • 43.Wei S, Li H, Long G-L. A full quantum eigensolver for quantum chemistry simulations. Research. 2020;2020:1486935. doi: 10.34133/2020/1486935. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Albarrán-Arriagada F, Retamal JC, Solano E, Lamata L. Reinforcement learning for semi-autonomous approximate quantum eigensolver. Mach. Learn. Sci. Technol. 2020;1:015002. doi: 10.1088/2632-2153/ab43b4. [DOI] [Google Scholar]
  • 45.Qiskit command operator.
  • 46.O’Malley PJJ, et al. Scalable quantum simulation of molecular energies. Phys. Rev. X. 2016;6:031007. doi: 10.1103/PhysRevX.6.031007.

Data Availability Statement

The Qiskit codes for the one-qubit and two-qubit cases are available at https://github.com/Panchiyue/Qiskit-Code/tree/main.


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group