Fast reconstruction algorithm based on HMC sampling

Hang Lian; Jinchen Xu; Yu Zhu; Zhiqiang Fan; Yi Liu; Zheng Shan

doi:10.1038/s41598-023-45133-z

. 2023 Oct 18;13:17773. doi: 10.1038/s41598-023-45133-z

Fast reconstruction algorithm based on HMC sampling

Hang Lian ¹, Jinchen Xu ¹, Yu Zhu ¹, Zhiqiang Fan ¹, Yi Liu ¹, Zheng Shan ^1,^✉

PMCID: PMC10584981 PMID: 37853048

Abstract

In Noisy Intermediate-Scale Quantum (NISQ) era, the scarcity of qubit resources has prevented many quantum algorithms from being implemented on quantum devices. Circuit cutting technology has greatly alleviated this problem, which allows us to run larger quantum circuits on real quantum machines with currently limited qubit resources at the cost of additional classical overhead. However, the classical overhead of circuit cutting grows exponentially with the number of cuts and qubits, and the excessive postprocessing overhead makes it difficult to apply circuit cutting to large scale circuits. In this paper, we propose a fast reconstruction algorithm based on Hamiltonian Monte Carlo (HMC) sampling, which samples the high probability solutions by Hamiltonian dynamics from state space with dimension growing exponentially with qubit. Our algorithm avoids excessive computation when reconstructing the original circuit probability distribution, and greatly reduces the circuit cutting post-processing overhead. The improvement is crucial for expanding of circuit cutting to a larger scale on NISQ devices.

Subject terms: Mathematics and computing, Computer science, Information technology

Introduction

Quantum computing has demonstrated its superiority over classical computer in various scientific fields¹, including machine learning^2,3, chemistry^4,5, code breaking⁶, finance^7,8, and other fields⁹, where some quantum algorithms have exponential speedup in theory over classical algorithms. However, due to technical limitations, available quantum devices are currently called Noisy Intermediate-Scale Quantum devices. "Noisy" indicates that quantum devices are affected by noise, while "intermediate-scale" signifies that the number of qubits in these devices ranges from 50 to several hundred¹⁰. In the NISQ era, the challenge of constructing a reliable quantum device increases significantly as the number of qubits increases due to quantum noise. As a result, the current NISQ devices suffer from limited reliability and scalability¹¹. The limitation of qubit resources leads to many quantum algorithms that cannot be applied to practical problems and do not reflect the superiority of quantum computing.

To address this problem, scholars worldwide have been studying circuit knitting techniques, which can simulate larger quantum circuits on small-scale quantum computing^12–18. One of the crucial methods to realize circuit knitting is by implementing circuit cutting. Circuit cutting can divide large quantum circuits into multiple small quantum subcircuits, which can be executed independently. Moreover, the results of all subcircuits can be reconstructed into the theoretical results of the original circuit through additional classical post-processing algorithms. This approach enables us to run larger quantum circuits on a limited qubit resource quantum device, while incurring an increase in classical overhead costs.

Although circuit cutting is crucial to alleviate this problem of insufficient qubit resources in the NISQ era, circuit cutting currently has some limitations. The classical resources consumed by the post-processing of circuit cutting grow exponentially with the required number of cuts and the total number of qubits in the quantum circuit. Many algorithms have been developed to reduce post-processing overhead. For example, Lowe et al. addressed this issue by implementing random measurements¹⁹. Saleem et al. reduced the overhead by reducing the number of cuts¹⁷. Tang et al. proposed a dynamic-definition (DD) query algorithm, which can reduce the overhead by recursively and efficiently finding the circuit probability distribution from state space¹⁸. Chen et al. proposed an approximate reconstruction algorithm to sample the high probability solution space by MCMC sampling to reduce the overhead²⁰.

This paper introduces a fast reconstruction algorithm. Through the evaluation of various random circuits, our reconstruction algorithm demonstrates an average runtime that is 45.6 times faster than the traditional exact reconstruction algorithm [Eq. (4)], 2.32 times faster than the DD algorithm, and approximately 1.47 times faster than the approximate reconstruction algorithm. Unlike the traditional exact reconstruction that reconstructs the probability distribution of the original circuit by traversing all possible quantum states, our reconstruction algorithm samples the high probability solutions by Hamiltonian dynamics from state space. By HMC sampling, our reconstruction algorithm reduces the overhead associated with reconstructing the original circuit results after each cut. Specifically, the contributions of this paper are as follows:

We propose a fast reconstruction algorithm based on HMC. The algorithm allows faster selection of high probability circuit results from state space by Hamiltonian dynamics, greatly reducing the time overhead.
In our reconstruction algorithm, only high probability solutions are sampled without the need for reconstructing all quantum states. This eliminates any 0-probability solution states or states with extremely low probabilities. The algorithm returns solution states with high probabilities, leading to a substantial reduction in space overhead.
We propose a method to decrease the number of cuts by decomposing two-qubit gates. This approach is employed in the experimental circuit described in this paper, which successfully reduces the number of cuts to 1.

This paper is organized as follows. In “Background”, we provide a brief introduction to the relevant fundamentals. In “Methods”, we outline the main content of our reconstruction algorithm. In “Results”, we conduct single cut reconstruction experiments using our reconstruction algorithm along with other approaches. In this section, a virtual two-qubit gate technique is employed to preprocess the reconfigured experimental circuit, enabling successful single cut splitting. In “Conclusion”, we conclude the paper and offers perspectives for future work.

Background

Circuit cutting

Circuit cutting is also known as a time-like cut. In quantum computing, any unitary matrix can be theoretically decomposed into a set of combinations of Pauli matrices ${I, X, Y, Z}$ ¹⁸. Specifically, any matrix A can be decomposed into the following formulas:

A = \frac{T r (A I) I + T r (A X) X + T r (A Y) Y + T r (A Z) Z}{2}

We can further decompose the Pauli matrix into its eigenvalues and eigenvectors, and again derive the following Eq. (2)

\begin{matrix} A_{1} = [T r (A, I) + T r (A, Z)] |0) ⟩ ⟨ (0| \\ A_{2} = [T r (A, I) - T r (A, Z)] |1) ⟩ ⟨ (1| \\ A_{3} = T r (A X) [2 |+) ⟩ ⟨ (+| - |0) ⟩ ⟨ (0| - |1) ⟩ ⟨ (1|] \\ A_{4} = T r (A Y) [2 |+ i) ⟩ ⟨ (- i| - |0) ⟩ ⟨ (0| - | |1) ⟩ ⟨ (1|] \\ A = (A_{1} + A_{2} + A_{3} + A_{4}) / 2 \end{matrix}

Each trace operator corresponds to a set of measurements in a particular Pauli basis, while the density matrix comprising of eigenvectors corresponds to a set of initialization operations. Since the physical implementation of the measurement in the $I$ base and $Z$ base is identical, we can knit together the cut points by three measurement operations ( $X, Y, Z$ ) and four initialization operations ( $|0) ⟩, |1) ⟩, |+) ⟩, |i) ⟩$ ). This facilitates the generation of three distinct upstream subcircuits (referred to as Fragment 1) with different measurement bases, and four downstream subcircuits (referred to as Fragment 2) with different initializations, as illustrated in Fig. 1. These subcircuits can be run independently, and the results can be measured.

A quantum circuit can be divided into two parts using a single cut. These two parts create multiple subcircuits by incorporating different measurement bases and initialization operations. These subcircuits can run independently and the output of the original circuit can be reconstructed through classical post-processing.

However, it should be noted that since the last qubit of the upstream subcircuit does not appear in the final output of the uncut circuit, the results of the upstream subcircuits need additional processing. The results of the upstream subcircuits need to be multiplied by a factor $α = \pm 1$ . The sign of $α$ depends on both the measurement base and the measurement result of the last qubit. When the measurement base is $I$ , $α = 1$ regardless of the measurement result of the last qubit. However, when the measurement base is ${X, Y, Z}$ , $α = 1$ if the measurement result of the last qubit is $|0) ⟩$ , and $α = - 1$ if the measurement result is $|1) ⟩$ .This relationship is summarized in Eq. (3) below:

\begin{matrix} \bar{x 0}, \bar{x 1} \to + \bar{x}, M = I \\ \bar{x 0} \to + \bar{x}, \\ \bar{x 1} \to - \bar{x}, M = {X, Y, Z} \end{matrix}

Following the processing of the upstream subcircuit, it is necessary to compute the terms that correspond to both the upstream and downstream subcircuits to reconstruct the original circuit's probability distribution. Assuming that the original is cut as in Fig. 1, the cut position is at the $n / 2$ _nd qubit, and the circuit quantum state is $|x_{0} x_{1} x_{2} \dots x_{n}) ⟩$ . The quantum state associated with the upstream subcircuit is $x_{up} = |x_{\{\frac{n}{2} + 1\}} \dots x_{n}) ⟩$ , and the quantum state associated with the downstream subcircuit is $x_{down} = |x_{0} \dots x_{\frac{n}{2}}) ⟩$ ,which $x_{i} \in \{0, 1\}$ .

According to Eqs. (2) and (3), we know that during the reconstruction process, the upstream subcircuit consists of four terms,

p_{1, 1} = p (| 0 x_{up} ⟩ ∣ I) + p (| 1 x_{up} ⟩ ∣ I) + p (| 0 x_{up} ⟩ ∣ Z) - p (| 1 x_{up} ⟩ ∣ Z)

p_{1, 2} = p (| 0 x_{up} ⟩ ∣ I) + p (| 1 x_{up} ⟩ ∣ I) - p (| 0 x_{up} ⟩ ∣ Z) + p (| 1 x_{up} ⟩ ∣ Z)

p_{1, 3} = p (| 0 x_{up} ⟩ ∣ X) - p (| 1 x_{up} ⟩ ∣ X)

p_{1, 4} = p (| 0 x_{up} ⟩ ∣ Y) - p (| 1 x_{up} ⟩ ∣ Y)

The four terms of the downstream subcircuit are

p_{2, 1} = p (|, x_{down}) ⟩ ||0), ⟩)

p_{2, 2} = p (|, x_{down}) ⟩ ||1), ⟩)

p_{2, 3} = 2 p (|, x_{down}) ⟩ ||+), ⟩) - p (|, x_{down}) ⟩ ||0), ⟩) - p (|, x_{down}) ⟩ ||1), ⟩)

p_{2, 4} = 2 p (| x_{down} ⟩ | | i ⟩) - p (| x_{down} ⟩ | | 0 ⟩) - p (| x_{down} ⟩ | | 1 ⟩)

The probability of the original circuit quantum state, denoted as $|x_{0} x_{1} x_{2} \dots x_{n}) ⟩$ , can be calculated by taking the Kronecker product of the output terms from the upstream and downstream subcircuits. These output terms form 4 pairs, and their results are subsequently summed up, as shown in Eq. (4).

p (|x_{0} x_{1} x_{2} \dots x_{n}) ⟩) = p (|x_{down}, x_{up}) ⟩) = \frac{1}{2} \sum_{i = 1}^{4} p_{1, i} \otimes p_{2, i}

The probability distribution of the original circuit is obtained simply by calculating the probability of each quantum state in the original circuit using Equation (4).

Virtual two-qubit gate

A virtual two-qubit gate is a technique that can decompose a two-qubit gate into a series of single-qubit gates, also known as a space-like cut. It essentially involves classical post-processing and sampling of single-qubit gates to ‘simulate’ the entanglement effect produced by two-qubit gates. This means that a two-qubit gate can be expressed as a series of individual quantum gate operations, such as the Pauli matrices ${X, Y, Z}$ , as well as single-qubit rotating gates $Rx$ , $Ry$ , and $Rz$ around the $X$ , $Y$ , and $Z$ axes.

The super-operator $S (U)$ shown in Eq. (5) corresponds to the evolution of a complete quantum state, where $ρ$ is the density matrix of the quantum state and $U$ is the unitary operator acting on the quantum state,

S (U) = U ρ U^{†}

Suppose $A_{1}, A_{2}$ are both unitary operators, with

\begin{matrix} S (e^{i θ A_{1} \otimes A_{2}}) = {cos}^{2} θ S (I \otimes I) + {sin}^{2} θ S (A_{1} \otimes A_{2}) \\ + \frac{1}{8} cos θ sin θ \sum_{(α_{1}, α_{2}) \in {\pm 1}^{2}} α_{1} α_{2} [S ((I + α_{1} A_{1}) \otimes (I + i α_{2} A_{2}))) \\ (+ S ((I + i α_{1} A_{1}) \otimes (I + α_{2} A_{2}))] \end{matrix}

Therefore, it can be deduced that a two-qubit gate of form $e^{i θ A_{1} \otimes A_{2}}$ can be decomposed into a series of single-qubit gates as shown in Fig. 2

Decompose the two-qubit gate into a series of single-qubit gate sequences.

If $A \in {X, Y, Z}$ , then $I + α_{1} A_{1}$ can be realized by projection measurements on the A-base, and $I + i α_{2} A_{2}$ can be realized by a single qubit gate in rotation around the A-base, as derived in Ref.²¹.

Hamiltonian Monte Carlo sampling

Hamiltonian Monte Carlo, also known as hybrid Monte Carlo²², is a Markov chain Monte Carlo (MCMC) sampling algorithm based on Hamiltonian dynamics. It describes the motion of a given object at time $t$ using its position $x$ and momentum $p$ . Position corresponds to potential energy, while momentum corresponds to kinetic energy. Therefore, the potential energy $U (x)$ and kinetic energy $K (p)$ are functions of position and momentum, respectively. The Hamiltonian is the sum of the potential energy and kinetic energy functions. Thus, the Hamiltonian can be expressed as:

H (x, p) = U (x) + K (p)

The Hamiltonian characterizes the interconversion between kinetic energy and potential energy as an object moves. The following Hamiltonian equation can be analyzed quantitatively by differentiation.

\begin{matrix} \frac{\partial x_{i}}{\partial t} = \frac{\partial H}{\partial p_{i}} = \frac{\partial K (p)}{\partial p_{i}} \\ \frac{\partial p_{i}}{\partial t} = - \frac{\partial H}{\partial x_{i}} = - \frac{\partial U (x)}{\partial x_{i}} \end{matrix}

If it is possible to express $\partial U (x) / \partial x_{i}$ and $\partial K (p) / \partial p_{i}$ with some initial conditions (e.g., at time $t_{0}$ , initial position point $x_{0}$ , and initial momentum $p_{0}$ ), it becomes feasible to predict the object's position and momentum at any subsequent time instant $t = t_{0} + T$ .

The Hamiltonian equation captures the continuous time evolution of an object's motion. To numerically simulate Hamiltonian dynamics, it is essential to discretize time to obtain an approximate solution for the Hamiltonian equation. The time interval $T$ can be divided into smaller sub-intervals of length $δ$ , allowing for an approximate continuity of time. This process can be accomplished using the leapfrog algorithm²³, which sequentially updates the momentum and position variables. The algorithm proceeds by first calculating the momentum over a period, updating the object's position over a slightly extended period $δ,$ and finally completing the calculation of momentum for the next time interval. The algorithm follows the steps outlined below.

Begin by calculating the change in momentum after half the time interval $δ / 2 :$
$p_{i} (t + \frac{δ}{2}) = p_{i} (t) - \frac{δ}{2} \frac{\partial U}{\partial x_{i} (t)}$ 9
Next, compute the change in position over the entire time interval $δ$ :
$x_{i} (t + δ) = x_{i} (t) + δ \frac{\partial x}{\partial p_{i} (t + \frac{δ}{2})}$ 10
Recalculate the change in momentum for the remaining half-time $δ / 2$ :
$p_{i} (t + δ) = p_{i} (t + \frac{δ}{2}) - \frac{δ}{2} \frac{\partial}{\partial x_{i} (t + δ)}$ 11

The core idea of HMC is to construct the Hamiltonian function $H (x, p)$ . By leveraging this function, it becomes more efficient to explore the target distribution $P (x)$ . The canonical distribution of statistical mechanics can be used to relate $H (x, p)$ to $P (x)$ . Given the energy function of a state as $E (θ)$ , the corresponding canonical distribution can be defined as

P (θ) = \frac{1}{Z} e^{- E (θ)}

where Z is the regularization factor that ensures that $\int P (θ) d θ = 1$ , thus creating a valid probability distribution. Since the energy function is equal to the sum of potential and the kinetic energy in the system, it gives the following:

E (θ) = H (x, p) = U (x) + K (p)

Then the canonical function of the Hamiltonian kinetic energy function can be expressed as

P (x, p) \propto e^{- H (x, p)} \propto e^{- U (x)} e^{- K (p)} \propto P (x) P (p)

According to Eq. (14), we can decompose the joint distribution $P (x, p)$ into the product of the distributions of $P (x)$ and $P (p)$ , indicating that the two variables are independent of each other. As a result, their respective distributions can be utilized to sample their joint probability distributions. The introduction of an auxiliary variable $p$ can expedite the convergence of the Markov chain. Given that variables $x$ and $p$ are independent, the momentum variable $p$ can be sampled from any distribution, with $N (0, 1)$ commonly being selected. The function connected to the potential energy in the Hamiltonian is given by:

K (p) = \frac{p^{T} p}{2}

In HMC, after defining $K (p)$ , the remaining work is how to find the potential energy function $U (x)$ for a given target distribution $P (x)$ , then the potential energy function is usually defined as:

U (x) = - l o g P (x)

Calculate again the gradient function $G (x)$ of the potential energy function:

G (x) = \frac{\partial U (x)}{\partial x}

Next, Hamiltonian dynamics can be applied to MCMC to sample the objective function $P (x)$ . However, discretizing the time may introduce a specific error that may not match the target distribution, so the acceptance rate can be induced to offset the error, and the acceptance rate $α$ is

α = \min (1, \exp (- U (x_{L}) + U (x_{0}) - K (p_{L}) + K (p_{0})))

Here, $(x_{0}, p_{0})$ represents the initial state, while the new state $(x_{L}, p_{L})$ is obtained after executing the jump point algorithm $L$ times. A random point $u$ is chosen from a uniform distribution between 0 and 1. If the acceptance rate $α$ is greater than $u$ , the point $x_{L}$ is accepted in the Markov chain. Upon performing multiple samples to reach the burn-in period of HMC sampling, the Markov chain converges towards a stationary distribution, which corresponds to the target distribution $P (x)$ .

The random walk in the MCMC algorithm can lead the Markov chain to converge to a stationary distribution, $p (x)$ , but it is often considered inefficient. Hamiltonian Monte Carlo leverages the principles of Hamiltonian dynamics in physics to calculate the future states of the Markov chain, rather than relying solely on a probability distribution. This approach enables more efficient exploration of the state space and achieves faster convergence compared to random wandering.

Methods

Fast reconstruction algorithm

Traditional exact reconstruction algorithms involve traversing through all possible quantum states in the original circuit's state space to reconstruct its probability distribution. This requires processing all potential combinations of qubit strings through brute force computation. However, the time complexity of these algorithms grows exponentially. As the number of qubits in the circuit increases, reconstruction time also experiences an exponential explosion. Additionally, exact reconstruction algorithms may encounter difficulties when applied to large-scale quantum circuits due to overhead. To overcome these challenges, this paper presents a fast reconstruction algorithm that differs from the approximate reconstruction method proposed by Chen et al. based on MH sampling²⁰. In this work, we draw inspiration from MCMC sampling. However, our reconstruction algorithm deviates from the traditional MH approach, where the future state of a Markov chain is calculated through random wandering. Instead, our method employs Hamiltonian dynamics, rooted in the physical system concept, to determine future states. This technique enables more efficient analysis of the state space compared to random wandering and facilitates faster sampling of high probability solutions from state space, thus accelerating the convergence of Markov chains towards the original circuit's probability distribution. For a comprehensive understanding of HMC sampling, please refer to the references^24,25.

Our reconstruction algorithm follows the steps outlined below.

Step 1: Assume that the original circuit probability distribution is $P (x)$ , then customize the potential energy function $U (x) = - l o g P (x)$ , the gradient function $G (x) = \frac{\partial U (x)}{\partial x}$ , and the kinetic energy function $K (p) = \frac{p^{T} p}{2}$ .
Step 2: Once the potential, gradient, and kinetic energy functions have been initialized, we must establish an initial state $[x_{0}, p_{0}]$ for HMC sampling. Here, $x_{0}$ represents a randomly chosen quantum state from the original circuit's probability distribution, while $p_{0}$ typically denotes a random number drawn from a standard normal distribution $N (0, 1)$ . Additionally, it is essential to initialize a dictionary to store the original circuit's probability distribution.
Step 3: Based on the HMC sampling principle mentioned above, simulating Hamiltonian dynamics in numerical terms requires discretizing continuous time. This is typically achieved using the leapfrog algorithm. To obtain the new state $[x_{l}, p_{l}]$ , the initial state $[x_{0}, p_{0}]$ is iteratively updated L times using the leapfrog algorithm.
Step 4: Once the new state has been obtained, it is necessary to calculate the acceptance rate $α = m i n (1, e x p (- U (x_{l}) + U (x_{0}) - K (p_{l}) + K (p_{0})))$ for the state $[x_{l}, p_{l}]$ . This calculation is crucial because HMC sampling estimates the posterior distribution through probabilistic sampling. In essence, HMC sampling constructs a Markov chain that traverses the state space. The traversal is achieved by computing future states using the principles of dynamics in physical systems. To ensure that this Markov chain possesses the properties of a stationary distribution, it is necessary to define an acceptance rate $α$ for determining the viability of transitioning to future states. If the acceptance rate $α$ is greater than a threshold $u$ , the point is accepted as a value from the original circuit's probability distribution and stored in the dictionary for multiple iterations. Following the burn-in period of the algorithm, the Markov chain produced by HMC sampling stabilizes into the desired stationary distribution, which corresponds to the original circuit's probability distribution.

The pseudocode for our reconstruction algorithm can be obtained based on the description above.

Algorithm 1 — Fast reconstruction Algorithm.

Results

As the overhead of circuit reconstruction grows exponentially with the number of cuts, this paper also investigates how to reduce the minimum number of cuts required for circuit cutting, which can be achieved by virtual two-qubit gate decomposition.

The two-qubit gate decomposition technique can be applied to certain fully connected quantum circuits to reduce their structural complexity, resulting in fewer cuts and a significant reduction in the circuit's processing overhead. Figure 3 illustrates the change in the minimum number of required cuts before and after applying a two-qubit gate decomposition on a fully connected 5-qubit QFT²⁶ circuit.

Comparison of the change in the minimum number of cuts required before and after two-qubit gate decomposition for a 5-qubit QFT circuit.

If the circuit shown in Fig. 3a is cut directly, we need to cut it at least 4 times to decompose the circuit. We try to decompose the CP gate between qubit 0 and qubit 3 of the 5-qubit QFT circuit, the decomposition method is shown in Equation (6), and the decomposed circuit is shown in Fig. 3b. The CP-Cut in Fig. 3b indicates the gates after the CP gate is decomposed.

After applying the two-qubit decomposition principle to decompose the 5-qubit QFT circuit, it was found that only three cuts were needed to separate the circuit. Furthermore, we conducted additional experiments using the circuit-knitting-toolbox toolkit to explore the furthest two-qubit gate decomposition for varying qubit sizes in QFT circuits. The minimum number of cuts required before and after decomposing the two-qubit gate was then calculated. The experimental results are depicted in Fig. 4.

Comparison of the minimum number of cuts required for QFT circuit before and after decomposing a two-qubit gate.

Analysis of Fig. 4 reveals that decomposing two-qubit gates effectively reduces the number of cuts in 4-qubit to 21-qubit QFT circuits. This reduction is more prominent as the number of qubits in the QFT circuit increases. This trend can be extrapolated to complex quantum circuits, where decomposition of two-qubit gates serves as a strategy to minimize circuit cuts. Larger circuits benefit more from this technique, experiencing a greater reduction in the number of cuts after the decomposition of two-qubit gates. This paper evaluates the performance of our reconstruction algorithm through several experiments. Specifically, we measure the runtime of single cut reconstruction for randomized circuits with varying qubit sizes and compare it against three other algorithms: the traditional exact reconstruction algorithm, Tang's DD algorithm, and Chen's approximate reconstruction algorithm. In cases where a single cut is insufficient to cut the circuit, we apply two-qubit gate decomposition to minimize the number of required cuts. These experiments were conducted using the qasm simulator, and the corresponding results are illustrated in Fig. 5a.

Comparison of runtime and $MSE$ for different post-processing algorithms for a single cut of random circuits.

Figure 5a shows that the traditional exact reconstruction algorithm (Exact) shows remarkably short runtime for small qubit sizes. However, due to its requirement to traverse all quantum states, the runtime of Exact experiences exponential growth as the number of qubits in the quantum circuit increases. Conversely, the other three reconstruction algorithms demonstrate smoother time distributions in Fig. 5a and do not show significant fluctuations with increasing qubit counts. Among the three algorithms, the DD algorithm shows the longest average runtime due to its utilization of continuous recursion for merging active qubits into $bins$ , aiming to reconstruct solution states¹⁸. However, the time-consuming process of merging quantum states per recursion contributes to the relatively slower average runtime of the DD algorithm compared to both the fast reconstruction algorithm (FRA) and the approximate reconstruction algorithm (ARA) based on sampling. Notably, FRA outperforms ARA in terms of speed as it incorporates the concept of Hamiltonian dynamics in physical systems to compute future states within the Markov chain, in contrast to the random wandering approach employed by ARA.

The experimental results indicate that FRA outperforms other reconstruction algorithms in terms of runtime. Specifically, FRA's average runtime is 45.6 times faster than the traditional reconstruction algorithm, 2.32 times faster than the DD algorithm, and around 1.47 times faster than ARA.

In addition to the runtime analysis, this paper also compares the correctness of all quantum states of the probability distribution of the reconstruction results, and the evaluation metric used in this paper is the mean squared error ( $MSE$ ), as in Eq. (19):

M S E = \sum_{i} {(x_{i} - y_{i})}^{2}

Here, $x_{i}$ refers to the reconstructed result, while $y_{i}$ represents the result obtained from original circuit. A smaller $MSE$ corresponds to a lower error rate, as depicted in Fig. 5b, which presents a comparison of the $MSE$ values for different algorithms. Based on the experimental findings, the Exact shows the lowest error rate and closest proximity to the probability distribution of the original circuit. It is followed by the DD algorithm, the FRA proposed in this paper, and finally ARA.

The $MSE$ of FRA shows improvement compared to the $MSE$ of ARA in this experiment. However, there exists a disparity between traditional exact reconstruction and DD algorithms. This discrepancy arises due to the inherent sampling-based nature of FRA and ARA, which introduces a certain degree of error in contrast to alternative algorithms.

Quantum circuits can be broadly classified into two types. The first type of circuit results in a sparse probability distribution, where only a few solution states have a significantly high probability, while non-solution states have a probability of 0. Quantum algorithms like QAOA²⁷, Grover²⁸, and BV²⁹ generally belong to this type. On the other hand, the second type of circuit produces a dense probability distribution, where many quantum states have non-zero probabilities, such as the 2-D random circuits³⁰. In QAOA and similar algorithms, it is sufficient to highlight the solution state in the reconstructed probability distribution to obtain the algorithm's solution. There is no need to excessively focus on achieving high accuracy for all possible solutions. Consequently, we have tested the first type of circuit, specifically the QAOA circuit, as demonstrated below using an example to address the Max-Cut problem.

This paper makes a single cut to the 6-qubit QAOA circuit. This circuit is used to solve the Max-Cut problem of Fig. 6a, and the probability distribution reconstructed using FRA in this paper is experimentally compared with the results of run of the original circuit.

(a) is a diagram of the maximum cut problem to be solved, and (b) is a QAOA quantum circuit built by the objective function; we decompose the farthest CZ gate of the circuit and make one cut of the circuit.

The Max-Cut problem is a common combinatorial optimization problem in graph theory with important applications in statistical physics and circuit design. Michel Goemins and David Williamson proposed a classical algorithm based on semidefinite programming (SDP) approximation for solving the Max-Cut Problem in 1995, which is the best-known approximation algorithm for polynomial time³¹. The effectiveness of QAOA depends on the number of layers of the unitary transformation used, and in theory, it is possible to find an excellent approximation with enough layers, but it can also be time-consuming.

The Max-Cut problem involves dividing the nodes of a graph into two sets so that the number of edges between the sets is maximized. The objective function for its transformation into a combinatorial optimization problem can be formulated as follows:

C = \frac{1}{2} \sum_{i j \in E} (Z_{i} Z_{j} - I)

Assuming a max-cut is performed on the graph shown in Fig. 6a, we can then construct a quantum circuit graph illustrated in Fig. 6b based on the objective function of the Max-cut, represented by Eq. (20).

We optimize the parameters $β, γ$ ( $2 β$ is the rotation angle of the $Rzz$ gate, $2 γ$ is the rotation angle of the $Rx$ gate) by the classical $COBYLA$ optimizer. The changes in the circuit's expected value and variance are observed and presented in Fig. 7. In the expectation value measurement, a larger shaded region indicates a smaller value, indicating a better fit of the parameters and closer proximity of the circuit results to the approximate optimal solution. On the other hand, the variance represents the stability of the solution, where a larger shaded region corresponds to a smaller variance, indicating a more stable and reliable solution.

Figures (a) and (b) illustrate the expected value of the QAOA circuit and the variance of the measurements with the parameter $β$ , $γ$ for $p$ = 1 (where $p$ represents the number of iterations of the QAOA algorithm), respectively.

After finding the optimal parameters to construct the complete QAOA circuit in conjunction with Fig. 7, the farthest two-qubit gate $Rzz$ gate of the QAOA circuit is decomposed due to

R z z (θ) = e^{- i \frac{θ}{2} Z \otimes Z}

According to Eq. (6), the $Rzz$ gate can be decomposed into a series of $Z$ gates and a combination of $Rz$ gates. Subsequently, we divide the circuit into two parts by making a single cut and then reconstruct the circuit results using FRA. We conducted a comparative analysis of the results obtained from running the original circuit, the circuit after decomposing the two-qubit gate, and the reconstructed circuit after decomposing the two-qubit gate and performing a single cut. These results are visually represented in Fig. 8.

Results of the original circuit, the circuit after two-qubit gate decomposition, and results of running FRA on a single cut circuit after decomposing the two-qubit gate, with the probability values of some high probability solution states also marked in figure.

From Fig. 8, we see that the correct results of the circuit are |010101> and |101010>, and the correct solution probabilities of the circuit after decomposing the two-qubit gate are 0.13 and 0.12 respectively, which are lower than that of the original circuit at 0.15. However, the fast reconstruction algorithm is also able to reconstruct the high probability solution state based on the circuit after the two-qubit gate decomposition, with a correct solution probability of 0.19 and 0.18, which is even higher than that of the original circuit of 0.15. Although FRA does not fully reconstruct all quantum states within the state space of the original circuit, such as the low-probability solutions |011111> and others, this limitation results in a relatively low $MSE$ . However, FRA shows stability in reconstructing high probability solutions, which closely resemble the original distribution. This level of accuracy is sufficient for solving circuits like QAOA.

To further validate the conclusion, we performed additional experiments on multiple QAOA circuits. Each circuit consisted of two high probability solutions, which were then evaluated through FRA after a single cut. The experiment results were sorted in descending order of probability after running all circuits. Subsequently, the first two solutions from the original circuits' running results were compared to the first two solutions of the FRA reconstruction results. The evaluation metric used in this comparison was the coincidence rate ( $CR$ ).

C R = \{\begin{matrix} 0, & n o c o i n c i d e n t s o l u t i o n s \\ 0.5, & o n e s o l u t i o n i s c o i n c i d e n t \\ 1, & t h e f i r s t t w o s o l u t i o n s a r e c o i n c i d e n t \end{matrix})

The $CR$ is equal to 1 when both first two solutions coincide completely, 0.5 when only one solution coincides, and 0 when there are no coinciding solutions.

We conducted experiments on the QAOA circuits with 3-qubit to 13-qubit. As a result of the experiments, the $CR$ of the reconstructed results of the FRA for all experimental circuits is 1, when compared to the results obtained from the original circuits. This signifies that FRA is capable of accurately reconstructing the solution states for quantum circuits, such as QAOA. Additionally, we have counted the number of results obtained by various reconstruction algorithms in the experiments to estimate the space overhead, as depicted in Fig. 9.

The Figure shows the number of reconstructed results obtained through various algorithms, namely Exact, DD, ARA, and FRA, for the 3-qubit to 13-qubit qubits QAOA experiment.

As shown in Fig. 9, of all the reconstruction algorithms for this experiment, the FRA reconstruction yields the least number of solutions. The Exact and DD algorithms reconstruct all quantum states within the state space of the circuit. However, the results obtained by these algorithms increase exponentially as the number of qubits grows. At a certain point, the storage device may be incapable of accommodating all the results. Conversely, FRA reconstructs high probability solutions from the original circuit, eliminating the presence of 0-probability or low-probability solutions. Consequently, the storage space overhead can be significantly reduced, transforming from exponential to polynomial scale when compared to Exact and DD algorithm for large-scale quantum circuits.

In summary, FRA can efficiently acquire the reconstruction probability distribution of a quantum circuit in a shorter runtime compared to the Exact, DD, and ARA. Unlike the Exact and DD algorithm, FRA does not require the reconstruction of probabilities for all quantum states. Instead, it selectively focuses on high probability solutions, reducing time overhead and space overhead while finding the correct solution for the circuit. This means that FRA is a low overhead circuit-cutting post-processing algorithm.

Conclusion

This paper introduces an algorithm based on Hamiltonian Monte Carlo in circuit cutting reconstruction. Additionally, we apply two-qubit gate decomposition to minimize the number of cuts in experimental circuits during our experiments. Our comprehensive experiments demonstrate that our reconstruction algorithm efficiently samples high probability solutions from state space, without the need to traverse all possible states. Furthermore, it significantly reduces both time and space overheads.

In the NISQ era, circuit cutting plays a crucial role in extending NISQ devices to larger scales. However, due to its excessive post-processing overhead, operating on large-scale quantum circuits becomes impractical. Therefore, the algorithm proposed in this paper to reduce the post-processing overhead of circuit cutting holds the utmost importance for the NISQ era. Additionally, the concept of the fast reconstruction algorithm, introduced in this paper, solely obtaining high probability solutions carries significant implications for the future development of quantum computing.

Currently, our reconstruction algorithm is only applicable to circuits with a single cut, and it has not been extended to circuits with multiple cuts. Our reconstruction algorithm is particularly suitable for circuits such as QAOA. Investigating and exploring how the algorithm can be adapted for multiple cuts and extended to all quantum circuits are promising directions for future research.

Author contributions

Material preparation, data collection and analysis were performed by H.L., J.X. and Y.Z. The first draft of the manuscript was written by H.L. and Z.S. Y.L. and Z.F. prepared the figures. All authors contributed to the study conception and design.

Data availability

The data presented in this paper is available online at https://github.com/hang-dev/FRA.

Code availability

The code for the specific implementation of our method is available online at https://github.com/hang-dev/FRA.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1.Shalf JM, Leland R. Computing beyond Moore's law. Computer. 2015;48:14–23. doi: 10.1109/MC.2015.374. [DOI] [Google Scholar]
2.Biamonte J, et al. Quantum machine learning. Quantum. 2017;549:195–202. doi: 10.1038/nature23474. [DOI] [PubMed] [Google Scholar]
3.Lloyd S, Mohseni M, Rebentrost PJ. Quantum Algorithms for Supervised and Unsupervised Machine Learning. Springer; 2013. [Google Scholar]
4.Lanyon BP, et al. Towards quantum chemistry on a quantum computer. Quantum. 2010;2:106–111. doi: 10.1038/nchem.483. [DOI] [PubMed] [Google Scholar]
5.Abrams DS, Lloyd SJ. Simulation of many-body Fermi systems on a universal quantum computer. Phys. Rev. Lett. 1997;79:2586. doi: 10.1103/PhysRevLett.79.2586. [DOI] [Google Scholar]
6.Shor PW. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Rev. 1999;41:303–332. doi: 10.1137/S0036144598347011. [DOI] [Google Scholar]
7.Bouland, A., van Dam, W., Joorati, H., Kerenidis, I. & Prakash, A. Prospects and Challenges of Quantum Finance. (2020).
8.Lee RS, Lee RS. Future trends in quantum finance. Quant. Financ. 2020;1:399–405. doi: 10.1007/978-981-32-9796-8_14. [DOI] [Google Scholar]
9.Montanaro A. Quantum algorithms: An overview. NPJ Quant. Inf. 2016;2:1–8. [Google Scholar]
10.Preskill JJQ. Quantum computing in the NISQ era and beyond. Quantum. 2018;2:79. doi: 10.22331/q-2018-08-06-79. [DOI] [Google Scholar]
11.Bharti K, et al. Noisy intermediate-scale quantum algorithms. Quantum. 2022;94:015004. [Google Scholar]
12.Bravyi S, Smith G, Smolin JA. Trading classical and quantum computational resources. Phys. Rev. X. 2016;6:021043. [Google Scholar]
13.Eddins A, et al. Doubling the size of quantum simulators by entanglement forging. PRX Quant. 2022;3:010309. doi: 10.1103/PRXQuantum.3.010309. [DOI] [Google Scholar]
14.Peng T, Harrow AW, Ozols M, Wu XJ. Simulating large quantum circuits on a small quantum computer. Phys. Rev. Lett. 2020;125:150504. doi: 10.1103/PhysRevLett.125.150504. [DOI] [PubMed] [Google Scholar]
15.Perlin MA, Saleem ZH, Suchara M, Osborn JC. Quantum circuit cutting with maximum-likelihood tomography. NPJ Quant. Inf. 2021;7:64. doi: 10.1038/s41534-021-00390-6. [DOI] [Google Scholar]
16.Piveteau, C. & Sutter, D. J. Circuit Knitting with Classical Communication. (2022).
17.Saleem, Z. H., Tomesh, T., Perlin, M. A., Gokhale, P. & Suchara, M. J. Quantum Divide and Conquer for Combinatorial Optimization and Distributed Computing. (2021).
18.Tang, W., Tomesh, T., Suchara, M., Larson, J. & Martonosi, M. in Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 473–486.
19.Lowe A, et al. Fast quantum circuit cutting with randomized measurements. Quantum. 2023;7:934. doi: 10.22331/q-2023-03-02-934. [DOI] [Google Scholar]
20.Chen, D. et al. in 2022 IEEE International Conference on Quantum Computing and Engineering (QCE), 509–515 (IEEE).
21.Mitarai K, Fujii KJN. Constructing a virtual two-qubit gate by sampling single-qubit operations. New J. Phys. 2021;23:023021. doi: 10.1088/1367-2630/abd7bc. [DOI] [Google Scholar]
22.Neal RM. MCMC using Hamiltonian dynamics. Handb. Markov Chain Monte Carlo. 2011;2:2. [Google Scholar]
23.Calvo MP, Sanz-Alonso D, Sanz-Serna JM. HMC: Reducing the number of rejections by not using leapfrog and some results on the acceptance rate. J. Comput. Phys. 2021;437:110333. doi: 10.1016/j.jcp.2021.110333. [DOI] [Google Scholar]
24.Betancourt, M. J. A Conceptual Introduction to Hamiltonian Monte Carlo. (2017).
25.Duane S, Kennedy AD, Pendleton BJ, Roweth D. Hybrid Monte Carlo. Phys. Lett. B. 1987;195:216–222. doi: 10.1016/0370-2693(87)91197-X. [DOI] [Google Scholar]
26.Cooley JW, Tukey JWJ. An algorithm for the machine calculation of complex Fourier series. Math. Comput. 1965;19(90):297–301. doi: 10.1090/S0025-5718-1965-0178586-1. [DOI] [Google Scholar]
27.Farhi, E., Goldstone, J. & Gutmann, S. J. A Quantum Approximate Optimization Algorithm. ArXiv (2014).
28.Grover, L. K. in Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, 212–219.
29.Bernstein, E. & Vazirani, U. in Proceedings of the twenty-fifth annual ACM symposium on Theory of computing, 11–20.
30.Boixo S, et al. Characterizing quantum supremacy in near-term devices. Nat. Phys. 2018;14:595–600. doi: 10.1038/s41567-018-0124-x. [DOI] [Google Scholar]
31.Goemans MX, Williamson DPJ. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM. 1995;42:1115–1145. doi: 10.1145/227683.227684. [DOI] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

The data presented in this paper is available online at https://github.com/hang-dev/FRA.

The code for the specific implementation of our method is available online at https://github.com/hang-dev/FRA.

[CR1] 1.Shalf JM, Leland R. Computing beyond Moore's law. Computer. 2015;48:14–23. doi: 10.1109/MC.2015.374. [DOI] [Google Scholar]

[CR2] 2.Biamonte J, et al. Quantum machine learning. Quantum. 2017;549:195–202. doi: 10.1038/nature23474. [DOI] [PubMed] [Google Scholar]

[CR3] 3.Lloyd S, Mohseni M, Rebentrost PJ. Quantum Algorithms for Supervised and Unsupervised Machine Learning. Springer; 2013. [Google Scholar]

[CR4] 4.Lanyon BP, et al. Towards quantum chemistry on a quantum computer. Quantum. 2010;2:106–111. doi: 10.1038/nchem.483. [DOI] [PubMed] [Google Scholar]

[CR5] 5.Abrams DS, Lloyd SJ. Simulation of many-body Fermi systems on a universal quantum computer. Phys. Rev. Lett. 1997;79:2586. doi: 10.1103/PhysRevLett.79.2586. [DOI] [Google Scholar]

[CR6] 6.Shor PW. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Rev. 1999;41:303–332. doi: 10.1137/S0036144598347011. [DOI] [Google Scholar]

[CR7] 7.Bouland, A., van Dam, W., Joorati, H., Kerenidis, I. & Prakash, A. Prospects and Challenges of Quantum Finance. (2020).

[CR8] 8.Lee RS, Lee RS. Future trends in quantum finance. Quant. Financ. 2020;1:399–405. doi: 10.1007/978-981-32-9796-8_14. [DOI] [Google Scholar]

[CR9] 9.Montanaro A. Quantum algorithms: An overview. NPJ Quant. Inf. 2016;2:1–8. [Google Scholar]

[CR10] 10.Preskill JJQ. Quantum computing in the NISQ era and beyond. Quantum. 2018;2:79. doi: 10.22331/q-2018-08-06-79. [DOI] [Google Scholar]

[CR11] 11.Bharti K, et al. Noisy intermediate-scale quantum algorithms. Quantum. 2022;94:015004. [Google Scholar]

[CR12] 12.Bravyi S, Smith G, Smolin JA. Trading classical and quantum computational resources. Phys. Rev. X. 2016;6:021043. [Google Scholar]

[CR13] 13.Eddins A, et al. Doubling the size of quantum simulators by entanglement forging. PRX Quant. 2022;3:010309. doi: 10.1103/PRXQuantum.3.010309. [DOI] [Google Scholar]

[CR14] 14.Peng T, Harrow AW, Ozols M, Wu XJ. Simulating large quantum circuits on a small quantum computer. Phys. Rev. Lett. 2020;125:150504. doi: 10.1103/PhysRevLett.125.150504. [DOI] [PubMed] [Google Scholar]

[CR15] 15.Perlin MA, Saleem ZH, Suchara M, Osborn JC. Quantum circuit cutting with maximum-likelihood tomography. NPJ Quant. Inf. 2021;7:64. doi: 10.1038/s41534-021-00390-6. [DOI] [Google Scholar]

[CR16] 16.Piveteau, C. & Sutter, D. J. Circuit Knitting with Classical Communication. (2022).

[CR17] 17.Saleem, Z. H., Tomesh, T., Perlin, M. A., Gokhale, P. & Suchara, M. J. Quantum Divide and Conquer for Combinatorial Optimization and Distributed Computing. (2021).

[CR18] 18.Tang, W., Tomesh, T., Suchara, M., Larson, J. & Martonosi, M. in Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 473–486.

[CR19] 19.Lowe A, et al. Fast quantum circuit cutting with randomized measurements. Quantum. 2023;7:934. doi: 10.22331/q-2023-03-02-934. [DOI] [Google Scholar]

[CR20] 20.Chen, D. et al. in 2022 IEEE International Conference on Quantum Computing and Engineering (QCE), 509–515 (IEEE).

[CR21] 21.Mitarai K, Fujii KJN. Constructing a virtual two-qubit gate by sampling single-qubit operations. New J. Phys. 2021;23:023021. doi: 10.1088/1367-2630/abd7bc. [DOI] [Google Scholar]

[CR22] 22.Neal RM. MCMC using Hamiltonian dynamics. Handb. Markov Chain Monte Carlo. 2011;2:2. [Google Scholar]

[CR23] 23.Calvo MP, Sanz-Alonso D, Sanz-Serna JM. HMC: Reducing the number of rejections by not using leapfrog and some results on the acceptance rate. J. Comput. Phys. 2021;437:110333. doi: 10.1016/j.jcp.2021.110333. [DOI] [Google Scholar]

[CR24] 24.Betancourt, M. J. A Conceptual Introduction to Hamiltonian Monte Carlo. (2017).

[CR25] 25.Duane S, Kennedy AD, Pendleton BJ, Roweth D. Hybrid Monte Carlo. Phys. Lett. B. 1987;195:216–222. doi: 10.1016/0370-2693(87)91197-X. [DOI] [Google Scholar]

[CR26] 26.Cooley JW, Tukey JWJ. An algorithm for the machine calculation of complex Fourier series. Math. Comput. 1965;19(90):297–301. doi: 10.1090/S0025-5718-1965-0178586-1. [DOI] [Google Scholar]

[CR27] 27.Farhi, E., Goldstone, J. & Gutmann, S. J. A Quantum Approximate Optimization Algorithm. ArXiv (2014).

[CR28] 28.Grover, L. K. in Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, 212–219.

[CR29] 29.Bernstein, E. & Vazirani, U. in Proceedings of the twenty-fifth annual ACM symposium on Theory of computing, 11–20.

[CR30] 30.Boixo S, et al. Characterizing quantum supremacy in near-term devices. Nat. Phys. 2018;14:595–600. doi: 10.1038/s41567-018-0124-x. [DOI] [Google Scholar]

[CR31] 31.Goemans MX, Williamson DPJ. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM. 1995;42:1115–1145. doi: 10.1145/227683.227684. [DOI] [Google Scholar]

PERMALINK

Fast reconstruction algorithm based on HMC sampling

Hang Lian

Jinchen Xu

Yu Zhu

Zhiqiang Fan

Yi Liu

Zheng Shan

Abstract

Introduction

Background

Circuit cutting

Figure 1.

Virtual two-qubit gate

Figure 2.

Hamiltonian Monte Carlo sampling

Methods

Fast reconstruction algorithm

Algorithm 1.

Results

Figure 3.

Figure 4.

Figure 5.

Figure 6.

Figure 7.

Figure 8.

Figure 9.

Conclusion

Author contributions

Data availability

Code availability

Competing interests

Footnotes

References

Associated Data

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases