Characterizing barren plateaus in quantum ansätze with the adjoint representation

Enrico Fontana; Dylan Herman; Shouvanik Chakrabarti; Niraj Kumar; Romina Yalovetzky; Jamie Heredge; Shree Hari Sureshbabu; Marco Pistoia

2024 Aug 22;15:7171. doi: 10.1038/s41467-024-49910-w


PMCID: PMC11341719 PMID: 39174510

Abstract

Variational quantum algorithms, a popular heuristic for near-term quantum computers, utilize parameterized quantum circuits which naturally express Lie groups. It has been postulated that many properties of variational quantum algorithms can be understood by studying their corresponding groups, chief among them the presence of vanishing gradients or barren plateaus, but a theoretical derivation has been lacking. Using tools from the representation theory of compact Lie groups, we formulate a theory of barren plateaus for parameterized quantum circuits whose observables lie in their dynamical Lie algebra, covering a large variety of commonly used ansätze such as the Hamiltonian Variational Ansatz, Quantum Alternating Operator Ansatz, and many equivariant quantum neural networks. Our theory provides, for the first time, the ability to compute the exact variance of the gradient of the cost function of the quantum compound ansatz, under mixing conditions that we prove are commonplace.

Subject terms: Quantum information, Qubits

Previous studies have highlighted the role of quantum circuit symmetries in ameliorating the barren plateau problem, but rigorous analytical results on this topic are scarce. Here, by looking at Lie algebra supported ansätze, the authors find an expression for the variance of the cost function gradient as a function of the dimension of the dynamical Lie algebra.

Introduction

Variational quantum algorithms (VQAs) are a popular class of quantum computing heuristics due to their low circuit cost and ability to be trained in a hybrid quantum-classical fashion¹. The community has identified a variety of potential applications for VQAs in the areas of optimization^2–7 and machine learning^8–12. Unfortunately, the optimization of VQAs can be a computationally challenging task due to (1) exponentially many parameters being required to ensure convergence^13–17, and (2) exponentially many samples being required to estimate gradients, known as the barren plateau (BP) problem^18–21. In some cases, it has been observed numerically that both of these obstacles to VQA optimization can be mitigated when the chosen parameterized quantum circuit (PQC) obeys certain symmetries^14,22. The symmetries of the ansatz cause its action, in either the Schrödinger or Heisenberg pictures, to break into invariant subspaces. However, there have only been a few cases in which potentially useful symmetries, mostly in the Schrödinger picture, have been identified for analyzing BPs, e.g. permutation invariant subspaces²³. Previously, symmetries have been leveraged for efficient classical simulation of quantum circuits in both the Schrödinger²⁴ and Heisenberg^25–27 pictures. The simulation is performed separately in each invariant subspace defined by the symmetries by projecting the states or operators accordingly.

The existing theoretical results on the trainability and convergence of ansätze with symmetries have been restricted to the Schrödinger picture and a setting called subspace controllable^14,18,22,23. Subspace controllability occurs when the circuit can express any unitary transformation between states in an invariant subspace, and it has been observed that it results in training landscapes that are essentially trap-free^28,29. In addition, if the invariant subspaces have small dimension, i.e., scale polynomially in system size, it can be easily shown that BPs are not present for subspace-controllable PQCs.

These results, however, fail in the uncontrollable setting, where the circuit is limited to expressing a subgroup of the unitary group in the invariant subspace. With respect to the BP problem, existing work has observed a desirable feature of subspace uncontrollable circuits²². In this setting, it appears that the trainability of the ansatz depends on the dimension of the dynamical Lie algebra (DLA), which holds almost trivially in the subspace-controllable setting since the DLA dimension grows with the square of the subspace dimension. However, existing work has only provided numerical evidence of this connection to the DLA dimension in the uncontrollable setting²². There are cases where, for uncontrollable PQCs, the dimension of the effective DLA only grows polynomially in the system size, while the dimension of the invariant subspace in which the initial state lies grows exponentially, such as the quantum compound ansatz^30,31. Note that the effective DLA is the restriction of the action of the DLA to an invariant subspace. Thus, this connection between the DLA dimension and BPs has remained unproven in the general setting.

In this work, using a simple but powerful observation regarding the adjoint representation and the representation theory of compact Lie groups, we prove that for a general class of PQCs the variance of the gradient of the cost function does fall inversely with the dimension of the effective DLA for 2-designs of the dynamical Lie group. As we will show, the Heisenberg picture and the symmetries of the circuit’s action on the observable are more suitable for explaining this phenomenon. This will lead to intuitive and commonplace conditions on the observable that are sufficient for this connection to hold. To show the validity of the 2-design assumption in practice, we show that fast mixing occurs for DLAs with polynomial dimensions, and we experimentally verify our formulae for the quantum compound ansatz.

Results

General framework

VQAs consist of optimizing the parameters of parameterized circuits of the form given in the below definition:

Definition 2.1

(Periodic ansatz) A periodic ansatz constructed from Hermitian generators $\{\tilde{H}_{1}, \dots, \tilde{H}_{K}\}$ consists of a unitary of the form

U (θ) = \prod_{l = 1}^{L} \prod_{k = 1}^{K} e^{- θ_{(l, k)} i {\tilde{H}}_{k}},

an initial state $ρ = U_{0} ∣0⟩ ⟨0∣ U_{0}^{†}$ , and a Hermitian measurement operator O.

The output of a VQA is the parameter-dependent expectation value $⟨ O ⟩_{ρ} = Tr [U (θ) ρ U^{†} (θ) O]$ , known as the cost function.
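As a minimal sketch of Definition 2.1, the following Python snippet assembles U(θ) from a list of Hermitian generators and evaluates the cost function. The function names (`exp_i_herm`, `periodic_ansatz`, `cost`) and the left-to-right multiplication order of the factors are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def exp_i_herm(H, theta):
    """Return exp(-i * theta * H) for a Hermitian matrix H via its eigendecomposition."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(-1j * theta * w)) @ v.conj().T

def periodic_ansatz(thetas, generators):
    """U(theta) = prod_l prod_k exp(-i * theta[l, k] * H_k).

    thetas has shape (L, K). Factors are multiplied left to right in the
    order (l, k); this ordering is an assumption of this sketch.
    """
    dim = generators[0].shape[0]
    U = np.eye(dim, dtype=complex)
    for layer in thetas:
        for theta, H in zip(layer, generators):
            U = U @ exp_i_herm(H, theta)
    return U

def cost(thetas, generators, rho, O):
    """Cost function Tr[U rho U^dagger O]; real for Hermitian rho and O."""
    U = periodic_ansatz(thetas, generators)
    return np.trace(U @ rho @ U.conj().T @ O).real

# Single-qubit example: L = 2 layers, K = 2 generators, initial state |0><0|.
rho0 = np.array([[1, 0], [0, 0]], dtype=complex)
thetas = np.array([[0.1, 0.2], [0.3, 0.4]])
print(cost(thetas, [X, Z], rho0, Z))
```

For a single layer with the single generator X, this reduces to the familiar single-qubit expression cos(2θ) for the expectation of Z in the state obtained from |0⟩.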

For n-qubits, the set of U(θ) lies in the unique connected subgroup of SU(2ⁿ), called the dynamical Lie group³². It is the subgroup associated with the real span of the Lie closure (i.e., closure under taking commutators) of the generators:

g : = {span}_{R} {⟨ i {\tilde{H}}_{1}, \dots, i {\tilde{H}}_{K} ⟩}_{Lie},

which is known in the quantum control literature as the DLA³². We denote the dimension of $g$ as a real vector space by $d_{g}$ .

We also informally define the notion of BP for quantum ansätze.

Definition 2.2

(Barren plateau) A class of quantum ansätze experiences a BP if the variance of the cost function gradient decays exponentially with system size, i.e., for all (l, k),

{Var}_{θ ~ ν} [\partial_{(l, k)} {⟨ O ⟩}_{ρ}] \in O (\frac{1}{b^{n}}),

where the system size n is the number of qubits and b > 1. Typically ν is the uniform distribution over the range of the parameters.

Note that in general a BP at initialization may not imply a BP throughout the training trajectory. However, in most cases when ν is the uniform distribution over parameters, the collection U(θ) forms an approximate 2-design w.r.t. the Haar measure on the dynamical Lie group (this is made explicit in a later subsection), and due to Haar invariance, a BP at initialization implies a BP throughout training. A PQC that experiences a BP is also called untrainable, which follows from the gradient being computationally infeasible to estimate to arbitrary precision. Otherwise, if the variance only falls as $Ω (1 / poly (n))$ , then the PQC is trainable.

DLA - BP connection

It has been conjectured that the dimension of the DLA plays a crucial role in characterizing the trainability of VQAs. More specifically, the following conjecture linking trainability and DLA dimension was put forward:

Conjecture 2.3

(Conjecture 1 in ref. ²², paraphrased) The scaling of the variance of the partial derivatives of the cost function is inversely proportional to the dimension of the DLA:

{Var}_{θ ~ ν} [\partial_{(l, k)} {⟨ O ⟩}_{ρ}] \in O (\frac{1}{poly (d_{g})}) .

In this work, we provide a proof of this conjecture. We emphasize that our results show a more explicit scaling of the variance with the DLA dimension, instead of just an upper bound. Thus, our results shed light on when stronger versions of the above conjecture hold, e.g., $Θ (\frac{1}{poly (d_{g})})$ . However, this depends on the initial state and observable, since the DLA dimension may not always be the quantity dominating the decay.

It turns out that the connection holds for a certain class of ansätze, which we term the class of Lie algebra supported ansatz (LASA).

Definition 2.4

(Lie algebra supported ansatz) A Lie algebra supported ansatz (LASA) is a periodic ansatz where the measurement operator O is such that iO belongs to the dynamical Lie algebra associated with the circuit generators $\{i \tilde{H}_{1}, \dots, i \tilde{H}_{K}\}$ .

In Fig. 1, we display our main result, which shows that the variance of the gradient has a direct dependence on DLA dimension for LASAs. As will be made rigorous later, by construction, the action of a LASA on its observable will decompose into invariant subspaces (corresponding to preserved symmetries) each of dimension at most $d_{g}$ .

While we introduce restrictions on the observable, we note that our results are still far-reaching. This is because LASAs include many commonly used PQCs such as the Hamiltonian variational ansatz (HVA)³³ and quantum alternating operator ansatz (QAOA)^2,34. We also note that all LASAs are equivariant quantum neural networks (EQNNs)³⁵. However, an EQNN is not necessarily a LASA, since there are equivariant operators that may not lie in the DLA. This is because equivariance is defined with respect to a symmetry group of the quantum data, and one could imagine a situation in which the circuit has a small DLA such that other equivariant operators exist outside the DLA.

Representation theoretic notation

The following presents the notation used throughout the paper and assumes familiarity with Lie groups and representation theory. The unfamiliar reader is directed to the Supplementary Information, where we briefly introduce Lie groups and representation theory.

Our focus will be a compact, connected Lie group G. The corresponding compact Lie algebra will be denoted by $g$ . The notation V will represent an arbitrary finite-dimensional inner product space over $C$ or $R$ . If a result does not specify which field is used, then either can be assumed. Additionally, $U (V)$ will denote the group of isometries on V (i.e., depending on the field, either the unitary group or orthogonal group), and $u (V)$ will denote the set of skew-Hermitian operators on V. For either $R$ or $C$ , we will use $ϕ : G \to U (V)$ to denote a unitary representation of the group G and $d ϕ : g \to u (V)$ to denote the differential or Lie algebra representation. We will frequently use the notation U_g to denote the element $ϕ (g) \in U (V)$ for some g ∈ G when the representation ϕ and space V are clear from the context.

Recall that the adjoint representation of a Lie group G is the homomorphism:

\forall g \in G, {Ad}_{g} (k) : = g k g^{- 1} \in g, \forall k \in g,

and the adjoint representation of a Lie algebra is the homomorphism:

\forall h \in g, {ad}_{h} (k) : = [h, k] \in g, \forall k \in g .

For compact simple Lie algebras, since all trace forms are related by a real factor, we define a scaling constant I_ϕ that we call the index of the representation (w.r.t. the standard representation) such that:

- Tr (d ϕ (e_{i}) d ϕ (e_{j})) = I_{ϕ} δ_{i j},

for {e_i} a basis for $g$ satisfying:

- Tr (e_{i} e_{j}) = δ_{i j} .

The constant I_ϕ is the same as (twice) the Dynkin index for irreducible representations³⁶.

For compact simple Lie algebras we consider a few norms induced by the trace forms. For any $a \in g$ , we define the standard norm to be

∥ a ∥_{g}^{2} = - Tr (a^{2}),

the Killing norm $∥ a ∥_{K}^{2}$ to be the norm induced by the Killing form (trace form associated with the adjoint representation), and more generally, for an arbitrary Lie algebra representation dϕ, we denote the usual Frobenius norm by $∥ d ϕ (a) ∥_{F}^{2}$ . All are related in the natural way via the associated index of the representation, as defined earlier. Specifically, for arbitrary dϕ:

∥ d ϕ (a) ∥_{F}^{2} = I_{ϕ} ∥ a ∥_{g}^{2}

∥ d ϕ (a) ∥_{K}^{2} = ∥ a ∥_{K}^{2} = I_{Ad} ∥ a ∥_{g}^{2} = \frac{I_{Ad}}{I_{ϕ}} ∥ d ϕ (a) ∥_{F}^{2} .
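The relation between the Killing norm and the standard norm can be checked numerically for small cases. The sketch below builds a basis of su(n) normalized so that −Tr(e_i e_j) = δ_ij, computes the adjoint matrices from commutators, and evaluates the Killing form. It relies on the standard fact that the Killing form of su(n) equals 2n times the trace form, so that I_Ad = 2n in this normalization; the helper names `su_basis` and `killing_gram` are illustrative only.

```python
import numpy as np

def su_basis(n):
    """Basis {e_i} of su(n) normalized so that -Tr(e_i e_j) = delta_ij."""
    basis = []
    for i in range(n):
        for j in range(i + 1, n):
            a = np.zeros((n, n), dtype=complex)   # real antisymmetric element
            a[i, j], a[j, i] = 1, -1
            basis.append(a / np.sqrt(2))
            s = np.zeros((n, n), dtype=complex)   # imaginary symmetric element
            s[i, j] = s[j, i] = 1j
            basis.append(s / np.sqrt(2))
    for m in range(1, n):                          # traceless diagonal elements
        d = np.zeros(n)
        d[:m] = 1
        d[m] = -m
        basis.append(1j * np.diag(d) / np.sqrt(m * (m + 1)))
    return basis

def killing_gram(basis):
    """Matrix K_kj = Tr(ad_{e_k} ad_{e_j}) computed in the given orthonormal basis."""
    d = len(basis)
    ad = np.zeros((d, d, d))
    for k, ek in enumerate(basis):
        for l, el in enumerate(basis):
            comm = ek @ el - el @ ek
            for m, em in enumerate(basis):
                # coefficient of e_m in [e_k, e_l] under the inner product -Tr(ab)
                ad[k, m, l] = -np.trace(em @ comm).real
    return np.einsum("kml,jlm->kj", ad, ad)

for n in (2, 3):
    K = killing_gram(su_basis(n))
    print(n, np.allclose(-K, 2 * n * np.eye(n * n - 1)))
```

Since the basis is orthonormal with respect to the standard norm, −K being a multiple of the identity is exactly the statement that the Killing norm is I_Ad times the standard norm.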

For an arbitrary $X \in u (V)$ , we define $X_{g}$ to be the orthogonal projection under the Frobenius inner product onto $d ϕ (g)$ .

Lastly, throughout the paper, all integration, e.g. ∫_G f(g)dg, is with respect to the Haar measure μ for G. The notation μ^⊗2 will denote the product Haar measure.

Let us now place these notions in the context of VQAs. The vector space V on which the group acts is the n-qubit Hilbert space $C^{2^{n}}$ . In general the PQC’s dynamical Lie group will be ϕ(G) with ϕ a faithful (injective) representation and this is what we will assume here. In practice however one always can take ϕ to be the identity map, identifying G with the dynamical group and $g$ with the DLA, without invalidating the results.

In this abstract setting there is no notion of parameter space and hence the PQC gradient ∂_{(l, k)}〈O〉_ρ is not well defined. Thus, we introduce the following parameter-independent quantity associated with any compact, connected Lie group:

Definition 2.5

(Abstracted gradient) Let G be a compact, connected Lie group with representation $ϕ : G \to U (V)$ . In addition, let $h \in g$ and $i O, i A \in u (V)$ . We define the abstracted gradient to be the following quantity:

\partial {⟨ O ⟩}_{A} : = Tr [U_{g^{-}}^{†} A U_{g^{-}} [H, U_{g^{+}} O U_{g^{+}}^{†}]],

where $U_{g^{\pm}} : = ϕ (g^{\pm})$ for arbitrary g⁺, g⁻ ∈ G, and H = dϕ(h).

Note that now we set the generators to be skew-Hermitian. The connection between abstracted and PQC gradients is clear for the periodic ansatz in Definition 2.1: for any parameter θ_{(l, k)} the PQC gradient will be equivalent to an abstracted gradient, with $U_{g^{-}}$ ( $U_{g^{+}}$ ) being the unitaries preceding (following) the unitary $e^{- θ_{(l, k)} H_{k}}$ in the circuit.

In our calculations we will look at second moments of the abstracted gradient for (g⁺, g⁻) ~ μ^⊗2. This will accurately model the experimental behavior if for any θ_{(l, k)} the ansatz takes the form $W^{(L)} e^{- θ_{(l, k)} H_{k}} W^{(R)}$ with W^(L/R) random unitaries forming independent 2-designs for ϕ(G).

For a sufficiently deep periodic ansatz, the assumption is valid for parameters in the middle of the PQC whenever randomly initialized, polynomially-sized periodic ansätze form approximate 2-designs.

It has been shown that this holds for $g = su (2^{n})$ or $so (2^{n})$ and when all generators are in the Pauli group³⁷. It has been widely assumed in the literature, with only numerical evidence, that this result still holds for ansätze with different DLAs. The following result answers this in the affirmative for LASAs with polynomially-sized DLAs, showing that rapid mixing to a 2-design still holds when we sample generators from a basis for the DLA.

Theorem 2.6

(Rapid mixing for polynomial DLA) Consider an orthogonal basis of skew-Hermitian generators $A : = \{B_{1}, \dots, B_{d_{g}}\}$ for the DLA with the property that the unitary $e^{- θ B_{k}}$ corresponding to a generator B_k is t_k-periodic. In addition, suppose that $d_{g} = O (poly (n))$ . Consider a LASA formed by applying evolutions $e^{- θ_{k} B_{k}}$ where B_k is selected uniformly at random from the set $A$ and the parameter θ_k uniformly from [0, t_k). Then, the ansatz is an ϵ-approximate 2-design for the dynamical group $G$ after $O (poly (n) \log (1 / ϵ))$ layers.

Note that this result only focuses on bounding the spectral gap of the walk, i.e., a “layer” is a single application of evolution, and does not include the cost of implementing the evolutions $e^{- θ B_{k}}$ in terms of basis gates. The proof of the above result and its generalization to t-designs for arbitrary LASA are in the Supplementary Information and are based on techniques used by ref. ³⁷ and earlier works. Such random walks have been known to converge for some time³⁸, and convergence to Haar for exponential DLA is not efficient. However, the above result makes the spectral gap dependence explicit. As a comparison, $O (\log (1 / ϵ))$ layers suffice for random Pauli rotations to approximate a 2-design for SU(2ⁿ), as shown by ref. ³⁷. The approach of studying BPs with 2-designs is standard, e.g. see ref. ¹⁸. Furthermore, as we have shown, it is theoretically motivated in the case of independent, uniformly distributed parameters. However, there may still be settings where the 2-design assumption fails and where our results will not hold, for example, other initialization schemes or correlated parameters. Interestingly, there is evidence that both may avoid BPs^39,40, however they do not investigate this research direction further.

Inspired by our overall goal of analyzing BPs in PQCs, we seek to compute the quantity

GradVar : = {Var}_{(g^{+}, g^{-}) ~ μ^{\otimes 2}} [\partial {⟨ O ⟩}_{ρ}] = E_{(g^{+}, g^{-}) ~ μ^{\otimes 2}} [{(\partial {⟨ O ⟩}_{ρ})}^{2}] - {(E_{(g^{+}, g^{-}) ~ μ^{\otimes 2}} [\partial {⟨ O ⟩}_{ρ}])}^{2},

where μ is the unique Haar measure over G and ρ is the initial quantum state to which all elements of the dynamical group are applied. $E_{(g^{+}, g^{-}) ~ μ^{\otimes 2}} [\partial {⟨ O ⟩}_{ρ}]$ can be shown to be zero in general (see the Supplementary Information), and thus in practice, we focus on the second moment:

GradVar = E_{(g^{+}, g^{-}) ~ μ^{\otimes 2}} [{(\partial {⟨ O ⟩}_{ρ})}^{2}] .

Using Definition 2.2, a BP occurs when the following holds:

GradVar \in O (\frac{1}{b^{n}}), b > 1 .

This is the phenomenon that our methods will seek to probe for the specific case of LASAs.

Lastly, we formally define what we mean by symmetries in the Schrödinger and Heisenberg evolution pictures. Schrödinger symmetries refer to invariant linear subspaces V_s of states that are preserved by evolutions generated by the dynamical Lie group $G$ , i.e., $\forall U \in G, U V_{s} \subseteq V_{s}$ . Heisenberg symmetries refer to invariant linear subspaces of observables V_h preserved by evolutions generated by having the dynamical Lie group $G$ act via conjugation, i.e., $\forall U \in G, U V_{h} U^{†} \subseteq V_{h}$ . If the observables lie in the DLA, i.e. the LASA case, then this is the adjoint representation.

Theory of BPs for LASA

We now present our theoretical contributions, which connect the Lie algebra dimension to the scaling of the gradient variance. We note that norms involving the Hermitian observable O and the skew-Hermitian generator H have a few interpretations as mentioned in the previous subsection. However, to be concise and for readability, we present the results in only one form.

We start by recalling that all compact Lie algebras (and thus groups) are reductive.

Definition 2.7

(Reductive Lie algebra⁴¹) A Lie algebra $g$ is reductive if the adjoint representation is completely reducible, i.e., $g$ has the following decomposition as a direct sum of Lie algebras:

g = ⨁_{α} g_{α} \oplus c,

where each $g_{α} \subset g$ is a simple ideal and $c \subset g$ is the center of $g$ . Note that if G is simply connected then $c = \{0\}$ .

This property is essential for proving our main result, as it allows us to extend our expression (Theorem 2.8) for the gradient variance for simple Lie groups to the general compact case. If $g$ is compact, then the $g_{α}$ will be compact as well⁴².

Note that this notion of reducibility is related to what has appeared in prior works, e.g., refs. ^22,23,30; the differences mainly concern whether the group acts on the observable or on the state. We discuss this in detail in the last subsection.

Next, we present our expression for the variance of the gradient for compact simple groups that applies to each $g_{α}$ in Equation (16).

Theorem 2.8

(Simple group variance) Let G be a compact, connected simple Lie group with Lie algebra $g$ . Suppose ϕ is a finite-dimensional unitary representation of G. In addition, $o, h \in g$ , iO = dϕ(o), H = dϕ(h) and ρ a density matrix. Then the following holds:

GradVar = \frac{∥ H ∥_{K}^{2} ∥ O ∥_{F}^{2} ∥ ρ_{g} ∥_{F}^{2}}{d_{g}^{2}} .
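As a sanity check of this formula in the smallest case, the sketch below samples the abstracted gradient of Definition 2.5 for G = SU(2) with ϕ the identity representation and compares its second moment with the right-hand side of Theorem 2.8. The specific choices here are assumptions for illustration: H = −iσ_z/2, O = σ_z, ρ = |0⟩⟨0|, Haar sampling via random unit quaternions, I_Ad = 4 for su(2), and ρ_g taken as the traceless part of ρ. Function names are hypothetical.

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=complex)

def haar_su2(rng, size):
    """Sample Haar-random SU(2) matrices from uniformly random unit quaternions."""
    q = rng.normal(size=(size, 4))
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    a, b, c, d = q.T
    U = np.empty((size, 2, 2), dtype=complex)
    U[:, 0, 0] = a + 1j * b
    U[:, 0, 1] = c + 1j * d
    U[:, 1, 0] = -c + 1j * d
    U[:, 1, 1] = a - 1j * b
    return U

def abstracted_gradients(H, O, rho, n_samples, seed=0):
    """Samples of Tr[U_-^dag rho U_- [H, U_+ O U_+^dag]] with independent Haar U_+, U_-."""
    rng = np.random.default_rng(seed)
    Up = haar_su2(rng, n_samples)
    Um = haar_su2(rng, n_samples)
    O_rot = Up @ O @ np.conj(np.swapaxes(Up, 1, 2))
    comm = H @ O_rot - O_rot @ H
    rho_rot = np.conj(np.swapaxes(Um, 1, 2)) @ rho @ Um
    return np.einsum("sij,sji->s", rho_rot, comm).real

def predicted_grad_var_su2(H, O, rho):
    """Right-hand side of Theorem 2.8 for su(2) in the identity representation."""
    killing_H = 4.0 * (-np.trace(H @ H).real)        # I_Ad = 4, I_phi = 1 (assumed)
    frob_O = np.trace(O @ O).real
    rho_g = rho - np.trace(rho) * np.eye(2) / 2      # projection onto traceless part
    frob_rho_g = np.trace(rho_g @ rho_g).real
    return killing_H * frob_O * frob_rho_g / 3**2    # d_g = 3

H = -0.5j * Z                                        # skew-Hermitian generator
O = Z                                                # observable with iO in su(2)
rho = np.array([[1, 0], [0, 0]], dtype=complex)      # |0><0|

samples = abstracted_gradients(H, O, rho, n_samples=100_000, seed=1)
print(np.mean(samples**2), predicted_grad_var_su2(H, O, rho))
```

In this example the prediction evaluates to 2/9, which can also be obtained by hand: in the Bloch picture the sampled quantity is (ẑ × a)·b for independent uniformly random unit vectors a and b.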

If G is compact, one can use the fact that it is reductive and apply Theorem 2.8 to each of the compact simple ideals to obtain the following:

Theorem 2.9

(Compact group variance) Let G be a compact and connected Lie group with Lie algebra $g$ . Suppose ϕ is a finite-dimensional unitary representation, $o, h \in g$ , iO = dϕ(o), H = dϕ(h), and ρ is a density matrix. Then the following holds:

GradVar = \sum_{α} \frac{∥ H_{g_{α}} ∥_{K}^{2} ∥ O_{g_{α}} ∥_{F}^{2} ∥ ρ_{g_{α}} ∥_{F}^{2}}{d_{g_{α}}^{2}} .

Note that the center $c$ does not contribute to the variance.

As mentioned in the Introduction, the above theorem is the central result of the paper. It shows that under the assumption of a LASA we can get a precise mathematical expression for the gradient variance. Notably, this expression is in terms of quantities that are intimately linked with the Lie algebra and the representation and are well characterized for all simple algebras.

Interpretation of results

The three norms in the numerator of Equation (18) can be viewed as effectively measuring the support that each operator has on the simple ideal $d ϕ (g_{α})$ . Specifically, $∥ O_{g_{α}} ∥_{F}$ and $∥ ρ_{g_{α}} ∥_{F}$ being Frobenius norms can be interpreted as generalized measures of purity with respect to $d ϕ (g_{α})$ . This concept was actually first introduced in ref. ⁴³. A similar interpretation is also valid for the Killing norm $∥ H_{g_{α}} ∥_{K}$ , however, this time the relevant representation of $g_{α}$ is the adjoint representation, and so the norm is scaled by the ratio of the indices as in Equation (11).

If one is still uncomfortable with the Killing norm, we note that $∥ H_{g_{α}} ∥_{K}^{2} \leq 2 d_{g_{α}} ∥ H_{g_{α}} ∥_{F}^{2}$ (see the Supplementary Information), and so one gets the following upper bound:

GradVar \in O (\sum_{α} \frac{∥ H_{g_{α}} ∥_{F}^{2} ∥ O_{g_{α}} ∥_{F}^{2} ∥ ρ_{g_{α}} ∥_{F}^{2}}{d_{g_{α}}}),

which presents the result in terms of more familiar quantities, i.e., Frobenius norms. In addition, we now see that Conjecture 2.3 is explicitly proven (and indeed significantly generalized) for LASA.

From Equation (18) or (19), we infer that a BP can only occur whenever at least one of the terms in the expression leads to exponential decay. More specifically, the gradients will decay exponentially under any of these conditions: the state has exponentially small support over the Lie algebra; the state, the measurement operator, and the generator are mostly supported on a subalgebra, $g_{α}$ , the dimension of which is exponentially large; or the support of the state, measurement operator and generator are mutually incompatible on the subalgebras, in the sense that all terms vanish. The second condition amounts to the conjecture of ref. ²², while the last is a novel prediction of this work, which only occurs in the strict semisimple case.

Lastly, we conclude with some details on how one might use our results in practice to forecast gradient variance scaling without access to a quantum computer. The main goal is to find a basis for the DLA and compute its structure constants in $O (poly (d_{g}))$ time. Since the generators and observables will typically be linear combinations of Pauli strings, one can utilize symbolic computation to reason about the decomposition of $g$ into simple ideals. A basis for the DLA can be obtained by computing nested commutators symbolically and checking for linear independence as done in ref. ²². In summary, as input we are given the Hermitian generators used in the ansatz. We proceed by computing pairwise commutators, until we find no new linearly independent elements. If our current estimate for the basis has k elements, then we need to compute $\binom{k}{2}$ pairwise commutators, and we need at most $d_{g}$ iterations. This leads to $O (d_{g}^{3})$ pairwise commutators in total.
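The basis-finding step can be sketched as follows. This is an illustrative variant that uses dense matrices, which is only feasible for a small number of qubits, instead of the symbolic Pauli-string arithmetic described above. Linear independence is tested by Gram-Schmidt orthogonalization under the Frobenius inner product. The function name `lie_closure` and the tolerance are assumptions of this sketch.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def kron_all(*ops):
    """Tensor product of the given single-qubit operators, in order."""
    out = np.array([[1.0 + 0j]])
    for op in ops:
        out = np.kron(out, op)
    return out

def lie_closure(generators, tol=1e-9):
    """Frobenius-orthonormal basis of the Lie closure of skew-Hermitian generators."""
    basis = []

    def try_add(op):
        # Remove the component along the current basis; keep what is left if nonzero.
        v = op.reshape(-1).astype(complex)
        for b in basis:
            v = v - np.vdot(b.reshape(-1), v) * b.reshape(-1)
        norm = np.linalg.norm(v)
        if norm > tol:
            basis.append((v / norm).reshape(op.shape))

    for g in generators:
        try_add(g)
    i = 0
    while i < len(basis):            # the basis may grow while we iterate
        for j in range(i):
            try_add(basis[i] @ basis[j] - basis[j] @ basis[i])
        i += 1
    return basis

# Nearest-neighbour Givens-type generators on n = 3 qubits (i times Hermitian).
gens = []
for left, right in [((), (I2,)), ((I2,), ())]:
    gens.append(-0.25j * (kron_all(*left, X, X, *right) + kron_all(*left, Y, Y, *right)))
    gens.append(-0.25j * (kron_all(*left, Y, X, *right) - kron_all(*left, X, Y, *right)))

print(len(lie_closure(gens)))        # dimension of the generated DLA
```

Every pair of basis elements is commuted exactly once, so the number of commutators is quadratic in the final dimension for this variant. The cost of each commutator, however, is exponential in the number of qubits because of the dense representation.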

Next, using the basis $\{E_{k}\}_{k = 1}^{d_{g}}$ for the DLA obtained from the process just discussed, expressed as sums of Pauli strings, we compute the $d_{g} \times d_{g}$ matrix of each operator ${ad}_{i E_{k}}$ in the basis $\{E_{k}\}_{k = 1}^{d_{g}}$ . We denote these matrices by $\hat{ad}_{i E_{k}}$ ; they contain the structure constants. The next step is to simultaneously block diagonalize the $\hat{ad}_{i E_{k}}$ , which will reveal bases for the simple ideals. This can be done in $O (\log (d_{g}) poly (d_{g}))$ time by diagonalizing $\hat{ad}_{i E_{1}}$ and then finding invariant subspaces preserved by $\hat{ad}_{i E_{2}}$ . We then repeat this procedure for each smaller block that was found for $\hat{ad}_{i E_{2}}$ in the previous step, and so on. We can compute the $∥ O_{g_{α}} ∥_{F}^{2}$ and $∥ H_{g_{α}} ∥_{K}^{2}$ norms symbolically. If $\{A_{k}\}_{k = 1}^{d_{g_{α}}}$ is a basis for the ideal $g_{α}$ , which can be expressed in terms of sums of Pauli strings given our assumption, then the norm $∥ ρ_{g_{α}} ∥_{F}^{2} = \sum_{k = 1}^{d_{g_{α}}} Tr (A_{k}^{\otimes 2} ρ^{\otimes 2})$ can be computed classically for product input states.

However, we cannot yet claim that the overall computational complexity is polynomial in $d_{g}$ , as we typically express operators in the Pauli basis, and computing pairwise commutators can cause the support on the Pauli basis to grow, in the worst case, exponentially with the number of iterations. Simply put, there may be no way to express the basis elements compactly. This is the same challenge faced by the classical simulation technique $g$ -sim²⁷, and it is currently unclear whether it can be overcome in general. Assuming that the support of nested commutators in the Pauli basis grows only polynomially with the number of iterations, we have a procedure with a runtime that is a (potentially large) polynomial in $d_{g}$ . If the DLA dimension is polynomial, then it is an overall polynomial-time process. This can at least be done at small scales to probe the scaling of the gradient variance.

The analysis so far assumed no a priori knowledge about the DLA. The situation radically improves when the DLA isomorphism class is known. Then exact variance calculation with our formula can become a relatively straightforward task, as we shall see in the next subsection.

Variance computation for quantum compound ansatz

The quantum compound ansatz is a quantum representation on n qubits (2ⁿ-dimensional) of the Lie group SO(n) or SU(n)^30,44. Given a general g ∈ SU(n) (SO(n)), one can decompose it into a product of SU(2) (SO(2)) rotations on 2-dimensional subspaces, which are (generalized) Givens rotations:

U_{g} = \prod_{(i, j) \in E} U_{i j}^{Givens} (g),

and are implemented using the fermionic beam splitter (FBS) gate defined in ref. ³⁰.

The graph E can have various topologies, for example a pyramid or a staircase. The circuit preserves Hamming weight, and the representation splits into subspaces corresponding to the different Hamming weights. The analysis of the gradient variance for a more general class of Hamming weight-preserving unitaries appears in ref. ⁴⁵.

One can check that the appropriate representation for the generators of a SU(2) Givens rotation between qubit i and j is

h_{x}^{i j} = - \frac{i}{4} (σ_{x}^{i} \otimes σ_{x}^{j} + σ_{y}^{i} \otimes σ_{y}^{j}) \otimes σ_{z}^{\otimes ∣ i - j - 1 ∣} = d ϕ (- \frac{i}{2} X^{(i j)}),

h_{y}^{i j} = - \frac{i}{4} (σ_{y}^{i} \otimes σ_{x}^{j} - σ_{x}^{i} \otimes σ_{y}^{j}) \otimes σ_{z}^{\otimes ∣ i - j - 1 ∣} = d ϕ (- \frac{i}{2} Y^{(i j)}),

h_{z}^{i j} = - \frac{i}{4} (σ_{z}^{i} - σ_{z}^{j}) = d ϕ (- \frac{i}{2} Z^{(i j)}),

where X ^(ij), Y ^(ij), Z ^(ij) act as the Pauli operators σ_x, σ_y, σ_z on the 2 × 2 block formed by i and j, respectively, and are zero otherwise. They are elements of $su (n)$ , and ϕ is the direct sum of the alternating representations for k = 1, …, n, i.e.:

V = ⨁_{k = 1}^{n} \land^{k} C^{n} .

Note that the norm of each of these generators in $g$ is 1/2. Importantly, while the set of generators spans the representation of $g$ , it is not a linearly independent set, since it has more elements than the dimension of $g$ . Note that the extra σ_z’s in the definition of h_x and h_y are reminiscent of the string of σ_z in the Jordan–Wigner encoding, only that here they are needed for the algebra to close. The SO case is generated by the $h_{y}^{i j}$ elements only.

To clarify why the ansatz is subspace uncontrollable, we can consider the Hamming weight n/2 subspace. On this subspace, the DLA is isomorphic to $su (n)$ , while the Lie algebra of the full space of unitary operators on this subspace is isomorphic to $su (\binom{n}{n / 2})$ , hence the compound ansatz cannot enact all unitary transformations.
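To make the gap between the two algebras concrete, the short sketch below compares the dimension n² − 1 of su(n) with the dimension of the special unitary algebra on the Hamming weight n/2 subspace, for even n. The function name is illustrative.

```python
from math import comb

def dla_vs_subspace_dims(n):
    """Return (dim su(n), dim su(C(n, n/2))) for even n."""
    d_sub = comb(n, n // 2)              # dimension of the weight-n/2 subspace
    return n * n - 1, d_sub * d_sub - 1

for n in (4, 10, 20):
    print(n, dla_vs_subspace_dims(n))
```

The first entry grows quadratically in n while the second grows exponentially, which is the sense in which the ansatz is far from controllable on this subspace.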

Before proceeding we present a mixing time result to t-design for the quantum compound ansatz that is tighter than Theorem 2.6.

Theorem 2.10

(Rapid mixing for Compound Ansatz) Consider an n-qubit quantum compound ansatz that is a LASA constructed using the set of generators $\{X^{(i j)}, Y^{(i j)}, \sum_{i = 1}^{j} Z^{(i j)}\}$ with rotation angles chosen uniformly at random. Then, for t ≤ n/2, the ansatz is an ϵ-approximate t-design for the dynamical group SU(n) after $O (t n \log (1 / ϵ))$ layers.

Of course, for BPs t = 2 is the main interest. The proof follows simply from a generalization of Theorem 2.6 and is left to the Supplementary Information. Note that for the chosen set of generators some of the randomly chosen angles are not independent (i.e., the $\sum_{i = 1}^{j} Z^{(i j)}$ type generators).

The following three results utilize our theory of BPs for LASA to show that the quantum compound ansatz can be BP-free under uniform initialization.

Theorem 2.11

For a quantum compound ansatz that is also LASA, if the initial state is a computational basis state, then the following holds:

GradVar \in Ω (\frac{1}{n^{3}}) .

The conclusion is that SU compound layers with Lie algebra-supported measurements do not have BPs for any fixed Hamming weight computational basis state. Note that computational basis states of the same Hamming weight are in an irreducible subspace of the tensor product representation (see the Supplementary Information).

Next, we consider the uniform superposition state $∣ψ⟩ = {∣+⟩}^{\otimes n}$ and show that the quantum compound ansatz is still BP-free. In addition, in this case, the variance decays exactly with the DLA dimension n² − 1.

Theorem 2.12

For a quantum compound ansatz that is also LASA, if the initial state is a uniform superposition of all computational basis states, then the following holds:

GradVar \in Θ (\frac{1}{n^{2}}) .

Thus, we also have no BP with the initial state being the uniform superposition. We numerically verified the predictions for the various initial states as shown in Fig. 2.

Fig. 2 — Dots are numerical results while dotted lines are analytical predictions using the equations in the text. Showing results for computational basis input states of Hamming weight 1 and n/2 and the uniform superposition state ${∣+⟩}^{\otimes n}$ , for n number of qubits ranging from 2 to 18 in steps of 2. The measurement operator is $- i h_{12}^{z} = (σ_{1}^{z} - σ_{2}^{z}) / 4$ . Accounting for the randomness of initialization, there is good agreement of numerical results with the predictions. The error bars are too small to plot. Additional information on the numerics is in the Supplementary Information.
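For small n, the right-hand side of Theorem 2.8 can be evaluated directly from the generators given above. The following sketch builds the n-qubit generators h_x, h_y, h_z, orthonormalizes them under the Frobenius inner product, projects the initial state, and assembles the variance. Several ingredients are assumptions of this sketch and not statements from the paper: the value I_Ad = 2n for su(n) (a standard fact), the reading of "norm 1/2" as the squared standard norm of each generator, the choice H = h_x on the first two qubits, and all function names.

```python
import numpy as np
from itertools import combinations

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def op_on(n, factors):
    """Tensor product over n qubits: factors[q] on qubit q, identity elsewhere."""
    out = np.array([[1.0 + 0j]])
    for q in range(n):
        out = np.kron(out, factors.get(q, I2))
    return out

def compound_generators(n):
    """h_x, h_y, h_z for every pair i < j, with sigma_z on the qubits in between."""
    gens = []
    for i, j in combinations(range(n), 2):
        zs = {q: Z for q in range(i + 1, j)}
        pair = lambda a, b: op_on(n, {**zs, i: a, j: b})
        gens.append(-0.25j * (pair(X, X) + pair(Y, Y)))
        gens.append(-0.25j * (pair(Y, X) - pair(X, Y)))
        gens.append(-0.25j * (op_on(n, {i: Z}) - op_on(n, {j: Z})))
    return gens

def frob_sq(A):
    """Squared Frobenius norm Tr(A^dagger A)."""
    return np.vdot(A, A).real

def grad_var(n, rho, H, O):
    """Theorem 2.8 with g = su(n); assumes H and iO lie in d phi(g)."""
    gens = compound_generators(n)
    basis = []                                    # orthonormal basis of d phi(g)
    for g in gens:
        v = g.reshape(-1)
        for b in basis:
            v = v - np.vdot(b, v) * b
        if np.linalg.norm(v) > 1e-9:
            basis.append(v / np.linalg.norm(v))
    d_g = len(basis)
    rho_g_sq = sum(abs(np.vdot(b, rho.reshape(-1))) ** 2 for b in basis)
    index_phi = frob_sq(gens[0]) / 0.5            # assumes squared standard norm 1/2
    killing_H = (2 * n / index_phi) * frob_sq(H)  # assumes I_Ad = 2n for su(n)
    return killing_H * frob_sq(O) * rho_g_sq / d_g**2

def basis_state(n, weight):
    """Projector onto the bit string with `weight` leading ones."""
    vec = np.zeros(2**n, dtype=complex)
    vec[int("1" * weight + "0" * (n - weight), 2)] = 1
    return np.outer(vec, vec.conj())

def plus_state(n):
    """Projector onto the uniform superposition of all basis states."""
    vec = np.ones(2**n, dtype=complex) / np.sqrt(2**n)
    return np.outer(vec, vec.conj())

n = 4
H = compound_generators(n)[0]
O = (op_on(n, {0: Z}) - op_on(n, {1: Z})) / 4
for label, rho in [("weight 1", basis_state(n, 1)),
                   ("weight n/2", basis_state(n, n // 2)),
                   ("plus", plus_state(n))]:
    print(label, grad_var(n, rho, H, O))
```

Because the matrices are dense, this is only practical for a handful of qubits. It is meant as a cross-check of the formula and not as a replacement for the numerics reported in the figure.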

Finally, we see how the result can be extended to cover single-qubit measurements.

Corollary 2.12.1

For a quantum compound ansatz with an observable that is composed of single-qubit measurements, and if the initial state is a computational basis state or the uniform superposition of all computational basis states, then the following holds:

GradVar \in Ω (\frac{1}{poly (n)}) .

We verify these predictions in Fig. 3.

This answers an open question proposed in ref. ³¹. As a final note, even though $σ_{i}^{z}$ does not lie in the DLA, single-qubit expectations of observables with respect to the compound ansatz starting from a product state are still known to be classically simulatable⁴⁶.

Lastly, we present another setting in which the observable does not lie in the DLA, but this time, the quantum compound ansatz has a BP.

Theorem 2.13

For the quantum compound ansatz if the initial state is a computational basis state with Hamming weight $\frac{n}{2}$ and the observable is a rank-one projector onto another computational basis state in this space, then

GradVar \in O ({(\begin{matrix} n \\ n / 2 \end{matrix})}^{- 1}) .

We verify the scaling in Fig. 4.

Intuitively, the above decay comes from the fact that our choice of observable and initial state are both rank-one projectors, and thus the overlap of the traceless parts of both operators will spread across an exponentially large subset of $su$ . Theorem 2.13 is interesting because the compound ansatz is not very expressive and the depth of the circuit exceeds the shallow regime of $O (\log (n))$ ⁴⁷. We note that the cost function we choose is still global.
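To get a feel for the gap between the upper bound of Theorem 2.13 and the polynomial lower bound of Theorem 2.11, the following minimal sketch tabulates both scalings for a few even n. It only evaluates the stated asymptotic forms with all constants set to one; the function names are ours and purely illustrative.

```python
from fractions import Fraction
from math import comb


def bp_upper_bound(n: int) -> Fraction:
    """Scaling in Theorem 2.13: inverse central binomial coefficient (n even)."""
    return Fraction(1, comb(n, n // 2))


def bp_free_lower_bound(n: int) -> Fraction:
    """Scaling in Theorem 2.11: 1 / n^3."""
    return Fraction(1, n**3)


for n in range(4, 21, 4):
    ratio = bp_upper_bound(n) / bp_free_lower_bound(n)
    # The ratio shrinks quickly: the projector observable decays
    # exponentially while the Lie-algebra-supported case decays polynomially.
    print(n, float(bp_upper_bound(n)), float(bp_free_lower_bound(n)), float(ratio))
```

The ratio of the two columns drops below one between n = 12 and n = 16 and keeps shrinking, which is the qualitative difference between the two settings.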

The details of how the numerical results were obtained are described in the Supplementary Information.

Comparison with previous approaches

As mentioned in the Introduction, previous approaches have taken a state-first or Schrödinger picture viewpoint. Specifically, under the action of G, the quantum state space V will decompose into invariant subspaces:

V = ⨁_{κ} V_{κ},

each of which is acted upon by the subrepresentation ϕ_κ(G). This decomposition is in line with the symmetries that the ansatz obeys, i.e., its commutant³⁵. If the initial state ρ ∈ V_κ, then since G preserves this space, the variance calculation is restricted to integrating over ϕ_κ(G). If the restriction of the DLA $d ϕ_{κ} (g)$ to the invariant subspace is isomorphic to $su (\dim V_{κ})$ , then one says that the PQC is subspace controllable on V_κ; otherwise, it is subspace uncontrollable. The calculation is possible in the subspace-controllable setting via the Schur-Weyl duality²², but the subspace uncontrollable setting poses significant obstacles to the calculation of the second moment (Equation (14)) using this approach.

In our setting we are instead using the Heisenberg picture and, assuming LASA, considering the action of $d ϕ (g)$ on itself via conjugation, so $V = d ϕ (g)$ in this case and dϕ is the adjoint representation. Notice that if the DLA is reductive (Equation (16)) and ϕ is faithful (injective), the decomposition of V respects the decomposition into simple ideals:

d ϕ (g) = d ϕ (⨁_{κ} g_{κ}) = ⨁_{κ} d ϕ (g_{κ}) = ⨁_{κ} d ϕ {(g)}_{κ} .

Thus, the Lie algebra being reductive implies that the adjoint representation splits into irreducible invariant subspaces, which are precisely the simple ideals $d ϕ (g_{κ})$ . As detailed in Methods, this is sufficient to calculate the second moment for any compact Lie group.

So in this setting, we always know the invariant subspaces and the representation acting on them, namely the $d ϕ (g_{κ})$ and the corresponding adjoint representation. This is a significant simplification from the Schrödinger picture approach and enables us to completely circumvent the obstacles posed by the subspace uncontrollable setting. Notice finally that now the invariant subspaces $d ϕ (g_{κ})$ reflect the symmetries that are preserved by the evolution of the observable instead of the state, so, while related, this is a different concept of PQC symmetry than the one prior work had explored.

Lastly, we would like to emphasize that the DLA does not always split into a direct sum over the decomposition of V into V_κ for an arbitrary unitary representation. However, this does hold if $d ϕ {(g)}_{κ}$ for subspace V_κ is simple, like it is for the adjoint and in ref. ²³. More specifically, the condition implies that $d ϕ {(g)}_{κ}$ must then be $d ϕ (g_{α})$ for some simple ideal $g_{α}$ .

Discussion

In this work, we present a general framework for diagnosing the BP phenomenon in Lie algebra supported ansätze, which includes popular PQCs, such as HVA, QAOA, and various equivariant QNNs. Our main contribution is a method that explains the previously mysterious connection between the dimension of the DLA and the rate at which gradients decay. This method has enabled us to analyze the gradient variance for subspace uncontrollable circuits, such as the quantum compound ansätze, which was not previously possible with existing techniques from the literature.

We note that the kinds of circuits where the simulatability results of ref. ²⁷ apply are exactly LASAs. In fact, many of the techniques employed here are similar. As the aforementioned paper links the dimension of the DLA to the performance of the classical simulation of expectations via their algorithm $g$ -sim, we see that at least for LASAs there is a connection between the absence of vanishing gradients and simulatability, in the sense that a LASA with polynomial DLA can avoid BPs but is classically simulatable. Future work may look at the vanishing gradients in other symmetric settings like those of refs. ^24–26, and at elucidating this connection more generally. We also note that our results could be applied to the DLAs that have been classified by ref. ⁴².

Regarding general VQAs, when the observable has support outside of the DLA, we show in the Supplementary Information that the same techniques used in the LASA setting can be used to obtain the gradient variance expression for a general ansatz. Unfortunately, it can be challenging to determine gradient variance scaling from these expressions in general. Characterizing the gradient variance in this setting would potentially allow for constructing ansätze that both do not have BPs and do not have classically simulable expectations. Existing literature has already shown that when the observable lies in the DLA and the DLA has polynomially growing dimension, then the computation of expectation values can be classically simulated. Potentially, the gradient variance can be shown to still scale inversely with the DLA dimension when the observable has only some small support outside of the DLA, as we have shown for the quantum compound ansatz (Corollary 2.12.1).

Lastly, BPs only correspond to one of two issues that plague VQAs. As mentioned earlier, like BPs, the convergence of VQAs has also only been theoretically characterized in the subspace-controllable setting¹⁴. Potentially, the framework we have developed can be applied to understanding the projected gradient dynamics that occur in the uncontrollable setting.

Note on ref. ⁴⁸: During the writing of the manuscript, we became aware through a comment in ref. ⁴² that Michael Ragone et al. have independently obtained a proof of an extension of the conjecture in ref. ²². This was later released in ref. ⁴⁸. We encourage the reader to review both papers for a richer picture of the solution; here we summarize the most important differences between our works. The main one is that the work of Ragone et al. focuses on cost function concentration as opposed to concentration of the partial derivatives. The authors mention, by citing ref. ⁴⁹, that loss function concentration implies concentration of the partial derivatives, and thus provide bounds. However, in our case, we obtain exact expressions for the variance of the partial derivatives, thus revealing the connection between the gradient variance scaling and the Killing norm of the generators. In addition, we include explicit formulae for the gradient variance for the quantum compound ansatz in commonly used settings, which leads to the novel prediction that it can avoid BPs under Haar initialization. Lastly, we include a discussion on the application of our techniques to observables that lie outside of the DLA. The work by Ragone et al., however, does include a broader discussion that links BPs in symmetric ansätze to other known causes of BPs, including cost function-induced¹⁹ and noise-induced²⁰, and thus places the result into a wider context.

Methods

In this section, we formally derive the connection between the DLA dimension and the gradient variance, leading to our theory of BPs. Specifically, we present the proofs of the majority of the theorems shown in the Results section; the rest are left to the Supplementary Information. The main tools that we utilize are the concepts of the adjoint representation and Schur orthogonality.

The adjoint representation connection

We start by providing some explanation as to why the connection between the DLA dimension and BPs suggested by existing numerical evidence is not obvious. It is the adjoint representation that makes the relationship clear and allows for exact computation of the gradient variance that agrees with existing numerics.

As in earlier parts of the text, the dynamical group $G$ associated to a periodic ansatz is a unitary representation of some other Lie group G. Thus, the representation ϕ: G → SU(2ⁿ) corresponds to G acting on the n-qubit Hilbert space V and $ϕ (G) = G$ . Let $M (C, 2^{2 n})$ denote the set of 2²ⁿ × 2²ⁿ complex matrices.

Before proceeding we make a small note on the compactness of the dynamical group. While the dynamical group $G$ is obviously connected, it may not be compact (due to lack of closure). An example is the irrational flow on a torus that occurs when the generators ${\tilde{H}}_{i}$ have at least two eigenvalues whose ratio is irrational. The action of these generators will lead to non-periodic orbits. Notice that such non-periodic ansätze can occur in principle, for example in QAOA on graphs with random weights. However, since any Lie subalgebra of $su (2^{n})$ must be the direct sum of compact simple Lie subalgebras and its center^42,50, ignoring the center also leads to a compact, connected dynamical subgroup. Thus if $G$ is not closed, this will be the compact dynamical subgroup we consider. Note that it is harmless to ignore the center, since the component of the observable in the center of $g$ does not evolve (in a Heisenberg sense) anyways.

The variance of the gradient, under Haar initialization, relies on the second-moment operator:

T : A \mapsto \int_{G} (U_{g} \otimes U_{g}) A (U_{g}^{†} \otimes U_{g}^{†}) d g,

which orthogonally projects onto the set of commuting operators (i.e., commutant) of {U_g ⊗ U_g: g ∈ G}. Commutation implies that $\forall A \in M (C, 2^{2 n}), T (A)$ must respect the decomposition of V^⊗2 into irreducible components (invariant subspaces). If V^⊗2 has the following decomposition into irreducible components (not grouping by multiplicity)

V^{\otimes 2} = ⨁_{λ} V_{λ},

then

\int_{G} (U_{g} \otimes U_{g}) A (U_{g}^{†} \otimes U_{g}^{†}) d g = \sum_{λ} \frac{Tr [A P_{λ}]}{\dim V_{λ}} P_{λ},

for orthogonal projectors P_λ onto V_λ. This projection can also be expressed in terms of the well-known Weingarten function^51,52. Notice that the Lie algebra appears to play no role in this discussion. In addition, the inverse scaling with the dimension of each V_λ is apparent. Furthermore, while a general theory of such integrals exists⁵³, they are quite challenging to tackle in practice. Most results in quantum information restrict to the case where G = SU(2ⁿ), where the commutant is easy to characterize. Specifically, this leads to the well-known result that approximate 2-designs for SU(2ⁿ) have BPs^18,22.
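As a concrete instance of the projection formula, consider the familiar special case where the group is the full unitary group acting on a d-dimensional space, so that V^⊗2 splits into the symmetric and antisymmetric subspaces with projectors (1 ± SWAP)/2. The sketch below is our own illustration of that special case (it is not the setting of the paper's theorems): it builds the right-hand side of the formula and checks that the result commutes with U ⊗ U for a randomly drawn unitary and that applying the map twice changes nothing. All function names are hypothetical.

```python
import numpy as np


def swap_operator(d: int) -> np.ndarray:
    """SWAP on C^d (x) C^d in the basis |i>|j> -> index i*d + j."""
    S = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            S[i * d + j, j * d + i] = 1.0
    return S


def twirl_projection(A: np.ndarray, d: int) -> np.ndarray:
    """sum_lambda Tr[A P_lambda] / dim(V_lambda) * P_lambda for Sym/Antisym."""
    S = swap_operator(d)
    I = np.eye(d * d)
    out = np.zeros((d * d, d * d), dtype=complex)
    for P in ((I + S) / 2, (I - S) / 2):
        out += np.trace(A @ P) / np.trace(P) * P  # Tr(P) = dim V_lambda
    return out


def random_unitary(d: int, rng: np.random.Generator) -> np.ndarray:
    """Some unitary matrix from a QR decomposition (only unitarity is used)."""
    Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Q, _ = np.linalg.qr(Z)
    return Q


rng = np.random.default_rng(0)
d = 3
A = rng.normal(size=(d * d, d * d)) + 1j * rng.normal(size=(d * d, d * d))
T_A = twirl_projection(A, d)
UU = np.kron(*(2 * [random_unitary(d, rng)]))

commutes = np.allclose(UU @ T_A, T_A @ UU)
idempotent = np.allclose(twirl_projection(T_A, d), T_A)
print(commutes, idempotent)
```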

Fortunately, the integrals appearing in the theory of VQAs turn out to have substantial simplifications, which furnishes the connection to the dimension of the DLA in certain settings. Our results shed much-needed light on this apparently unintuitive phenomenon observed in practice. The first key insight is that A is always a tensor product of two operators, i.e., if O is the observable in the quantum circuit, then we get second-moment integrals with A = iO ⊗ iO, that is,

\int_{G} (U_{g} i O U_{g}^{†}) \otimes (U_{g} i O U_{g}^{†}) d g

= \int_{G} {Ad}_{g} (i O) \otimes {Ad}_{g} (i O) d g,

where the relation to the well-known adjoint representation of G, i.e., ${Ad}_{g} (i O) = U_{g} i O U_{g}^{†}$ , is apparent when iO lies in $g$ . This simple observation is critical in enabling concise expressions for the variance of the gradient, revealing the inverse dependence on the dimension of the DLA. Specifically, given that the dimension of the adjoint representation is $d_{g}$ , the reason for the scaling becomes more plausible.

Note that to connect back to (35), this can also be viewed as a projection of the subspace

S : = {span}_{C} {i O \otimes i O : i O \in d ϕ (g)} \subset M (C, 2^{2 n})

onto the commutant via an operator called the Casimir.

The (split quadratic) Casimir operator, K, for representation ϕ is defined as:

K = I_{ϕ}^{- 1} \sum_{i} E_{i} \otimes E_{i},

where {e_i} is an orthonormal basis under the standard norm for $g$ and E_i = dϕ(e_i). We can also use the Casimir to define an orthogonal projector, $P_{g}$ , from the space of skew-Hermitian operators on V, i.e., $u (V)$ , onto the subspace $d ϕ (g)$ , which is useful when we are dealing with objects not completely supported on the Lie algebra:

X_{g} : = P_{g} X = - {Tr}_{1} ((X \otimes 1) K)

= - I_{ϕ}^{- 1} \sum_{i} Tr (X E_{i}) E_{i},

∥ X_{g} ∥_{F}^{2} = - Tr ((X \otimes X) K) = - I_{ϕ}^{- 1} \sum_{i} {Tr}^{2} (X E_{i})

where $X \in u (V)$ and ${Tr}_{1}$ is the partial trace over the first subspace. One can check that as expected $P_{g} d ϕ (a) = d ϕ (a)$ .
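The projector onto the represented Lie algebra can be made concrete on a small example. The following sketch assumes, as our own convention, a skew-Hermitian basis normalized so that Tr(E_i† E_j) = I δ_ij for a constant I playing the role of the index, and uses the collective su(2) acting on two qubits as a hypothetical stand-in for $d ϕ (g)$ . It projects the single-qubit operator iσ^z ⊗ 1, which is not in the algebra, and checks that elements already in the algebra are left unchanged.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

# Skew-Hermitian basis of a collective su(2) on two qubits (illustrative choice).
BASIS = [0.5j * (np.kron(s, I2) + np.kron(I2, s)) for s in PAULIS]
# Normalization constant: Tr(E_i^dagger E_j) = INDEX * delta_ij (here 2).
INDEX = np.trace(BASIS[0].conj().T @ BASIS[0]).real


def project(X: np.ndarray) -> np.ndarray:
    """P_g X = -(1/INDEX) * sum_i Tr(X E_i) E_i."""
    return sum(-np.trace(X @ E) / INDEX * E for E in BASIS)


X = 1j * np.kron(PAULIS[2], I2)   # i * sigma^z on the first qubit only
X_g = project(X)

print(np.allclose(X_g, BASIS[2]))             # lands on the collective z generator
print(np.linalg.norm(X) ** 2, np.linalg.norm(X_g) ** 2)
```

Only the part of X that is symmetric under exchanging the two qubits survives, so the squared Frobenius norm drops from 4 to 2 in this example.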

Proof of theorem 2.8

The following Lemma is fundamental to our main theorem and may also be of independent interest. The proof can be found in the Supplementary Information.

Lemma 4.1

Let G be a compact simple Lie group with Lie algebra $g$ . Suppose V is a finite-dimensional inner product space, $ϕ : G \to U (V)$ is a unitary representation of G, and U_g = ϕ(g). In addition, $a \in g$ , A = dϕ(a). Then the following holds:

\int_{G} {(U_{g} A U_{g}^{†})}^{\otimes 2} d g = \frac{∥ A ∥_{F}^{2}}{d_{g}} K .
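Lemma 4.1 can be sanity-checked in the smallest case, su(2), where the adjoint representation acts on coordinate vectors a ∈ R³ by rotations and the algebra has dimension 3. The sketch below is an illustration under assumptions of ours: it replaces the Haar integral by an average over the 24 rotations of the octahedral group, which yields the same second moment because that finite group also acts irreducibly on R³, and it works in coordinates with respect to an orthonormal basis, in which the tensor Σ_i e_i ⊗ e_i is the identity matrix. The normalization by the index and sign conventions are not tracked.

```python
import itertools
import numpy as np


def octahedral_rotations() -> list:
    """All 3x3 signed permutation matrices with determinant +1 (24 rotations)."""
    rots = []
    for perm in itertools.permutations(range(3)):
        for signs in itertools.product([1.0, -1.0], repeat=3):
            R = np.zeros((3, 3))
            for row, (col, s) in enumerate(zip(perm, signs)):
                R[row, col] = s
            if np.isclose(np.linalg.det(R), 1.0):
                rots.append(R)
    return rots


def second_moment(a: np.ndarray, rots: list) -> np.ndarray:
    """Group average of (R a) (R a)^T, the coordinate form of Ad(a) (x) Ad(a)."""
    M = np.outer(a, a)
    return sum(R @ M @ R.T for R in rots) / len(rots)


a = np.array([0.3, -1.2, 0.5])        # arbitrary algebra element in coordinates
rots = octahedral_rotations()
avg = second_moment(a, rots)
expected = (a @ a) / 3 * np.eye(3)    # |a|^2 / dim * (Casimir tensor in coordinates)

print(len(rots), np.allclose(avg, expected))
```

The factor 1/3 is the inverse of the algebra dimension, mirroring the 1/d_g prefactor in the lemma.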

From Lemma 4.1, it can be seen that the commutant is the one-dimensional subspace spanned by the Casimir operator, i.e.

T (R) = \frac{Tr (R^{†} K)}{d_{g}} K \forall R \in S .

We are also going to frequently use the following identity. Let $a \in g$ and A: = dϕ(a). Also let E_i: = dϕ(e_i) be a basis for the Lie algebra orthonormal under the standard norm. Then

∥ A ∥_{F}^{2} = - I_{ϕ}^{- 1} \sum_{i} {Tr}^{2} (A E_{i}),

which is important as when working with a quantum circuit one often has access to the representation basis {E_i} but not directly to {e_i}, so it is a convenient shortcut to calculate $∥ a ∥_{g}^{2}$ .

Proof of theorem 2.8

As was shown in the Results section, we can assume $GradVar = E_{g^{+}, g^{-} ~ μ^{\otimes 2}} [{(\partial {⟨ O ⟩}_{ρ})}^{2}]$ . Let us write the integral for the second moment in full, and rearrange terms appropriately:

E_{g^{+}, g^{-} ~ μ^{\otimes 2}} [{(\partial {⟨ O ⟩}_{ρ})}^{2}]

= \iint_{G} {(Tr (U_{g^{-}} i ρ U_{g^{-}}^{†} [H, U_{g^{+}} i O U_{g^{+}}^{†}]))}^{2} d g^{+} d g^{-}

= \iint_{G} Tr \{{(i ρ)}^{\otimes 2} U_{g^{-}}^{\otimes 2} ([H, U_{g^{+}} i O U_{g^{+}}^{†}] \otimes [H, U_{g^{+}} i O U_{g^{+}}^{†}]) U_{g^{-}}^{† \otimes 2}\} d g^{+} d g^{-} .

Suppose $X_{+} : = \int_{G} {(U_{g^{+}} i O U_{g^{+}}^{†})}^{\otimes 2} d g^{+}$ . Let us ignore the trace and ρ, and expand out the commutators:

\begin{matrix} \iint_{G} U_{g^{-}}^{\otimes 2} (H U_{g^{+}} i O U_{g^{+}}^{†} - U_{g^{+}} i O U_{g^{+}}^{†} H) \\ \otimes (H U_{g^{+}} i O U_{g^{+}}^{†} - U_{g^{+}} i O U_{g^{+}}^{†} H) U_{g^{-}}^{† \otimes 2} d g^{+} d g^{-} \end{matrix}

= \int_{G} U_{g^{-}}^{\otimes 2} H^{\otimes 2} X_{+} U_{g^{-}}^{† \otimes 2} d g^{-}

+ \int_{G} U_{g^{-}}^{\otimes 2} X_{+} H^{\otimes 2} U_{g^{-}}^{† \otimes 2} d g^{-}

- \int_{G} U_{g^{-}}^{\otimes 2} (H \otimes 1) X_{+} (1 \otimes H) U_{g^{-}}^{† \otimes 2} d g^{-}

- \int_{G} U_{g^{-}}^{\otimes 2} (1 \otimes H) X_{+} (H \otimes 1) U_{g^{-}}^{† \otimes 2} d g^{-} .

We end up with four similar terms. Starting with the common inner integral, since G is compact, we can apply Lemma 4.1 and write

X_{+} = \int_{G} {(U_{g^{+}} i O U_{g^{+}}^{†})}^{\otimes 2} d g^{+} = \frac{∥ O ∥_{F}^{2}}{d_{g}} K .

We can plug this expression back into the earlier expression without the trace and ρ, and rearranging terms and using $K : = I_{ϕ}^{- 1} \sum_{i} E_{i} \otimes E_{i}$ gives:

\frac{∥ O ∥_{F}^{2}}{I_{ϕ} d_{g}} \sum_{k = 1}^{d_{g}} \int_{G} U_{g} [H, E_{k}] U_{g}^{†} \otimes U_{g} [H, E_{k}] U_{g}^{†} d g .

Now applying the Lemma again, noting that H = ∑_qh_qE_q, we have:

\frac{∥ O ∥_{F}^{2}}{I_{ϕ} d_{g}^{2}} \sum_{k = 1}^{d_{g}} ∥ [H, E_{k}] ∥_{F}^{2} K

= \frac{∥ O ∥_{F}^{2}}{I_{ϕ} d_{g}^{2}} \sum_{j, k = 1}^{d_{g}} \frac{Tr ([H, E_{k}] E_{j}) Tr ([H, E_{k}] E_{j})}{I_{ϕ}} K

= \frac{∥ O ∥_{F}^{2}}{d_{g}^{2}} \sum_{q, r, j, k = 1}^{d_{g}} h_{q} h_{r} \frac{Tr ([E_{q}, E_{k}] E_{j}) Tr ([E_{r}, E_{k}] E_{j})}{I_{ϕ}^{2}} K

= \frac{∥ O ∥_{F}^{2}}{d_{g}^{2}} \sum_{q, r, j, k = 1}^{d_{g}} h_{q} h_{r} f_{q k}^{j} f_{r k}^{j} K

= \frac{∥ O ∥_{F}^{2}}{d_{g}^{2}} \sum_{q, r = 1}^{d_{g}} h_{q} h_{r} (- \sum_{j, k = 1}^{d_{g}} f_{q k}^{j} f_{r j}^{k}) K

= \frac{∥ O ∥_{F}^{2}}{d_{g}^{2}} \sum_{q, r = 1}^{d_{g}} h_{q} h_{r} (- g_{q r}) K

= \frac{∥ O ∥_{F}^{2} ∥ H ∥_{K}^{2}}{d_{g}^{2}} K,

where we have used anti-symmetry of the commutator bracket to reveal that the inner sum is the Killing form (since $g$ is a compact simple Lie algebra, the negative of the Killing form is a valid inner product). Note that $f_{q k}^{j} = Tr ([E_{q}, E_{k}] E_{j})$ are the structure constants.
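The step from the double sum over structure constants to the Killing form can be checked numerically for su(2). The sketch below assumes the basis e_a = −iσ_a/2 (our choice, for which [e_a, e_b] = ε_abc e_c) and extracts the structure constants as plain expansion coefficients, without the index normalization used in the text. It then compares the sum with matching upper and lower indices against the Killing form.

```python
import numpy as np

PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
# Assumed basis of su(2): e_a = -i sigma_a / 2, so that [e_a, e_b] = eps_abc e_c.
BASIS = [-0.5j * s for s in PAULIS]


def structure_constants(basis: list) -> np.ndarray:
    """f[q, k, j] = coefficient of E_j in the commutator [E_q, E_k]."""
    d = len(basis)
    f = np.zeros((d, d, d))
    for q, Eq in enumerate(basis):
        for k, Ek in enumerate(basis):
            comm = Eq @ Ek - Ek @ Eq
            for j, Ej in enumerate(basis):
                overlap = np.trace(Ej.conj().T @ comm)
                f[q, k, j] = (overlap / np.trace(Ej.conj().T @ Ej)).real
    return f


f = structure_constants(BASIS)
same_index_sum = np.einsum("qkj,rkj->qr", f, f)   # sum_{j,k} f_qk^j f_rk^j
killing = np.einsum("qkj,rjk->qr", f, f)          # sum_{j,k} f_qk^j f_rj^k

print(same_index_sum)
print(killing)
```

For this basis the first matrix is twice the identity and the second is its negative, which is the sign flip produced by the antisymmetry argument.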

Now, we can reintroduce the trace and ρ to get:

E_{g^{+}, g^{-} ~ μ^{\otimes 2}} [{(\partial {⟨ O ⟩}_{ρ})}^{2}]

= \frac{∥ H ∥_{K}^{2} ∥ O ∥_{F}^{2}}{d_{g}^{2}} Tr ({(i ρ)}^{\otimes 2} K)

= \frac{∥ H ∥_{K}^{2} ∥ O ∥_{F}^{2} ∥ ρ_{g} ∥_{F}^{2}}{d_{g}^{2}} .

Proof of theorem 2.9

The following is a generalization of Lemma 4.1 to outside the simple group setting. The proof can be found in the Supplementary Information.

Lemma 4.2

Let G be a compact and connected Lie group with Lie algebra $g$ . Suppose V is a finite-dimensional inner product space, $ϕ : G \to U (V)$ is a unitary representation of G, and U_g = ϕ(g). In addition, $a \in g$ , A = dϕ(a). Then the following holds:

\int_{G} {(U_{g} A U_{g}^{†})}^{\otimes 2} d g = \sum_{α} \frac{∥ A_{g_{α}} ∥_{F}^{2}}{d_{g_{α}}} K_{g_{α}} + A_{c}^{\otimes 2},

where $A_{g_{α}}$ is the image of the component of a in $g_{α}$ under dϕ. Likewise, $K_{g_{α}}$ is the Casimir in the subalgebra $g_{α}$ .

The above result implies that we expect contributions to the variance from the various subalgebras. Indeed, the final expression for the variance is remarkably simple, since all the cross terms between different subalgebras vanish, and the abelian subalgebras do not contribute.

Proof of theorem 2.9

The proof largely follows the strategy of that for simple groups. Define the shorthand $i O_{g_{α}} : = P_{g_{α}} i O$ and $i O_{c} : = P_{c} i O$ . Like before, we expand the commutator but this time use Lemma 4.2:

\int_{G} {(U_{g^{+}} i O U_{g^{+}}^{†})}^{\otimes 2} d g^{+} = \sum_{α} \frac{∥ O_{g_{α}} ∥_{F}^{2}}{d_{g_{α}}} K_{g_{α}} + O_{c}^{\otimes 2} .

Now, after applying the commutator and taking the integral over $U_{g^{-}}$ , we find the result is still a summation over α only. This is because, since the subalgebras are ideals, if $E_{k} \in d ϕ (g_{α})$ then $[H, E_{k}] \in d ϕ (g_{α})$ , and therefore $∥ P_{g_{β}} [H, E_{k}] ∥_{F} = 0$ if β ≠ α. Thus the cross terms vanish. The contribution from the center also vanishes upon taking the commutator. Thus the result follows.
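The vanishing of the cross terms can be seen on a toy reductive algebra. The sketch below uses su(2) ⊕ su(2) acting on two qubits, with one ideal per qubit; this example and the helper name are ours, chosen only to illustrate the ideal property. For a generator H with components in both ideals, each commutator [H, E_k] with E_k in the first ideal stays in the first ideal, so its projection onto the second ideal has zero norm.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
IDEAL_A = [1j * np.kron(s, I2) for s in PAULIS]   # su(2) acting on qubit 1
IDEAL_B = [1j * np.kron(I2, s) for s in PAULIS]   # su(2) acting on qubit 2


def projected_norm(X: np.ndarray, basis: list) -> float:
    """Frobenius norm of the projection of X onto span(basis) (orthogonal basis)."""
    coeffs = [np.trace(E.conj().T @ X) / np.trace(E.conj().T @ E) for E in basis]
    return float(np.linalg.norm(sum(c * E for c, E in zip(coeffs, basis))))


# Generator with support on both ideals.
H = 0.7 * IDEAL_A[0] + 0.4 * IDEAL_B[2]

for E_k in IDEAL_A:
    comm = H @ E_k - E_k @ H
    print(projected_norm(comm, IDEAL_A), projected_norm(comm, IDEAL_B))
```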

Proof of theorem 2.11

Using the identities from the Representation theoretic notation subsection of Results we can get forms of the theorems that are practically useful. For example, in the simple group case,

GradVar = \frac{I_{Ad} ∥ o ∥_{g}^{2} ∥ h ∥_{g}^{2}}{d_{g}^{2}} \sum_{i} {Tr}^{2} (i ρ E_{i}),

where E_i = dϕ(e_i) for orthonormal basis {e_i} for $g$ . This turns out to be the most useful form of the result for the examples below because we will have explicit knowledge of the representation ϕ. In addition, the representation index, I_ϕ, drops out.

Proof of theorem 2.11

For $su (n)$ , $d_{g} = n^{2} - 1$ and the Dynkin index of the adjoint representation is I_Ad = 2n. Now we work out the state’s projected norm. Choose ρ to be a computational basis state, where it can be shown that ρ ⊗ ρ lies in an irreducible subrepresentation of the tensor product representation ϕ ⊗ ϕ (see Supplementary Information). Then we only need to focus on the simultaneously diagonal elements of the Lie algebra, that is, the Cartan subalgebra $h$ . To calculate the Casimir eigenvalue we need to find an orthogonal basis $H$ for $h$ , which cannot be ${h_{z}^{i j}}_{i \neq j}$ since the elements are not linearly independent.

We can construct a suitable basis for $h$ using the formula

H = \frac{i}{2} ⋃_{m = 1}^{n - 1} \frac{1}{\sqrt{m (m + 1)}} \{m σ_{m + 1}^{z} - \sum_{j = 1}^{m} σ_{j}^{z}\}

= \frac{i}{4} \{\sqrt{2} (σ_{2}^{z} - σ_{1}^{z}), \frac{\sqrt{2}}{\sqrt{3}} (2 σ_{3}^{z} - σ_{2}^{z} - σ_{1}^{z}), \frac{1}{\sqrt{3}} (3 σ_{4}^{z} - σ_{3}^{z} - σ_{2}^{z} - σ_{1}^{z}), . . .\}

Even though this is expressed more cleanly with Pauli zs, each element can be obtained as a linear combination of the ${h_{z}^{i j}}$ generators. One can check that the elements are all orthogonal and the norm of their pullback on $g$ is 1, and the resulting subalgebra has the correct dimension: $\dim h = rank su (n) = n - 1$ . With this, one can explicitly calculate the diagonal part of I_ϕK for any n,

I_{ϕ} diag (K) = \sum_{H_{i} \in H} H_{i} \otimes H_{i}

however the calculation is unwieldy. Fortunately, we can directly infer the final form from symmetry arguments, since by inspection: diag(K) is composed of sums of tensor products of two Pauli zs, it is symmetric around the tensor product, and furthermore since SWAP_ij ⊗ SWAP_ij ∈ ϕ(G) ⊗ ϕ(G) it must be invariant upon any simultaneous permutation of the qubit indices on the subspaces. Thus,

I_{ϕ} diag (K) = A \sum_{i = 1}^{n} σ_{i}^{z} \otimes σ_{i}^{z} + B \sum_{i \neq j} σ_{i}^{z} \otimes σ_{j}^{z} .

To find the value of A, evaluate diag(K) on the state $∣Ψ⟩ = {∣+ . . . + 0⟩}^{\otimes 2}$ using Eqs. (71) and (72):

I_{ϕ} ⟨ Ψ ∣ diag (K) ∣ Ψ ⟩ = - \frac{1}{4 n (n - 1)} {(n - 1)}^{2} ⟨ Ψ ∣ σ_{n}^{z} \otimes σ_{n}^{z} ∣ Ψ ⟩

= A ⟨ Ψ ∣ σ_{n}^{z} \otimes σ_{n}^{z} ∣ Ψ ⟩ \Rightarrow A = - \frac{n - 1}{4 n},

and for B, on $∣Ψ^{'}⟩ = ∣+ . . . + 0⟩ \otimes ∣+ . . . + 0 +⟩$ :

I_{ϕ} ⟨ Ψ^{'} ∣ diag (K) ∣ Ψ^{'} ⟩

= \frac{1}{4 n (n - 1)} (n - 1) ⟨ Ψ^{'} ∣ (σ_{n}^{z} \otimes σ_{n - 1}^{z}) ∣ Ψ^{'} ⟩

= B ⟨ Ψ^{'} ∣ σ_{n}^{z} \otimes σ_{n - 1}^{z} ∣ Ψ^{'} ⟩ \Rightarrow B = \frac{1}{4 n} .

Now we use this to evaluate the expectation value of K on a computational basis state of Hamming weight k. The first summation in Eq. (72) will be constant and equal to n, while the second summation will be equal to the number of distinct bits of equal value minus those of different value, k(k − 1) + (n − k)(n − k − 1) − 2k(n − k) = (n−2k)² − n. So overall

I_{ϕ} ∥ ρ_{g} ∥_{F}^{2} = \sum_{H_{i} \in H} {Tr}^{2} (i ρ H_{i})

= \frac{n - 1}{4} - \frac{{(n - 2 k)}^{2} - n}{4 n} = \frac{k (n - k)}{n} .
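The closed form k(n − k)/n can also be checked directly from the Cartan basis, without the symmetry argument. The sketch below is a minimal check under the conventions of the text: each basis element is (i/2) times a combination of Pauli-z operators with coefficient vector v_m, and a computational basis state contributes the eigenvalue ±1 of each σ^z. The helper names are ours. Since the v_m are orthonormal and orthogonal to the all-ones vector, the sum equals (n − (Σ_i z_i)²/n)/4, which is the same expression.

```python
import itertools
import numpy as np


def cartan_coefficients(n: int) -> np.ndarray:
    """Rows v_m (m = 1..n-1) with H_m = (i/2) * sum_j v_m[j] * sigma_j^z."""
    rows = []
    for m in range(1, n):
        v = np.zeros(n)
        v[:m] = -1.0          # -sigma_1^z - ... - sigma_m^z
        v[m] = m              # + m * sigma_{m+1}^z
        rows.append(v / np.sqrt(m * (m + 1)))
    return np.array(rows)


def state_factor(bits) -> float:
    """sum_m Tr^2(i rho H_m) for the computational basis state |bits>."""
    z = 1.0 - 2.0 * np.asarray(bits, dtype=float)   # bit 0 -> +1, bit 1 -> -1
    V = cartan_coefficients(len(bits))
    # Tr(i rho H_m) = i * (i/2) * (v_m . z) = -(v_m . z) / 2
    return float(np.sum((-(V @ z) / 2.0) ** 2))


n = 6
for k in range(n + 1):
    bits = [1] * k + [0] * (n - k)
    print(k, state_factor(bits), k * (n - k) / n)
```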

Choosing $O = i h_{z}^{12}$ and H any generator, $∥ o ∥_{g}^{2} = 1 / 2 = ∥ h ∥_{g}^{2}$ , and the final result is

GradVar = \frac{2 n {(1 / 2)}^{2}}{{(n^{2} - 1)}^{2}} \frac{k (n - k)}{n}

= \frac{k (n - k)}{2 {(n^{2} - 1)}^{2}} \in Ω (\frac{1}{n^{3}}) .
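As a worked specialization of the final expression, the two Hamming weights plotted in Fig. 2 give the following scalings. For 1 ≤ k ≤ n − 1 the numerator k(n − k) is smallest at k = 1 (or k = n − 1), which is the case that sets the Ω(1/n³) bound.

```latex
\begin{align*}
k = 1:\quad & \mathrm{GradVar} = \frac{n-1}{2\,(n^{2}-1)^{2}}
  = \frac{1}{2\,(n-1)(n+1)^{2}} \sim \frac{1}{2n^{3}}, \\
k = \tfrac{n}{2}:\quad & \mathrm{GradVar} = \frac{n^{2}}{8\,(n^{2}-1)^{2}}
  \sim \frac{1}{8n^{2}} .
\end{align*}
```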

Proof of theorem 2.12

For the uniform superposition of computational basis states, $∣ψ⟩ = {∣+⟩}^{\otimes n}$ , we have $\forall i, j, ⟨ ψ ∣ h_{i j}^{y} ∣ ψ ⟩ = ⟨ ψ ∣ h_{i j}^{z} ∣ ψ ⟩ = 0$ . The only nonzero terms involve the Pauli-x type generators. We can form the corresponding orthogonal generators normalized in $g$ by $H_{i j}^{x} = \sqrt{2} h_{i j}^{x}$ . However, even though there are $\binom{n}{2}$ of them, only the n − 1 with j = i + 1 have a nonvanishing expectation value on $∣ψ⟩$ since the others have σ_z’s in their definition. For these generators, $⟨ ψ ∣ H_{i j}^{x} ∣ ψ ⟩ = - \frac{i}{2 \sqrt{2}}$ , giving

I_{ϕ} ∥ P_{g} ρ ∥_{F}^{2} = - \sum_{i = 1}^{n - 1} {⟨ ψ ∣ H_{i (i + 1)}^{x} ∣ ψ ⟩}^{2} = \frac{1}{8} (n - 1),

and so

GradVar = \frac{2 n {(1 / 2)}^{2}}{{(n^{2} - 1)}^{2}} \frac{(n - 1)}{8} = \frac{n (n - 1)}{16 {(n^{2} - 1)}^{2}} \in Θ (\frac{1}{n^{2}}) .
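The closed-form predictions for the three initial states can be tabulated over the qubit range used in Fig. 2. The sketch below only evaluates the formulas derived in this section with exact rational arithmetic, assuming the same observable and generator normalization as in the text; the function names are ours.

```python
from fractions import Fraction


def grad_var_basis_state(n: int, k: int) -> Fraction:
    """Computational basis state of Hamming weight k: k(n-k) / (2 (n^2-1)^2)."""
    return Fraction(k * (n - k), 2 * (n * n - 1) ** 2)


def grad_var_uniform(n: int) -> Fraction:
    """Uniform superposition |+>^n: n(n-1) / (16 (n^2-1)^2)."""
    return Fraction(n * (n - 1), 16 * (n * n - 1) ** 2)


for n in range(2, 19, 2):
    print(
        n,
        float(grad_var_basis_state(n, 1)),
        float(grad_var_basis_state(n, n // 2)),
        float(grad_var_uniform(n)),
    )
```

Multiplying the last column by n² gives values that approach 1/16 as n grows, consistent with the Θ(1/n²) statement.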

Proof of corollary 2.12.1

We expand the variance term for the computational basis state case:

\frac{2 k (n - k)}{{(n^{2} - 1)}^{2}} = {Var}_{(g_{+}, g_{-}) ~ μ^{\otimes 2}} [\partial ⟨ σ_{i}^{z} - σ_{j}^{z} ⟩]

= {Var}_{(g_{+}, g_{-}) ~ μ^{\otimes 2}} [\partial ⟨ σ_{i}^{z} ⟩] + {Var}_{(g_{+}, g_{-}) ~ μ^{\otimes 2}} [\partial ⟨ σ_{j}^{z} ⟩]

- 2 {Cov}_{(g_{+}, g_{-}) ~ μ^{\otimes 2}} [\partial ⟨ σ_{i}^{z} ⟩, \partial ⟨ σ_{j}^{z} ⟩] .

Note since a permutation swapping qubit i with j is a valid compound SU matrix, we have that $\partial ⟨ σ_{i}^{z} ⟩$ and $\partial ⟨ σ_{j}^{z} ⟩$ are identically distributed. Thus,

\begin{matrix} \frac{2 k (n - k)}{{(n^{2} - 1)}^{2}} = 2 {Var}_{(g_{+}, g_{-}) ~ μ^{\otimes 2}} [\partial ⟨ σ_{i}^{z} ⟩] \\ - 2 {Cov}_{(g_{+}, g_{-}) ~ μ^{\otimes 2}} [\partial ⟨ σ_{i}^{z} ⟩, \partial ⟨ σ_{j}^{z} ⟩] . \end{matrix}

Due to the above equality and Cauchy–Schwarz, i.e., ${Var}_{(g_{+}, g_{-}) ~ μ^{\otimes 2}} [\partial ⟨ σ_{i}^{z} ⟩] \geq ∣ {Cov}_{(g_{+}, g_{-}) ~ μ^{\otimes 2}} [\partial ⟨ σ_{i}^{z} ⟩, \partial ⟨ σ_{j}^{z} ⟩] ∣$ (recall the variances are equal), we can conclude that ${Var}_{(g_{+}, g_{-}) ~ μ^{\otimes 2}} [\partial ⟨ σ_{i}^{z} ⟩]$ must only be polynomially vanishing in n, which implies no BP for any k and any single-qubit σ_z measurement. A similar result can be shown to hold for the uniform superposition state.
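The final step of the argument can be written out explicitly. Abbreviating the variance and covariance over the two Haar-distributed group elements as Var and Cov, the equality above combined with |Cov| ≤ Var gives an explicit polynomial lower bound:

```latex
\begin{align*}
\frac{2k(n-k)}{(n^{2}-1)^{2}}
  &= 2\,\mathrm{Var}\!\left[\partial\langle\sigma_{i}^{z}\rangle\right]
   - 2\,\mathrm{Cov}\!\left[\partial\langle\sigma_{i}^{z}\rangle,
       \partial\langle\sigma_{j}^{z}\rangle\right] \\
  &\le 2\,\mathrm{Var}\!\left[\partial\langle\sigma_{i}^{z}\rangle\right]
   + 2\,\left|\mathrm{Cov}\!\left[\partial\langle\sigma_{i}^{z}\rangle,
       \partial\langle\sigma_{j}^{z}\rangle\right]\right|
   \le 4\,\mathrm{Var}\!\left[\partial\langle\sigma_{i}^{z}\rangle\right], \\
\Rightarrow\quad
\mathrm{Var}\!\left[\partial\langle\sigma_{i}^{z}\rangle\right]
  &\ge \frac{k(n-k)}{2\,(n^{2}-1)^{2}} .
\end{align*}
```

For 1 ≤ k ≤ n − 1 the right-hand side is at least (n − 1)/(2(n² − 1)²), which decays as 1/n³.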

Acknowledgements

We thank Iordanis Kerenidis for early discussions on BPs in quantum compound ansätze, and Aram Harrow for helpful discussions and feedback on the manuscript. We thank Marco Cerezo and Martin Larocca for discussions on the basics of equivariant QNNs and the role of the DLA. We also thank the members of Global Technology Applied Research at JPMorgan Chase & Co. for their comments and feedback throughout the project. This paper was prepared for informational purposes by the Global Technology Applied Research Center of JPMorgan Chase & Co. This paper is not a product of the Research Department of JPMorgan Chase & Co. or its affiliates. Neither JPMorgan Chase & Co. nor any of its affiliates make any explicit or implied representation or warranty, and none of them accept any liability in connection with this paper, including, without limitation, with respect to the completeness, accuracy, or reliability of the information contained herein and the potential legal, compliance, tax, or accounting effects thereof. This document is not intended as investment research or investment advice, or as a recommendation, offer, or solicitation for the purchase or sale of any security, financial instrument, financial product, or service, or to be used in any way for evaluating the merits of participating in any transaction.

Author contributions

D.H. and S.C. conceived the research question, and E.F. wrote the first proof of Conjecture 2.3, to which D.H. and S.C. made significant improvements. E.F. wrote the proof to Theorem 2.11, 2.12, and Corollary 2.12.1. D.H. wrote the proof for Theorem 2.6, 2.10, 2.13, and the majority of the theory in the Supplementary Information, with contributions from S.C. N.K., R.Y., J.H., S.H.S., and M.P. contributed to the technical discussions and had a role in writing and proof-reading the manuscript.

Peer review

Peer review information

Nature Communications thanks Bobak Kiani, Jonas Landman and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Data availability

The gradient variance simulation data generated in this study have been deposited in the Zenodo database under accession code 10.5281/zenodo.10720106.

Code availability

The code used to generate the gradient variance simulation data has been deposited in the Zenodo database under accession code 10.5281/zenodo.10720106.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

The online version contains supplementary material available at 10.1038/s41467-024-49910-w.

References

1.Cerezo, M. et al. Variational quantum algorithms. Nat. Rev. Phys.3, 625–644 (2021). [Google Scholar]
2.Farhi, E., Goldstone, J. & Gutmann, S. A quantum approximate optimization algorithm, arXivhttps://arxiv.org/abs/1411.4028 (2014).
3.Peruzzo, A. et al. A variational eigenvalue solver on a photonic quantum processor. Nat. Commun.5, 4213 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Liu, X. et al. Layer VQE: A variational approach for combinatorial optimization on noisy quantum computers. IEEE Trans. Quantum Eng.3, 1–20 (2022). [Google Scholar]
5.Niroula, P. et al. Constrained quantum optimization for extractive summarization on a trapped-ion quantum computer. Sci. Rep.12, 10.1038/s41598-022-20853-w (2022), [DOI] [PMC free article] [PubMed]
6.Herman, D. et al. Constrained optimization via quantum Zeno dynamics. Commun. Phys.6, 219 (2023).
7.Shaydulin, R. et al. Evidence of scaling advantage for the quantum approximate optimization algorithm on a classically intractable problem. arXivhttps://arxiv.org/abs/2308.02342 (2023). [DOI] [PMC free article] [PubMed]
8.Mitarai, K., Negoro, M., Kitagawa, M. & Fujii, K. Quantum circuit learning. Phys. Rev. A98, 10.1103/physreva.98.032309. (2018).
9.Farhi, E. & Neven, H. Classification with quantum neural networks on near term processors. http://arxiv.org/abs/1802.06002 (2018).
10.Havlíček, Vojtěch et al. Supervised learning with quantum-enhanced feature spaces. Nature567, 209–212 (2019). [DOI] [PubMed] [Google Scholar]
11.Larocca, Martínet al. Group-invariant quantum machine learning. PRX Quantum3, 10.1103/prxquantum.3.030341 (2022),
12.Herman, D. et al. Expressivity of variational quantum machine learning on the boolean cube. IEEE Trans. Quantum Eng.4, 1–18 (2023). [Google Scholar]
13.You, X. & Wu, X. Exponentially many local minima in quantum neural networks. In: International Conference on Machine Learning. pp. 12144–12155 (2021).
14.You, X., Chakrabarti, S., and Wu, X. A convergence theory for over-parameterized variational quantum eigensolvers. http://arxiv.org/abs/2205.12481 (2022).
15.E.R. Anschuetz. Critical points in quantum generative models. In: International conference on learning representations (2021).
16.Anschuetz, E. R. & Kiani, B. T. Quantum variational algorithms are swamped with traps. Nat. Commun.13, 7760 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
17.You, X., Chakrabarti, S., Chen, B., and Wu, X. Analyzing convergence in quantum neural networks: deviations from neural tangent kernels. In: Proceedings of the 40th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 202, edited by Krause, Andreas, Brunskill, Emma, Cho, Kyunghyun, Engelhardt, Barbara, Sabato, Sivan, and Scarlett, Jonathan (PMLR) pp. 40199–40224 https://proceedings.mlr.press/v202/you23a.html (2023).
18. McClean, J. R., Boixo, S., Smelyanskiy, V. N., Babbush, R. & Neven, H. Barren plateaus in quantum neural network training landscapes. Nat. Commun. 9, 4812 (2018).
19. Cerezo, M., Sone, A., Volkoff, T., Cincio, L. & Coles, P. J. Cost function dependent barren plateaus in shallow parametrized quantum circuits. Nat. Commun. 12, 1791 (2021).
20. Wang, S. et al. Noise-induced barren plateaus in variational quantum algorithms. Nat. Commun. 12, 6961 (2021).
21. Cervero Martín, E., Plekhanov, K. & Lubasch, M. Barren plateaus in quantum tensor network optimization. Quantum 7, 974 (2023).
22. Larocca, M. et al. Diagnosing barren plateaus with tools from quantum optimal control. Quantum 6, 824 (2022).
23. Schatzki, L., Larocca, M., Nguyen, Q. T., Sauvage, F. & Cerezo, M. Theoretical guarantees for permutation-equivariant quantum neural networks. arXiv http://arxiv.org/abs/2210.09974 (2022).
24. Anschuetz, E. R., Bauer, A., Kiani, B. T. & Lloyd, S. Efficient classical algorithms for simulating symmetric quantum systems. Quantum 7, 1189 (2023).
25. Terhal, B. M. & DiVincenzo, D. P. Classical simulation of noninteracting-fermion quantum circuits. Phys. Rev. A 65, 10.1103/physreva.65.032325 (2002).
26. Somma, R., Barnum, H., Ortiz, G. & Knill, E. Efficient solvability of Hamiltonians and limits on the power of some quantum computational models. Phys. Rev. Lett. 97, 10.1103/physrevlett.97.190501 (2006).
27. Goh, M. L., Larocca, M., Cincio, L., Cerezo, M. & Sauvage, F. Lie-algebraic classical simulations for variational quantum computing. arXiv http://arxiv.org/abs/2308.01432 (2023).
28. Russell, B., Rabitz, H. & Wu, R. Quantum control landscapes are almost always trap free. arXiv http://arxiv.org/abs/1608.06198 (2016).
29. Larocca, M., Ju, N., García-Martín, D., Coles, P. J. & Cerezo, M. Theory of overparametrization in quantum neural networks. Nat. Comput. Sci. 3, 542–551 (2023).
30. Kerenidis, I. & Prakash, A. Quantum machine learning with subspace states. arXiv http://arxiv.org/abs/2202.00054 (2022).
31. Cherrat, E. A. et al. Quantum deep hedging. Quantum 7, 1191 (2023).
32. d’Alessandro, D. Introduction to quantum control and dynamics (CRC Press, 2021).
33. Wecker, D., Hastings, M. B. & Troyer, M. Progress towards practical quantum variational algorithms. Phys. Rev. A 92, 10.1103/physreva.92.042303 (2015).
34. Hadfield, S. et al. From the quantum approximate optimization algorithm to a quantum alternating operator ansatz. Algorithms 12, 34 (2019).
35. Nguyen, Q. T. et al. Theory for equivariant quantum neural networks. arXiv http://arxiv.org/abs/2210.08566 (2022).
36. Fuchs, J. Affine Lie algebras and quantum groups: an introduction, with applications in conformal field theory (Cambridge University Press, 1995).
37. Haah, J., Liu, Y. & Tan, X. Efficient approximate unitary designs from random Pauli rotations. arXiv https://arxiv.org/abs/2402.05239 (2024).
38. Harrow, A. W. & Low, R. A. Random quantum circuits are approximate 2-designs. Commun. Math. Phys. 291, 257–302 (2009).
39. Zhang, K., Liu, L., Hsieh, M.-H. & Tao, D. Escaping from the barren plateau via Gaussian initializations in deep variational quantum circuits. Adv. Neural Inf. Process. Syst. 35, 18612–18627 (2022).
40. Volkoff, T. & Coles, P. J. Large gradients via correlation in random parameterized quantum circuits. Quantum Sci. Technol. 6, 025008 (2021).
41. Hall, B. C. Lie groups, Lie algebras, and representations (Springer, 2013).
42. Wiersema, R., Kökcü, E., Kemper, A. F. & Bakalov, B. N. Classification of dynamical Lie algebras for translation-invariant 2-local spin systems in one dimension. arXiv https://arxiv.org/pdf/2309.05690.pdf (2023).
43. Somma, R., Ortiz, G., Barnum, H., Knill, E. & Viola, L. Nature and measure of entanglement in quantum phase transitions. Phys. Rev. A 70, 042311 (2004).
44. Cherrat, E. A. et al. Quantum vision transformers. arXiv https://arxiv.org/abs/2209.08167 (2022).
45. Monbroussou, L., Landman, J., Grilo, A. B., Kukla, R. & Kashefi, E. Trainability and expressivity of Hamming-weight preserving quantum circuits for machine learning. arXiv https://arxiv.org/abs/2309.15547 (2023).
46. Brod, D. J. Efficient classical simulation of matchgate circuits with generalized inputs and measurements. Phys. Rev. A 93, 10.1103/physreva.93.062332 (2016).
47. Cerezo, M., Sone, A., Volkoff, T., Cincio, L. & Coles, P. J. Cost function dependent barren plateaus in shallow parametrized quantum circuits. Nat. Commun. 12, 10.1038/s41467-021-21728-w (2021).
48. Ragone, M. et al. A Lie algebraic theory of barren plateaus for deep parameterized quantum circuits. 10.1038/s41467-024-49909-3 (2024).
49. Arrasmith, A., Holmes, Z., Cerezo, M. & Coles, P. J. Equivalence of quantum barren plateaus to cost concentration and narrow gorges. Quantum Sci. Technol. 7, 045015 (2022).
50. Knapp, A. W. Lie groups beyond an introduction, Theorem II.2.15, Vol. 140 (Springer, 1996).
51. Collins, B. & Śniady, P. Integration with respect to the Haar measure on unitary, orthogonal and symplectic group. Commun. Math. Phys. 264, 773–795 (2006).
52. Collins, B., Matsumoto, S. & Novak, J. The Weingarten calculus. arXiv http://arxiv.org/abs/2109.14890 (2021).
53. Diez, T. & Miaskiwskyi, L. Expectation values of polynomials and moments on general compact Lie groups. arXiv http://arxiv.org/abs/2203.11607 (2022).

Associated Data


Supplementary Materials

Supplementary Information (662.1 KB, PDF)

Peer Review File (763.3 KB, PDF)

Data Availability Statement

The gradient variance simulation data generated in this study have been deposited in the Zenodo database under accession code 10.5281/zenodo.10720106.

The code used to generate the gradient variance simulation data has been deposited in the Zenodo database under accession code 10.5281/zenodo.10720106.

