NIST Author Manuscripts
Author manuscript; available in PMC: 2023 Aug 7.
Published in final edited form as: Quantum (Vienna). 2021 Aug;5:533. doi: 10.22331/q-2021-08-31-533

Quantum routing with fast reversals

Aniruddha Bapat 1,4, Andrew M Childs 1,2,3, Alexey V Gorshkov 1,4, Samuel King 5, Eddie Schoute 1,2,3, Hrishee Shastri 6
PMCID: PMC10405739  NIHMSID: NIHMS1918171  PMID: 37551360

Abstract

We present methods for implementing arbitrary permutations of qubits under interaction constraints. Our protocols make use of previous methods for rapidly reversing the order of qubits along a path. Given nearest-neighbor interactions on a path of length n, we show that there exists a constant ϵ ≥ 0.034 such that the quantum routing time is at most (1 − ϵ)n, whereas any SWAP-based protocol needs time at least n − 1. This represents the first known quantum advantage over SWAP-based routing methods and also gives improved quantum routing times for realistic architectures such as grids. Furthermore, we show that our algorithm approaches a quantum routing time of 2n/3 in expectation for uniformly random permutations, whereas SWAP-based protocols require time n asymptotically. Additionally, we consider sparse permutations that route k ≤ n qubits and give algorithms with quantum routing time at most n/3 + O(k²) on paths and at most 2r/3 + O(k²) on general graphs with radius r.

1. Introduction

Qubit connectivity limits quantum information transfer, which is a fundamental task for quantum computing. While the common model of quantum computation assumes all-to-all connectivity, proposals for scalable quantum architectures do not have this capability [MK13; Mon+14; Bre+16]. Instead, quantum devices arrange qubits in a fixed architecture that fits within engineering and design constraints. For example, the architecture may be grid-like [MG19; Aru+19] or consist of a network of submodules [MK13; Mon+14]. Circuits that assume all-to-all qubit connectivity can be mapped onto these architectures via protocols for routing qubits, i.e., permuting them within the architecture using local operations.

Long-distance gates can be implemented using SWAP gates along edges of the graph of available interactions. A typical procedure swaps pairs of distant qubits along edges until they are adjacent, at which point the desired two-qubit gate is applied to the target qubits. These SWAP subroutines can be sped up by parallelism and careful scheduling [SWD11; SSP13; SSP14; PS16; LWD15; Mur+19; ZW19]. Minimizing the SWAP circuit depth corresponds to the Routing via Matchings problem [ACG94; CSU19]. The minimal SWAP circuit depth to implement any permutation on a graph G is given by its routing number, rt(G) [ACG94]. Deciding rt(G) is generally NP-hard [BR17], but there exist algorithms for architectures of interest such as grids and other graph products [ACG94; Zha99; CSU19]. Furthermore, one can establish lower bounds on the routing number as a function of graph diameter and other properties.

Routing using SWAP gates does not necessarily give minimal circuit evolution time since it is effectively classical and does not make use of the full power of quantum operations. Indeed, faster protocols are already known for specific permutations in specific qubit geometries such as the path [Rau05; Bap+20]. These protocols tend to be carefully engineered and do not generalize readily to other permutations, leaving open the general question of devising faster-than-SWAP quantum routing. In this paper, we answer this question in the affirmative.

Following [Rau05; Bap+20], we consider a continuous-time model of routing, where the protocol is defined by a Hamiltonian that can only include nearest-neighbor interactions. To make consistent comparisons with a gate-based model of routing, we bound the spectral norm of interactions [Bap+20] so that a SWAP gate takes unit time [VHC02], as determined by the canonical form of a two-qubit Hamiltonian [Ben+02]. We suppose that single-qubit operations can be performed arbitrarily fast, a common assumption [VHC02; Ben+02] that is practically well-motivated due to the relative ease of implementing single-qubit rotations.

Rather than directly engineering a quantum routing protocol, we consider a hybrid strategy that leverages a known protocol for quickly performing a specific permutation to implement general quantum routing. Specifically, we consider the reversal operation

ρ := ∏_{k=1}^{⌊n/2⌋} SWAP_{k, n+1−k} (1)

that swaps the positions of qubits about the center of a length-n path. Fast quantum reversal protocols are known in the gate-based [Rau05] and time-independent Hamiltonian [Bap+20] settings. The reversal operation can be implemented in time [Bap+20]

T_ρ ≤ √((n+1)² − p(n)) / 3 ≤ (n+1)/3, (2)

where p(n) ∈ {0,1} is the parity of n. Both protocols exhibit an asymptotic time scaling of n/3 + O(1), which is asymptotically three times faster than the best possible SWAP-based time of n − 1 (bounded below by the diameter of the graph) [ACG94]. The odd-even sort algorithm provides a nearly tight upper bound of n on the SWAP-based time [LDM84] and will be our main point of comparison.
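For concreteness, the bound of Eq. (2) can be evaluated directly; a minimal sketch, assuming the parity convention p(n) = n mod 2:

```python
import math

def fast_reversal_time(n):
    """Upper bound of Eq. (2) on the time to reverse n qubits on a path.

    Assumption: p(n) = n mod 2 is the parity of n.
    """
    return math.sqrt((n + 1) ** 2 - n % 2) / 3
```

The ratio fast_reversal_time(n)/n approaches 1/3 as n grows, one third of the SWAP-based reversal time.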

The Hamiltonian protocol of [Bap+20] can be understood by looking at the time evolution of the site Majorana operators obtained by a Jordan-Wigner transformation of the spin chain. In this picture, the protocol can be interpreted as the rotation of a fictitious particle of spin n+1/2 whose magnetization components are in one-to-one correspondence with the Majoranas on the chain. A reversal corresponds to a rotation of the large spin by an angle of π. The gate-based reversal protocol [Rau05] is a special case of a quantum cellular automaton with a transition function given by the (n+1)-fold product of nearest-neighbor controlled-Z (CZ) operations—an operation that can be done 3 times faster than a SWAP gate—and Hadamard operations. In an open spin chain, this process spreads out local Pauli observables at site i over the chain and “refocuses” them at site n+1-i in n+1 steps for every i. The ability to spread local observables (which is present in the gate-based and Hamiltonian protocols but not in SWAP-based protocols) may be key to obtaining a speedup over SWAP-based algorithms.

We expect both the gate-based and Hamiltonian protocols to be implementable on near-term quantum devices. The gate-based protocol uses nearest-neighbor CZ gates and Hadamard gates, both of which are widely used on existing quantum platforms. The Hamiltonian protocol involves nearest-neighbor Pauli XX interactions with non-uniform couplings, which is within the capabilities of, e.g., superconducting architectures [Kja+20].

Routing using reversals has been studied extensively due to its applications in comparative genomics (where it is known as sorting by reversals) [BP93; KS95]. References [Ben+08; PS02; NNN05] present routing algorithms where, much like in our case, reversals have length-weighted costs. However, these models assume reversals are performed sequentially, while we assume independent reversals can be performed in parallel, where the total cost is given by the evolution time, akin to circuit depth. To our knowledge, results from the sequential case are not easily adaptable to the parallel setting and require a different approach.

Routing on paths is a fundamental building block for routing on more general graphs. For example, a two-dimensional grid graph is the Cartesian product of two path graphs, and the best known routing routine applies a path routing subroutine 3 times [ACG94]. A quantum protocol for routing on the path in time cn, for a constant c>0, would imply a routing time of 3cn on the grid. A similar speedup follows for higher-dimensional grids. More generally, routing algorithms for the generalized hierarchical product of graphs can take advantage of faster routing of the path base graph [CSU19]. For other graphs, it is open whether fast reversals can be used to give faster routing protocols for general permutations.

In the rest of this paper, we present the following results on quantum routing using fast reversals. In Section 2, we give basic examples of using fast reversals to perform routing on general graphs to indicate the extent of possible speedup over SWAP-based routing, namely a graph for which routing can be sped up by a factor of 3, and another for which no speedup is possible. Section 3 presents algorithms for routing sparse permutations, where few qubits are routed, both for paths and for more general graphs. Here, we obtain the full factor-of-3 speedup over SWAP-based routing. Then, in Section 4, we prove the main result that there is a quantum routing algorithm for the path with worst-case constant-factor advantage over any SWAP-based routing scheme. Finally, in Section 5, we show that our algorithm has average-case routing time 2n/3 + o(n), whereas any SWAP-based protocol has average-case routing time at least n − o(n).

2. Simple bounds on routing using reversals

Given the ability to implement a fast reversal ρ with cost given by Eq. (2), the largest possible asymptotic speedup of reversal-based routing over SWAP-based routing is a factor of 3. This is because the reversal operation, which is a particular permutation, cannot be performed faster than n/3 + o(n), yet can be performed in time n classically using odd-even sort. As we now show, some graphs saturate the factor-of-3 speedup for general permutations, while other graphs do not admit any speedup over SWAPs.

Maximal speedup:

For n odd, let Kn* denote two complete graphs, each on (n+1)/2 vertices, joined at a single “junction” vertex for a total of n vertices (Figure 1a). Consider a permutation on Kn* in which every vertex is sent to the other complete subgraph, except that the junction vertex is sent to itself. To route with SWAPs, note that each vertex (other than that at the junction) must be moved to the junction at least once, and only one vertex can be moved there at any time. Because there are (n+1)/2-1 non-junction vertices on each subgraph, implementing this permutation requires a SWAP-circuit depth of at least n-1.

Figure 1:

K9* admits the full factor-of-3 speedup in the worst case when using reversals over SWAPs, whereas K5 admits no speedup when using reversals over SWAPs.

On the other hand, any permutation on Kn* can be implemented in time n/3+O(1) using reversals. First, perform a reversal on a path that connects all vertices with opposite-side destinations. After this reversal, every vertex is on the side of its destination and the remainder can be routed in at most 2 steps [ACG94]. The total time is at most (n+1)/3+2, exhibiting the maximal speedup by an asymptotic factor of 3.

No speedup:

Now, consider the complete graph on n vertices, Kn (Figure 1b). Every permutation on Kn can be routed in time at most 2 using SWAPs [ACG94]. Consider implementing a 3-cycle on three vertices of Kn for n ≥ 3 using reversals. Since a reversal is an involution and a 3-cycle is not, any reversal sequence that implements this permutation contains at least two reversals, and therefore takes time at least 2. Therefore, no speedup is gained over SWAPs in the worst case.

We have shown that there exists a family of graphs that allows a factor of 3 speedup for any permutation when using fast reversals instead of SWAPs, and others where reversals do not grant any improvement. The question remains as to where the path graph lies on this spectrum. Faster routing on the path is especially desirable since this task is fundamental for routing in more complex graphs.

3. An algorithm for sparse permutations

We now consider routing sparse permutations, where only a small number k of qubits are to be moved. For the path, we show that the routing time is at most n/3 + O(k²). More generally, we show that for a graph of radius r, the routing time is at most 2r/3 + O(k²). (Recall that the radius of a graph G = (V, E) is min_{u∈V} max_{v∈V} dist(u, v), where dist(u, v) is the distance between u and v in G.) Our approach to routing sparse permutations using reversals is based on the idea of bringing all k qubits to be permuted to the center of the graph, rearranging them, and then sending them to their respective destinations.

[Algorithm pseudocode, rendered as an image in the original.]

3.1. Paths

A description of the algorithm on the path, called MiddleExchange, appears in Algorithm 3.1. Figure 2 presents an example of MiddleExchange for k=6.

Figure 2:

Example of MiddleExchange (Algorithm 3.1) on the path for k = 6.

In Theorem 3.1, we prove that Algorithm 3.1 achieves a routing time of asymptotically n/3 when implementing a sparse permutation of k = o(√n) qubits on the path graph. First, let 𝒮_n denote the set of permutations on {1, …, n}, so |𝒮_n| = n!. Then, for any permutation π ∈ 𝒮_n that acts on a set of labels {1, …, n}, let π_i denote the destination of label i under π. We may then write π = (π_1, π_2, …, π_n). Let ρ̄ denote an ordered series of reversals (ρ_1, …, ρ_m), and let ρ̄_1ρ̄_2 be the concatenation of two reversal series. Finally, let S·ρ and S·ρ̄ denote the result of applying ρ and ρ̄ to a sequence S, respectively, and let |ρ| denote the length of the reversal ρ, i.e., the number of vertices it acts on.

Theorem 3.1.

Let π ∈ 𝒮_n with k = |{x ∈ [n] : π_x ≠ x}| (i.e., k elements are to be permuted, and n − k elements begin at their destination). Then Algorithm 3.1 routes π in time at most n/3 + O(k²).

Proof.

Algorithm 3.1 consists of three steps: compression (Line 4–Line 9), inner permutation (Line 11), and dilation (Line 12). Notice that compression and dilation are inverses of each other.

Let us first show that Algorithm 3.1 routes π correctly. Just as in the algorithm, let x_1, …, x_k denote the labels x ∈ [n], with x_i < x_{i+1}, such that π_x ≠ x, that is, the elements that do not begin at their destination and need to be permuted. It is easy to see that these elements are permuted correctly: after compression, the inner permutation step routes x_i to the current location of the label π_{x_i} in the middle. Because dilation is the inverse of compression, it will then route every x_i to its correct destination. For the non-permuting labels, notice that each lies in the support of either no reversal or exactly two reversals: some ρ_1 in the compression step and some ρ_2 in the dilation step. Therefore ρ_1 reverses the segment containing the label and ρ_2 re-reverses it back into place (so ρ_1 = ρ_2). Therefore, the labels that are not to be permuted end up exactly where they started once the algorithm is complete.

Now we analyze the routing time. Let d_i = x_{i+1} − x_i − 1 for i ∈ [k−1]. As in the algorithm, let t be the largest index for which x_t ≤ ⌊n/2⌋. Then, for 1 ≤ i ≤ t−1, we have |ρ_i| = d_i + i, and, for t+2 ≤ j ≤ k, we have |ρ_j| = d_{j−1} + k − j. Moreover, we have |ρ_t| = ⌊n/2⌋ − x_t − 1 + t and |ρ_{t+1}| = x_{t+1} − ⌊n/2⌋ + k − t. From all reversals in the first part of Algorithm 3.1, ρ̄, consider those that are performed on the left side of the median (position ⌊n/2⌋ of the path). The routing time of these reversals is

(1/3) Σ_{i=1}^{t} (|ρ_i| + 1) = (1/3)(⌊n/2⌋ − x_t − 1 + t) + (1/3) Σ_{i=1}^{t−1} (d_i + i + 1)
 ≤ O(t²) + (1/3)(⌊n/2⌋ − x_t − 1) + (1/3) Σ_{i=1}^{t−1} (x_{i+1} − x_i)
 ≤ O(t²) + (1/3)(⌊n/2⌋ − x_1) ≤ n/6 + O(k²). (3)

By a symmetric argument, the same bound holds for the compression step on the right half of the median. Because both sides can be performed in parallel, the total cost for the compression step is at most n/6 + O(k²). The inner permutation step can be done in time at most k using odd-even sort. The cost to perform the dilation step is also at most n/6 + O(k²) because dilation is the inverse of compression. Thus, the total routing time for Algorithm 3.1 is at most 2(n/6 + O(k²)) + k = n/3 + O(k²). □

It follows that sparse permutations on the path with k = o(√n) can be implemented using reversals with a full asymptotic factor-of-3 speedup.
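The compression–permutation–dilation structure of MiddleExchange can be simulated classically. The sketch below is our own illustration (all helper names are ours): list-segment reversals stand in for the fast reversal primitive, and only the movement of labels is modeled, not the quantum dynamics.

```python
def reverse_seg(arr, a, b):
    """Reverse the segment arr[a..b] (inclusive) in place; models one reversal."""
    arr[a:b + 1] = arr[a:b + 1][::-1]

def compress_left(arr, marked, mid, log):
    """Gather marked positions (ascending, all < mid) into a block ending at mid-1."""
    if not marked:
        return
    start, end = marked[0], marked[0]              # current block of gathered items
    for x in marked[1:]:
        if end + 1 != x:                           # slide the block up against x
            reverse_seg(arr, start, x - 1)
            log.append((start, x - 1))
            start, end = x - (end - start + 1), x - 1
        end = x                                    # absorb x into the block
    if end != mid - 1:                             # push the block to the center
        reverse_seg(arr, start, mid - 1)
        log.append((start, mid - 1))

def compress_right(arr, marked, mid, log):
    """Mirror image of compress_left for marked positions >= mid."""
    if not marked:
        return
    start, end = marked[-1], marked[-1]
    for x in reversed(marked[:-1]):
        if start - 1 != x:
            reverse_seg(arr, x + 1, end)
            log.append((x + 1, end))
            start, end = x + 1, x + (end - start + 1)
        start = x
    if start != mid:
        reverse_seg(arr, mid, end)
        log.append((mid, end))

def middle_exchange(dest):
    """Route item i to position dest[i]; returns the final arrangement."""
    n = len(dest)
    mid = n // 2
    arr = list(range(n))                           # arr[p] = item currently at p
    target = [0] * n
    for i, d in enumerate(dest):
        target[d] = i                              # target[p] = item that must end at p
    marked = [p for p in range(n) if dest[p] != p]
    log = []                                       # reversals used by compression
    compress_left(arr, [p for p in marked if p < mid], mid, log)
    compress_right(arr, [p for p in marked if p >= mid], mid, log)
    # Inner permutation: place in each middle slot the item that dilation
    # (the inverse reversal sequence) will carry back to its destination.
    pos = list(range(n))
    for a, b in log:
        reverse_seg(pos, a, b)                     # pos[s] = original position now at s
    left = sum(1 for p in marked if p < mid)
    right = len(marked) - left
    for s in range(mid - left, mid + right):
        arr[s] = target[pos[s]]
    for a, b in reversed(log):                     # dilation = compression in reverse
        reverse_seg(arr, a, b)
    return arr
```

Only O(k) reversals are recorded in `log`, matching the O(k²) additive terms in the proof; the unmarked labels are reversed and re-reversed, so they return to their starting positions.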

3.2. General graphs

We now present a more general result for implementing sparse permutations on an arbitrary graph.

Theorem 3.2.

Let G = (V, E) be a graph with radius r and π a permutation of vertices. Let S = {v ∈ V : π_v ≠ v}. Then π can be routed in time at most 2r/3 + O(|S|²).

Proof.

We route π using a procedure similar to Algorithm 3.1, consisting of the same three steps adapted to work on a spanning tree of G: compression, inner permutation, and dilation. Dilation is the inverse of compression, and the inner permutation step can be performed on a subtree consisting of just k = |S| nodes by using the Routing via Matchings algorithm for trees in 3k/2 + O(log k) time [Zha99]. It remains to show that compression can be performed in r/3 + O(k²) time.

We construct a token tree 𝒯 that reduces the compression step to routing on a tree. Let c be a vertex in the center of G, i.e., a vertex with distance at most r to all vertices. Construct a shortest-path tree 𝒯′ of G rooted at c, say, using breadth-first search. We assign a token to each vertex in S. Now 𝒯 is the subtree of 𝒯′ formed by removing all vertices v ∈ V(𝒯′) for which the subtree rooted at v does not contain any tokens, as depicted in Figure 3. In 𝒯, call the first common vertex between the paths to c from two distinct tokens an intersection vertex, and let ℐ be the set of all intersection vertices. Note that if a token t_1 lies on the path from another token t_2 to c, then the vertex on which t_1 lies is also an intersection vertex. Since 𝒯 has at most k leaves, |ℐ| ≤ k − 1.
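The construction of 𝒯 and the bound |ℐ| ≤ k − 1 can be checked on small graphs; a sketch under our own naming (BFS tree plus pruning; intersection vertices are kept vertices with two kept children, or token vertices with at least one):

```python
from collections import deque

def token_tree(adj, center, tokens):
    """BFS shortest-path tree from `center`, pruned to ancestors of tokens.

    Returns (parent map, set of kept vertices); kept vertices are exactly
    those whose subtree contains a token.
    """
    parent = {center: None}
    dq = deque([center])
    while dq:
        u = dq.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                dq.append(v)
    keep = set()
    for t in tokens:                    # walk each token's path up to the root
        v = t
        while v is not None and v not in keep:
            keep.add(v)
            v = parent[v]
    return parent, keep

def intersection_vertices(parent, keep, tokens):
    """Vertices where the paths from two distinct tokens to the root first meet."""
    children = {v: 0 for v in keep}
    for v in keep:
        p = parent[v]
        if p is not None:
            children[p] += 1
    return {v for v in keep
            if children[v] >= 2 or (v in tokens and children[v] >= 1)}
```

On a 5 × 5 grid with center (2, 2), as in Figure 3, any placement of k tokens yields at most k − 1 intersection vertices.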

Figure 3:

Illustration of the token tree 𝒯 in Theorem 3.2 for a case where G is the 5 × 5 grid graph. Blue circles represent vertices in S and orange circles represent vertices not in S. Vertex c denotes the center of G. Red-outlined circles represent intersection vertices. In particular, note that one of the blue vertices is an intersection vertex because it is the first common vertex on the paths to c of two distinct blue vertices.

For any vertex v in 𝒯, let the descendants of v be the vertices u ≠ v in 𝒯 whose path in 𝒯 to c includes v. Now let 𝒯_v be the subtree of 𝒯 rooted at v, i.e., the tree composed of v and all of the descendants of v. We say that all tokens have been moved up to a vertex v if, for all vertices u in 𝒯_v without a token, 𝒯_u also does not contain a token. The compression step can then be described as moving tokens up to c.

[Algorithm pseudocode, rendered as an image in the original.]

We describe a recursive algorithm for doing so in Algorithm 3.2. The base case covers the trivial case of a subtree with only one token. Otherwise, we move all tokens on the subtrees of each descendant b up to the closest intersection w using recursive calls, as illustrated in Figure 4. Afterwards, we consider whether the path p between v and w has enough room to store all tokens. If it does, we use a Routing via Matchings algorithm for trees to route tokens from w onto p, followed by a reversal to move these tokens up to v. Otherwise, the path is short enough that all tokens can be moved up to v by the same Routing via Matchings algorithm.

Figure 4:

An example of moving the m tokens in 𝒯_w up to b (Line 14–Line 18 in Algorithm 3.2).

We now bound the routing time on 𝒯_{w_1} of MoveUpTo(w_1), for any vertex w_1 ∈ V(𝒯). First note that all operations on subtrees 𝒯_b of 𝒯_{w_1} are independent and can be performed in parallel. Let w_1, w_2, …, w_t be the sequence of intersection vertices that MoveUpTo(·) is recursively called on that dominates the routing time of MoveUpTo(w_1). Let d_w, for w ∈ V(𝒯_{w_1}), be the distance of w to the furthest leaf node in 𝒯_w. Assuming that the base case on Line 2 has not been reached, we have a routing time of

T(w_1) ≤ T(w_2) + (d_{w_1} − d_{w_2})/3 + O(k), (4)

where O(k) bounds the time required to route m ≤ k tokens on a tree of size at most 2m following the recursive MoveUpTo(w_2) call [Zha99]. We expand the time cost T(w_i) of the recursive calls until we reach the base case of w_t to obtain

T(w_1) ≤ T(w_t) + Σ_{i=1}^{t−1} ((d_{w_i} − d_{w_{i+1}})/3 + O(k)) = T(w_t) + (d_{w_1} − d_{w_t})/3 + t·O(k) ≤ d_{w_1}/3 + (t+1)·O(k). (5)

Since d_{w_1} ≤ r and t ≤ k, this shows that compression can be performed in r/3 + O(k²) time. □

In general, a graph with radius r and diameter d has d/2 ≤ r ≤ d. Using Theorem 3.2, this implies that for a graph G and a sparse permutation with k = o(√r), the bound on the routing time lies between d/3 + o(d) and 2d/3 + o(d). Thus, for such sparse permutations, using reversals always gives an asymptotic constant-factor worst-case speedup over any SWAP-only protocol, since rt(G) ≥ d. Furthermore, for graphs with r = d/2, we asymptotically achieve the full factor-of-3 speedup.

[Algorithm pseudocode, rendered as an image in the original.]

4. Algorithms for routing on the path

Our general approach to implementing permutations on the path relies on the divide-and-conquer strategy described in Algorithm 4.1. It uses a correspondence between implementing permutations and sorting binary strings, where the former can be performed at twice the cost of the latter. This approach is inspired by [PS02] and [Ben+08] who use the same method for routing by reversals in the sequential case.

First, we introduce a binary labeling using the indicator function

I(v) = 0 if v ≤ ⌊n/2⌋, and I(v) = 1 otherwise. (6)

This function labels any permutation π ∈ 𝒮_n by a binary string I(π) := (I(π_1), I(π_2), …, I(π_n)). Let π be the target permutation, and σ any permutation such that I(πσ⁻¹) = 0^{⌊n/2⌋}1^{⌈n/2⌉}. Then it follows that σ divides π into permutations π_L, π_R acting only on the left and right halves of the path, respectively, i.e., π = π_L · π_R · σ. We find and implement σ via a binary sorting subroutine, thereby reducing the problem to two subproblems of length at most ⌈n/2⌉ that can be solved in parallel on disjoint sections of the path. Proceeding by recursion until all subproblems are on sections of length at most 1, the only possible permutation is the identity and π has been implemented. Because disjoint permutations are implemented in parallel, the total routing time is T(π) = T(σ) + max{T(π_L), T(π_R)}.
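The labeling step is straightforward to sketch (destinations are 1-indexed; the ≤ ⌊n/2⌋ threshold is our reading of Eq. (6)):

```python
def binary_label(pi):
    """Eq. (6): label each position 0 if its destination lies in the left
    half of the path, 1 otherwise (destinations 1-indexed)."""
    n = len(pi)
    return [0 if d <= n // 2 else 1 for d in pi]
```

Any BinarySorter that brings this string to 0^{⌊n/2⌋}1^{⌈n/2⌉} implements a valid σ, after which the two halves can be routed independently.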

We illustrate Algorithm 4.1 with an example, where the binary labels are indicated below the corresponding destination indices:

[Worked example, rendered as a figure in the original.] (7)

Each labeling and sorting step corresponds to an application of Eq. (6) and BinarySorter, respectively, to each subproblem. Specifically, in Eq. (7), we use TBS (Algorithm 4.2) to sort binary strings.

[Algorithm pseudocode, rendered as an image in the original.]

We present two algorithms for BinarySorter, which performs the work in our sorting algorithm. The first of these binary sorting subroutines is Tripartite Binary Sort (TBS, Algorithm 4.2). TBS works by splitting the binary string into nearly equal (contiguous) thirds, recursively sorting these thirds, and merging the three sorted thirds into one sorted sequence. We sort the outer thirds forwards and the middle third backwards, which allows us to merge the three segments using at most one reversal. For example, we can sort a binary string as follows:

[Worked example, rendered as a figure in the original.] (8)

where the arrows labeled TBS indicate recursive calls to TBS and the bracket indicates the reversal that merges the segments. Let GDC(TBS) denote Algorithm 4.1 when using TBS to sort binary strings, where GDC stands for GenericDivideConquer.
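TBS can be sketched on Python lists under the cost model that a reversal of length ℓ takes (ℓ + 1)/3 time (cf. Eq. (2)) and that the three recursive calls run in parallel; function names are ours:

```python
def tbs(bits, forward=True):
    """Tripartite Binary Sort. Returns (sorted bits, time), where parallel
    recursive calls cost their maximum and a merge reversal of length l
    costs (l + 1)/3."""
    n = len(bits)
    if n <= 1:
        return list(bits), 0.0
    t1, t2 = n // 3, 2 * n // 3
    s1, c1 = tbs(bits[:t1], forward)        # outer thirds: same direction
    s2, c2 = tbs(bits[t1:t2], not forward)  # middle third: opposite direction
    s3, c3 = tbs(bits[t2:], forward)
    s = s1 + s2 + s3                        # pattern 0*1*0*1* (or its mirror)
    hi, lo = (1, 0) if forward else (0, 1)
    first = next((i for i, x in enumerate(s) if x == hi), None)
    last = next((n - 1 - i for i, x in enumerate(reversed(s)) if x == lo), None)
    cost = 0.0
    if first is not None and last is not None and first < last:
        s[first:last + 1] = s[first:last + 1][::-1]   # single merging reversal
        cost = (last - first + 2) / 3
    return s, max(c1, c2, c3) + cost
```

Because the middle third is sorted in the opposite direction, the concatenation always has the form 0…01…10…01…1 (or its mirror), so one reversal of the interior run completes the merge.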

The second algorithm is an adaptive version of TBS (Algorithm 4.3) that, instead of using equal thirds, adaptively chooses the segments' lengths. Adaptive TBS considers every pair of partition points, 0 ≤ i ≤ j < n − 1, that would split the binary sequence into two or three sections: B[0, i], B[i+1, j], and B[j+1, n−1] (where i = j corresponds to no middle section). For each pair, it calculates the minimum cost to recursively sort the sequence using these partition points. Since each section can be sorted in parallel, the total sorting time depends on the maximum time needed to sort one of the three sections and the cost of the final merging reversal. Let GDC(ATBS) denote Algorithm 4.1 when using Adaptive TBS to sort binary strings.

Notice that the partition points selected by TBS are considered by the Adaptive TBS algorithm and are selected by Adaptive TBS only if no other pair of partition points yields a faster sorting time. Thus, for any permutation, the sequence of reversals found by Adaptive TBS costs no more than that found by TBS. However, TBS is simpler to implement and will be faster than Adaptive TBS in finding the sorting sequence of reversals.
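Adaptive TBS can likewise be sketched as a memoized search over all partition pairs (i, j), charging (ℓ + 1)/3 per merge reversal and taking the maximum over the three sections sorted in parallel (our own naming; this computes the cost of the best reversal sequence found, not the schedule itself):

```python
from functools import lru_cache

def atbs_cost(bits):
    """Minimum sorting time found by the Adaptive TBS recursion on `bits`."""
    bits = tuple(bits)
    pref = [0]                                   # prefix counts of zeros
    for x in bits:
        pref.append(pref[-1] + (x == 0))

    def zeros(l, r):                             # zeros in bits[l..r]
        return pref[r + 1] - pref[l] if l <= r else 0

    @lru_cache(maxsize=None)
    def solve(l, r, forward):
        if r - l + 1 <= 1:
            return 0.0
        best = float("inf")
        for i in range(l, r):                    # section 1 = [l, i]
            for j in range(i, r):                # section 2 = [i+1, j], maybe empty
                sub = max(solve(l, i, forward),
                          solve(i + 1, j, not forward),
                          solve(j + 1, r, forward))
                z1, z2, z3 = zeros(l, i), zeros(i + 1, j), zeros(j + 1, r)
                o1, o2, o3 = (i - l + 1) - z1, (j - i) - z2, (r - j) - z3
                if forward:                      # merged pattern 0* 1^a 0^b 1*
                    a, b = o1 + o2, z2 + z3
                else:                            # merged pattern 1* 0^a 1^b 0*
                    a, b = z1 + z2, o2 + o3
                merge = (a + b + 1) / 3 if a > 0 and b > 0 else 0.0
                best = min(best, sub + merge)
        return best

    return solve(0, len(bits) - 1, True)
```

Since the equal-thirds split of TBS is among the candidate partitions, this cost is never worse than the cost of the TBS reversal sequence, as the text notes.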

[Algorithm pseudocode, rendered as an image in the original.]

4.1. Worst-case bounds

In this section, we prove that all permutations of sufficiently large length n can be implemented in time strictly less than n using reversals. Let n_x(b) denote the number of times character x ∈ {0,1} appears in a binary string b, and let T(b) (resp., T(π)) denote the best possible time to sort b (resp., implement π) with reversals. All logarithms are base 2 unless specified otherwise.

Lemma 4.1.

Let b ∈ {0,1}ⁿ be such that n_x(b) < cn + O(log n), where c ∈ [0, 1/3] and x ∈ {0,1}. Then T(b) ≤ (c/3 + 7/18)n + O(log n).

Proof.

To achieve this upper bound, we use TBS (Algorithm 4.2). There are ⌈log₃ n⌉ steps in the recursion, which we index by j ∈ {0, 1, …, ⌈log₃ n⌉}, with step 0 corresponding to the final merging step. Let |ρ_j| denote the size of the longest reversal in recursive step j, which merges three sorted subsequences of size roughly n/3^{j+1}. The size of the final merging reversal ρ_0 is bounded above by (c + 2/3)n + O(log n) because ρ_0 is longest when every x is contained in the leftmost third if x = 1, or in the rightmost third if x = 0. So we have

T(b) ≤ Σ_{j=0}^{⌈log₃n⌉} (|ρ_j| + 1)/3 ≤ (c/3 + 2/9)n + Σ_{j=1}^{⌈log₃n⌉} |ρ_j|/3 + O(log n) (9)
 ≤ (c/3 + 7/18)n + O(log n), (10)

where the second inequality used |ρ_j| ≤ n/3^j for j ≥ 1, so that Σ_{j=1}^{⌈log₃n⌉} |ρ_j|/3 ≤ Σ_{j≥1} n/3^{j+1} = n/6. □

Now we can prove a bound on the cost of a sorting series found by Adaptive TBS for any binary string of length n.

Theorem 4.2.

For all bit strings b ∈ {0,1}ⁿ of arbitrary length n ∈ ℕ, T(b) ≤ (1/2 − ε)n + O(log n) ≤ 0.483n + O(log n), where ε = 1/3 − 1/√10.

Proof.

Let b ∈ {0,1}ⁿ for some n ∈ ℕ. Partition b into three sections b = b_1b_2b_3 such that |b_1| = |b_3| = ⌊n/3⌋ and |b_2| = n − 2⌊n/3⌋. Since ⌊n/3⌋ = n/3 − d where d ∈ {0, 1/3, 2/3}, we write |b_1| = |b_2| = |b_3| = n/3 + O(1) for the purposes of this proof. Recall that if segments b_1 and b_3 are sorted forwards and segment b_2 is sorted backwards, the resulting string can be sorted using a single reversal ρ (see the example in Eq. (8)). Then we have

T(b) ≤ max{T(b_1), T̄(b_2), T(b_3)} + (|ρ| + 1)/3, (11)

where T̄(b_2) is the time to sort b_2 backwards using reversals.

We proceed by induction on n. For the base case, it suffices to note that every binary string can be sorted using reversals and, for finitely many values of n ∈ ℕ, any time needed to sort a binary string of length n exceeding (1/2 − ε)n can be absorbed into the O(log n) term. Now assume T(b) ≤ (1/2 − ε)k + O(log k) for all k < n and b ∈ {0,1}ᵏ.

Case 1:

n_0(b_1) ≥ 2εn or n_1(b_3) ≥ 2εn. In this case, |ρ| ≤ n − 2εn, so

T(b) ≤ (n − 2εn + 1)/3 + max{T(b_1), T̄(b_2), T(b_3)} ≤ (1/2 − ε)n + O(log n) (12)

by the induction hypothesis.

Case 2:

n_0(b_1) < 2εn and n_1(b_3) < 2εn. In this case, adjust the partition so that |b_1| = |b_3| = n/3 + 2εn/(3 − 6ε) − O(1) and consequently |b_2| = n/3 − 4εn/(3 − 6ε) + O(1), as depicted in Figure 5. In this adjustment, at most 2εn/(3 − 6ε) zeros are added to the segment b_1, and likewise ones to b_3. Thus, n_1(b_3) ≤ 2εn + 2εn/(3 − 6ε) = (1 + 1/(3 − 6ε))2εn. Since n = (3 − 6ε)|b_1| − O(1), we have

n_1(b_3) ≤ (1 + 1/(3 − 6ε)) · 2ε((3 − 6ε)|b_1| − O(1)) = (2 − 3ε)4ε|b_1| − O(1). (13)

Let c = (2 − 3ε)4ε = 2/15. Applying Lemma 4.1 with this value of c yields

T(b_3) ≤ (2/45 + 7/18)|b_3| + O(log|b_3|) = (1/√10 − 1/6)n + O(log n). (14)

Since |b_1| = |b_3|, we obtain the same bound T(b_1) ≤ (1/√10 − 1/6)n + O(log n) by applying Lemma 4.1 with the same value of c.

Figure 5:

Case 2 of Theorem 4.2. If there are few zeros and ones in the leftmost and rightmost thirds, respectively, we can shorten the middle section so that it can be sorted quickly. Then, because each of the outer thirds contains far more zeros than ones (or vice versa), they can both be sorted quickly as well.

By the inductive hypothesis, T̄(b_2) can be bounded above by

T̄(b_2) ≤ (1/2 − ε)(n/3 − 4εn/(3 − 6ε) + O(1)) + O(log n) = (1/√10 − 1/6)n + O(log n). (15)

Using Eq. (11) and the fact that |ρ| ≤ n, we get the bound

T(b) ≤ (1/√10 − 1/6)n + O(log n) + (n + 1)/3 = (1/2 − ε)n + O(log n)

as claimed. □

This bound on the cost of a sorting sequence found by Adaptive TBS for binary strings can easily be extended to a bound on the time to implement any permutation of length n.

Corollary 4.3.

For a length-n permutation π, T(π) ≤ (1/3 + 2/√10)n + O(log² n) ≤ 0.9658n + O(log² n).

Proof.

To implement π, we first turn it into a binary string b using Eq. (6). Then let ρ_1, ρ_2, …, ρ_m be a sequence of reversals that sorts b. Applying this sequence gives π′ = π·ρ_1·ρ_2⋯ρ_m, in which every element is in the same half as its destination. We can then recursively perform the same procedure on each half of π′, continuing down until every element is in its correct position.

This process requires ⌈log n⌉ steps, and at step i, there are 2ⁱ binary strings of length n/2ⁱ being sorted in parallel. This gives the following bound on the time to implement π:

T(π) ≤ Σ_{i=0}^{⌈log n⌉} T(b_i), (16)

where b_i ∈ {0,1}^{n/2ⁱ}. Applying the bound from Theorem 4.2, we obtain

T(π) ≤ Σ_{i=0}^{⌈log n⌉} T(b_i) ≤ Σ_{i=0}^{⌈log n⌉} ((1/6 + 1/√10)(n/2ⁱ) + O(log(n/2ⁱ))) = (1/3 + 2/√10)n + O(log² n). □
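The geometric sum above can be checked numerically; a minimal sketch with the per-level constant 1/6 + 1/√10 from Theorem 4.2 (logarithmic terms omitted):

```python
import math

def gdc_time_bound(n):
    """Accumulate the per-level binary-sorting bound (1/6 + 1/sqrt(10)) * n/2^i
    from the proof of Corollary 4.3, ignoring the O(log n) terms."""
    per_level = 1 / 6 + 1 / math.sqrt(10)
    total, m = 0.0, n
    while m >= 1:
        total += per_level * m
        m //= 2
    return total
```

For large n, gdc_time_bound(n)/n approaches 2(1/6 + 1/√10) = 1/3 + 2/√10 ≈ 0.9658, the constant in the corollary.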

5. Average-case performance

So far we have presented worst-case bounds that provide a theoretical guarantee on the speedup of quantum routing over classical routing. However, the bounds are not known to be tight, and may not accurately capture the performance of the algorithm in practice.

In this section, we show better performance for the average-case routing time, i.e., the expected routing time of the algorithm on a permutation chosen uniformly at random from 𝒮_n. We present both theoretical and numerical results on the average routing time of SWAP-based routing (such as odd-even sort) and of quantum routing using TBS and ATBS. We show that, on average, GDC(TBS) (and GDC(ATBS), whose sorting time on any instance is at least as fast) beats SWAP-based routing by a constant factor of 2/3. We have the following two theorems, whose proofs can be found in Appendices A and B, respectively.

Theorem 5.1.

The average routing time of any SWAP-based procedure is lower bounded by n-o(n).

Theorem 5.2.

The average routing time of GDC(TBS) is 2n/3 + O(n^α) for a constant α ∈ [1/2, 1).

These theorems provide average-case guarantees, yet do not give information about the non-asymptotic behavior. Therefore, we test our algorithms on random permutations for instances of intermediate size.

Our numerics [KSS21] show that Algorithm 4.1 has an average routing time that is well-approximated by c·n + o(n), where 2/3 ≤ c < 1, using TBS or Adaptive TBS as the binary sorting subroutine, for permutations generated uniformly at random. Similarly, the performance of odd-even sort (OES) is well-approximated by n + o(n). Furthermore, the advantage of quantum routing is evident even for fairly short paths. We demonstrate this by sampling 1000 permutations uniformly from 𝒮_n for n ∈ [12, 512] and running OES and GDC(TBS) on each permutation. Due to computational constraints, GDC(ATBS) was run on sample permutations of lengths n ∈ [12, 206]. On an Intel i7-6700HQ processor with a clock speed of 2.60 GHz, OES took about 0.04 seconds to implement each permutation of length 512; GDC(TBS) took about 0.3 seconds; and, for permutations of length 200, GDC(ATBS) took about 6 seconds.
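The OES baseline is easy to reproduce in simulation; the sketch below (sampling choices ours) counts rounds of alternating odd/even adjacent transpositions, which equals the SWAP circuit depth:

```python
import random

def oes_depth(perm):
    """Rounds of odd-even transposition (= SWAP circuit depth) to sort `perm`."""
    a, goal = list(perm), sorted(perm)
    depth, parity = 0, 0
    while a != goal:
        for i in range(parity, len(a) - 1, 2):   # one round of disjoint adjacent SWAPs
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
        parity ^= 1
        depth += 1
    return depth

# Average over a few uniformly random permutations; Theorem 5.1 predicts
# a normalized depth approaching 1 as n grows.
random.seed(1)
n = 256
ratios = [oes_depth(random.sample(range(n), n)) / n for _ in range(20)]
mean_ratio = sum(ratios) / len(ratios)
```

Odd-even transposition sort is guaranteed to finish within n rounds, and on random inputs the normalized depth concentrates near 1, consistent with the n + o(n) fit reported here.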

The results of our experiments are summarized in Figure 6. We find that the mean normalized time costs for OES, GDC(TBS), and GDC(ATBS) are similar for small n, but the latter two decrease steadily as the length of the permutations increases while the former steadily increases. Furthermore, the average costs for GDC(TBS) and GDC(ATBS) diverge from that of OES rather quickly, suggesting that GDC(TBS) and GDC(ATBS) perform better on average for somewhat small permutations (n ≈ 50) as well as asymptotically.

Figure 6:

The mean routing time and its fit for odd-even sort (OES) and for routing algorithms using Tripartite Binary Sort (GDC(TBS)) and Adaptive TBS (GDC(ATBS)). We search exhaustively for n < 12 and sample 1000 permutations uniformly at random otherwise. We show data for GDC(ATBS) only for n ≤ 207 because it becomes too slow beyond that point. We find that the fit function μ(n) = an + b√n + c fits the data with R² > 99.99% for all three algorithms. For OES, the fit gives a ≈ 0.9999; for GDC(TBS), a ≈ 0.6599; and for GDC(ATBS), a ≈ 0.6513. Similarly, for the standard deviation, we find that the fit function σ(n)² = an + b√n + c fits the data with R² ≥ 99% for all three algorithms, suggesting that the normalized deviation of the performance about the mean scales as σ(n)/n = Θ(n^{−1/2}) asymptotically.

The linear coefficient a of the fit of μ(n) for OES is a ≈ 0.9999, which is consistent with the asymptotic bounds proven in Theorems 5.1 and 5.2. For the fits of the mean time costs for GDC(TBS) and GDC(ATBS), we have a ≈ 0.6599 and a ≈ 0.6513, respectively. The numerics suggest that the algorithms' routing times agree with our analytics and are fast for instances of realistic size. For example, at n = 100, GDC(TBS) and GDC(ATBS) have average routing times of 0.75n and 0.72n, respectively. On the other hand, OES routes in average time > 0.9n. For larger instances, the speedup approaches the full factor of 2/3 monotonically. Moreover, the fits of the standard deviations suggest σ(n)/n = Θ(1/√n) asymptotically, which implies that as permutation length increases, the distribution of routing times becomes relatively tighter for all three algorithms. This suggests that the average-case routing time may indeed be representative of typical performance for our algorithms on permutations selected uniformly at random.

6. Conclusion

We have shown that our algorithm, GDC(ATBS) (i.e., Generic Divide-and-Conquer with Adaptive TBS to sort binary strings), uses the fast state reversal primitive to outperform any SWAP-based protocol when routing on the path in the worst and average case. Recent work shows a lower bound of $n/\alpha$, where $\alpha \approx 4.5$, on the time to perform a reversal on the path graph [Bap+20]. Thus we know that the routing time cannot be improved by more than a factor $\alpha$ over SWAPs, even with new techniques for implementing reversals. However, it remains to understand the fastest possible routing time on the path. Clearly, this is also lower bounded by $n/\alpha$. Our work could be improved by addressing the following two open questions: (i) how fast can state reversal be implemented, and (ii) what is the fastest way of implementing a general permutation using state reversal?

We believe that the upper bound in Corollary 4.3 can likely be decreased. For example, in the proof of Lemma 4.1, we use a simple bound to show that the reversal sequence found by GDC(TBS) sorts binary strings with fewer than cn ones sufficiently fast for our purposes. It is possible that this bound can be decreased if we consider the reversal sequence found by GDC(ATBS) instead. Additionally, in the proof of Theorem 4.2, we only consider two pairs of partition points: one pair in each case of the proof. This suggests that the bound in Theorem 4.2 might be decreased if the full power of GDC(ATBS) could be analyzed.

Improving the algorithm itself is also a potential avenue to decrease the upper bound in Corollary 4.3. For example, the generic divide-and-conquer approach in Algorithm 4.1 focused on splitting the path exactly in half and recursing. An obvious improvement would be to create an adaptive version of Algorithm 4.1 in a manner similar to GDC(ATBS) where instead of splitting the path in half, the partition point would be placed in the optimal spot. It is also possible that by going beyond the divide-and-conquer approach, we could find faster reversal sequences and reduce the upper bound even further.

Our algorithm uses reversals to show the first quantum speedup for unitary quantum routing. It would be interesting to find other ways of implementing fast quantum routing that are not necessarily based on reversals. Other primitives for rapidly routing quantum information might be combined with classical strategies to develop fast general-purpose routing algorithms, possibly with an asymptotic scaling advantage. Such primitives might also take advantage of other resources, such as long-range Hamiltonians or the assistance of entanglement and fast classical communication.

Acknowledgements

We thank William Gasarch for organizing the REU-CAAR program that made this project possible.

A.B. and A.V.G. acknowledge support by the DoE ASCR Quantum Testbed Pathfinder program (award number de-sc0019040), ARO MURI, DoE ASCR Accelerated Research in Quantum Computing program (award number de-sc0020312), U.S. Department of Energy award number de-sc0019449, NSF PFCQC program, AFOSR, and AFOSR MURI. A.M.C. and E.S. acknowledge support by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Quantum Testbed Pathfinder program (award number de-sc0019040) and the U.S. Army Research Office (MURI award number W911NF-16–1-0349). S.K. and H.S. acknowledge support from an NSF REU grant, REU-CAAR (CNS-1952352). E.S. acknowledges support from an IBM Ph.D. Fellowship.

A. Average routing time using only SWAPs

In this section, we prove Theorem 5.1. First, define the infinity distance $d_\infty \colon \mathcal{S}_n \to \mathbb{N}$ by $d_\infty(\pi) = \max_{1 \le i \le n} |\pi(i) - i|$. Note that $0 \le d_\infty(\pi) \le n - 1$. Finally, define the set of permutations of length $n$ with infinity distance at most $k$ to be $B_{k,n} = \{\pi \in \mathcal{S}_n : d_\infty(\pi) \le k\}$.

The infinity distance is crucially tied to the performance of odd-even sort and, indeed, of any SWAP-based routing algorithm. For any permutation $\pi$ of length $n$, the routing time of any SWAP-based algorithm is bounded below by $d_\infty(\pi)$, since the element furthest from its destination must be swapped at least $d_\infty(\pi)$ times, and each of those SWAPs must occur sequentially. To show that the average routing time of any SWAP-based protocol is asymptotically at least $n$, we first show that $|B_{(1-\varepsilon)n,n}|/n! \to 0$ for all $0 < \varepsilon \le 1/2$.
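This lower bound can be checked directly on small instances: each parallel round of disjoint nearest-neighbor SWAPs moves every element by at most one position, so any SWAP-based schedule, including odd-even sort, needs at least $d_\infty(\pi)$ rounds. A minimal sketch (the helper functions are illustrative, not code from the paper):

```python
from itertools import permutations

def infinity_distance(pi):
    """d_inf(pi) = max_i |pi(i) - i| for a 0-indexed permutation."""
    return max(abs(p - i) for i, p in enumerate(pi))

def odd_even_sort_time(pi):
    """Parallel rounds used by odd-even sort (OES) to route pi."""
    pi = list(pi)
    rounds = 0
    while pi != sorted(pi):
        # Alternate even and odd comparator layers; each layer is a
        # set of disjoint nearest-neighbor SWAPs.
        for i in range(rounds % 2, len(pi) - 1, 2):
            if pi[i] > pi[i + 1]:
                pi[i], pi[i + 1] = pi[i + 1], pi[i]
        rounds += 1
    return rounds

# d_inf lower-bounds every SWAP-based routing time, including OES.
for pi in permutations(range(5)):
    assert odd_even_sort_time(pi) >= infinity_distance(pi)
```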

Schwartz and Vontobel [SV17] present an upper bound on $|B_{k,n}|$ that was proved in [Klø08] and [TS10]:

Lemma A.1.

For all $0 < r < 1$, $|B_{rn,n}| \le \Phi(rn, n)$, where

$$\Phi(k, n) = \begin{cases} \left((2k+1)!\right)^{\frac{n-2k}{2k+1}} \displaystyle\prod_{i=k+1}^{2k} (i!)^{2/i} & \text{if } 0 < k/n \le \tfrac{1}{2}, \\[1ex] (n!)^{\frac{2k+2-n}{n}} \displaystyle\prod_{i=k+1}^{n-1} (i!)^{2/i} & \text{if } \tfrac{1}{2} \le k/n < 1. \end{cases} \tag{17}$$

Proof.

Note that $r = k/n$. For the case $0 < r \le 1/2$, refer to [Klø08] for a proof. For the case $1/2 \le r < 1$, refer to [TS10] for a proof. □
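Since Eq. (17) involves fractional powers of large factorials, it is convenient to evaluate $\log \Phi$ via log-gamma. The sketch below (an illustrative check; `log_phi` is a name introduced here) confirms numerically that $\log(\Phi((1-\varepsilon)n, n)/n!)$ becomes increasingly negative as $n$ grows, in line with Theorem A.3:

```python
from math import lgamma

def log_phi(k, n):
    """log of the bound Phi(k, n) on |B_{k,n}| from Eq. (17)."""
    lfact = lambda m: lgamma(m + 1)  # log(m!)
    if 2 * k <= n:  # regime 0 < k/n <= 1/2
        val = lfact(2 * k + 1) * (n - 2 * k) / (2 * k + 1)
        val += sum(2 * lfact(i) / i for i in range(k + 1, 2 * k + 1))
    else:           # regime 1/2 <= k/n < 1
        val = lfact(n) * (2 * k + 2 - n) / n
        val += sum(2 * lfact(i) / i for i in range(k + 1, n))
    return val

# log of the fraction of permutations with d_inf <= (1 - eps) n:
# it becomes increasingly negative as n grows.
eps = 0.25
for n in (40, 80, 160):
    k = int((1 - eps) * n)
    print(n, log_phi(k, n) - lgamma(n + 1))
```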

Lemma A.2.

$$n! = \Theta\!\left( \sqrt{n} \left( \frac{n}{e} \right)^{\!n} \right) \tag{18}$$

Proof.

This follows from well-known precise bounds for Stirling’s formula:

$$\sqrt{2\pi n} \left( \frac{n}{e} \right)^{\!n} e^{\frac{1}{12n+1}} \le n! \le \sqrt{2\pi n} \left( \frac{n}{e} \right)^{\!n} e^{\frac{1}{12n}} \tag{19}$$
$$\sqrt{2\pi n} \left( \frac{n}{e} \right)^{\!n} \le n! \le \sqrt{2\pi n} \left( \frac{n}{e} \right)^{\!n} e \tag{20}$$

(see for example [Rob55]). □
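Robbins' two-sided bounds in Eq. (19) are tight enough to verify numerically even for $n = 1$; the check below is an illustrative sketch:

```python
from math import e, factorial, pi, sqrt

def stirling_bounds(n):
    """Robbins' two-sided bounds on n! from Eq. (19)."""
    base = sqrt(2 * pi * n) * (n / e) ** n
    return base * e ** (1 / (12 * n + 1)), base * e ** (1 / (12 * n))

# The exact factorial sits strictly inside the Robbins interval.
for n in (1, 5, 10, 20):
    lo, hi = stirling_bounds(n)
    assert lo <= factorial(n) <= hi
```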

With Lemmas A.1 and A.2 in hand, we proceed with the following theorem:

Theorem A.3.

For all $0 < \varepsilon \le 1/2$, $\lim_{n \to \infty} |B_{(1-\varepsilon)n,n}|/n! = 0$. In other words, the proportion of permutations of length $n$ with infinity distance less than $(1-\varepsilon)n$ vanishes asymptotically.

Proof.

Lemma A.1 implies that $|B_{(1-\varepsilon)n,n}|/n! \le \Phi((1-\varepsilon)n, n)/n!$. The constraint $0 < \varepsilon \le 1/2$ stipulates that we are in the regime where $1/2 \le r < 1$, since $r = 1 - \varepsilon$. Then we use Lemma A.2 to simplify any factorials that appear. Substituting Eq. (17) and simplifying, we have

$$\frac{\Phi((1-\varepsilon)n, n)}{n!} = \frac{\prod_{i=(1-\varepsilon)n+1}^{n-1} (i!)^{2/i}}{(n!)^{2\varepsilon - 2/n}} = O\!\left( \frac{e^{2\varepsilon n - 2}}{n^{2\varepsilon n - 2}} \prod_{i=(1-\varepsilon)n+1}^{n-1} \frac{i^{2+1/i}}{e^2} \right). \tag{21}$$

We note that the $i^{1/i}$ terms can be bounded by

$$\prod_{i=(1-\varepsilon)n+1}^{n-1} i^{1/i} \le \prod_{i=(1-\varepsilon)n+1}^{n-1} n^{\frac{1}{(1-\varepsilon)n}} \le n^{\frac{\varepsilon}{1-\varepsilon}} \le n \tag{22}$$

since $\varepsilon \le 1/2$ implies $\varepsilon/(1-\varepsilon) \le 1$. Now we have

$$O\!\left( \frac{e^{2\varepsilon n - 2}}{n^{2\varepsilon n - 2}} \prod_{i=(1-\varepsilon)n+1}^{n-1} \frac{i^{2+1/i}}{e^2} \right) = O\!\left( \frac{n}{n^{2\varepsilon n - 2}} \prod_{i=(1-\varepsilon)n+1}^{n-1} i^2 \right) \tag{23}$$
$$= O\!\left( \frac{n}{n^{2\varepsilon n - 2}} \left( \frac{(n-1)!}{((1-\varepsilon)n)!} \right)^{\!2} \right) \tag{24}$$
$$= O\!\left( \frac{n}{n^{2\varepsilon n - 2}} \cdot \frac{e^{-2\varepsilon n} (n-1)^{2n-1}}{((1-\varepsilon)n)^{2(1-\varepsilon)n+1}} \right) \tag{25}$$
$$= O\!\left( \frac{n}{n^{2\varepsilon n - 2}} \cdot \frac{e^{-2\varepsilon n}\, n^{2n}}{((1-\varepsilon)n)^{2(1-\varepsilon)n}} \right) \tag{26}$$
$$= O\!\left( n^3 \exp\!\big( -\left( (1-\varepsilon)\ln(1-\varepsilon) + \varepsilon \right) 2n \big) \right). \tag{27}$$

Since $(1-\varepsilon)\ln(1-\varepsilon) + \varepsilon > 0$ for $\varepsilon > 0$, this vanishes in the limit of large $n$. □

Now we prove the theorem.

Proof of Theorem 5.1.

Let $T$ denote the average routing time of any SWAP-based protocol, and consider a random permutation $\pi$ drawn uniformly from $\mathcal{S}_n$. By Theorem A.3, $\pi$ belongs to $B_{(1-\varepsilon)n,n}$ with vanishing probability, for all $0 < \varepsilon \le 1/2$. Therefore, for any fixed $0 < \varepsilon \le 1/2$, we have $(1-\varepsilon)n < \mathbb{E}[d_\infty(\pi)]$ as $n \to \infty$. Since asymptotically $(1-\varepsilon)n \le T$ for all such $\varepsilon$, this translates to an average routing time of at least $n - o(n)$. □

B. Average routing time using TBS

In this section, we prove Theorem 5.2, which characterizes the average-case performance of TBS (Algorithm 4.2). This approach consists of two steps: a recursive call on three equal partitions of the path (of length n/3 each), and a merge step involving a single reversal.

We denote the uniform distribution over a set $S$ by $\mathcal{U}(S)$. The set of all $n$-bit strings is denoted $B^n$, where $B = \{0, 1\}$. Similarly, the set of all $n$-bit strings with Hamming weight $k$ is denoted $B^n_k$. For simplicity, assume that $n$ is even. We denote the runtime of TBS on $b \in B^n$ by $T(b)$.

When running GDC(TBS) on a given permutation $\pi$, the input bit string for TBS is $b = I(\pi)$, where the indicator function $I$ is defined in Eq. (6). We wish to show that, in expectation over all permutations $\pi$, the corresponding bit strings are quick to sort. First, we show that it suffices to consider uniformly random strings from $B^n_{n/2}$.

Lemma B.1.

If $\pi \sim \mathcal{U}(\mathcal{S}_n)$, then $I(\pi) \sim \mathcal{U}(B^n_{n/2})$.

Proof.

We use a counting argument. For any fixed $b \in B^n_{n/2}$, the number of permutations $\pi$ such that $I(\pi) = b$ is $(n/2)!\,(n/2)!$, since we can freely assign index labels from $\{1, 2, \ldots, n/2\}$ to the 0 bits of $b$ and from $\{n/2+1, \ldots, n\}$ to the 1 bits of $b$. Therefore, for a uniformly random $\pi$ and arbitrary $b \in B^n_{n/2}$,

$$\Pr(I(\pi) = b) = \frac{(n/2)!\,(n/2)!}{n!} = \frac{1}{\binom{n}{n/2}} = \frac{1}{|B^n_{n/2}|}. \tag{28}$$

Therefore, $I(\pi) \sim \mathcal{U}(B^n_{n/2})$. □
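The counting argument can be verified exhaustively for small $n$. Since Eq. (6) is not reproduced in this excerpt, the `indicator` function below is a hypothetical stand-in that marks which entries belong in the right half; under it, every balanced string arises from exactly $(n/2)!^2$ permutations:

```python
from collections import Counter
from itertools import permutations
from math import comb, factorial

def indicator(pi, n):
    """Hypothetical stand-in for I(pi) of Eq. (6): bit i is 1 iff
    the entry at position i belongs in the right half."""
    return tuple(1 if p >= n // 2 else 0 for p in pi)

n = 6
counts = Counter(indicator(pi, n) for pi in permutations(range(n)))
# Every balanced string appears exactly (n/2)!^2 times, so I(pi)
# is uniform over B^n_{n/2} when pi is uniform over S_n.
assert len(counts) == comb(n, n // 2)
assert set(counts.values()) == {factorial(n // 2) ** 2}
```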

While $B^n_{n/2}$ is easier to work with than $\mathcal{S}_n$, the constraint on the Hamming weight still poses an issue when we try to analyze the runtime recursively. To address this, Lemma B.2 below shows that relaxing from $\mathcal{U}(B^n_{n/2})$ to $\mathcal{U}(B^n)$ does not affect expectation values significantly.

We give a recursive form for the runtime of TBS. We use the following convention for the substrings of an arbitrary $n$-bit string $a$: if $a$ is divided into 3 segments, we label the segments $a_{0.0}, a_{0.1}, a_{0.2}$ from left to right. Subsequent thirds are labeled analogously by ternary fractions; for example, the leftmost third of the middle third is denoted $a_{0.10}$, and so on. Then, the runtime of TBS on string $a$ can be bounded by

$$T(a) \le \max_{i \in \{0,1,2\}} T(a_{0.i}) + \frac{n_1(a_{0.0}) + n_1(\overline{a_{0.2}}) + n/3 + 1}{3}, \tag{29}$$

where $\bar{a}$ is the bitwise complement of bit string $a$ and $n_1(a)$ denotes the Hamming weight of $a$. Logically, the first term on the right-hand side is a recursive call to sort the thirds, while the second term is the time taken to merge the sorted subsequences on the thirds using a reversal. Each term $T(a_{0.i})$ can be broken down recursively until all subsequences are of length 1. This yields the general formula

$$T(a) \le \frac{1}{3} \sum_{r=1}^{\log_3(n)} \left( \max_{i \in \{0,1,2\}^{r-1}} \left\{ n_1(a_{0.i0}) + n_1(\overline{a_{0.i2}}) \right\} + \frac{n}{3^r} + 1 \right), \tag{30}$$

where, for $r = 1$, $i$ denotes the empty string.
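The recurrence in Eq. (29) can be evaluated directly. The sketch below (with `tbs_time_bound` an illustrative name, and lengths restricted to powers of 3 for simplicity) computes the resulting upper bound on a given bit string:

```python
def tbs_time_bound(a):
    """Evaluate the runtime bound of Eq. (29): recurse on thirds,
    then merge with a single reversal. For simplicity this sketch
    assumes len(a) is a power of 3."""
    n = len(a)
    if n <= 1:
        return 0
    third = n // 3
    parts = [a[:third], a[third:2 * third], a[2 * third:]]
    recurse = max(tbs_time_bound(p) for p in parts)
    ones_left = sum(parts[0])            # n_1(a_{0.0})
    zeros_right = third - sum(parts[2])  # n_1(complement of a_{0.2})
    merge = (ones_left + zeros_right + third + 1) / 3
    return recurse + merge
```

For instance, the all-ones string of length 27 evaluates to $29/3 \approx 9.7$, comfortably below the worst-case bound of $2n/3 = 18$.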

Lemma B.2.

Let $a \sim \mathcal{U}(B^n)$ and $b \sim \mathcal{U}(B^n_{n/2})$. Then

$$\mathbb{E}[T(b)] \le \mathbb{E}[T(a)] + \tilde{O}(n^\alpha), \tag{31}$$

where $\alpha \in (1/2, 1)$ is a constant.

The intuition behind this lemma is that by the law of large numbers, the deviation of the Hamming weight from n/2 is subleading in n, and the TBS runtime does not change significantly if the input string is altered in a subleading number of places.

Proof.

Consider an arbitrary bit string $a$, and apply the following transformation: if $n_1(a) = k \ge n/2$, flip $k - n/2$ ones, chosen uniformly at random, to zeros; if $k < n/2$, flip $n/2 - k$ uniformly random zeros to ones. Call this stochastic function $f(a)$. Then, for all $a$, $f(a) \in B^n_{n/2}$, and for a random string $a \sim \mathcal{U}(B^n)$, we claim that $f(a) \sim \mathcal{U}(B^n_{n/2})$. In other words, $f$ maps the uniform distribution on $B^n$ to the uniform distribution on $B^n_{n/2}$.

We show this by calculating the probability $\Pr(f(a) = b)$ for arbitrary $b \in B^n_{n/2}$. A string $a$ can map to $b$ under $f$ only if $a$ and $b$ disagree in a single direction: if, WLOG, $n_1(a) \ge n_1(b)$, then $a$ must take value 1 wherever $a$ and $b$ disagree (and 0 if $n_1(a) \le n_1(b)$). We denote this property by $a \succeq b$. The probability of picking a uniformly random $a$ such that $a \succeq b$ with $x$ disagreements between them is $\binom{n/2}{x} 2^{-n}$, since $n_0(b) = n/2$. Next, the probability that $f$ maps such an $a$ to $b$ is $1/\binom{n/2+x}{x}$. Combining these, we have

$$\Pr(f(a) = b) = \sum_{x=-n/2}^{n/2} \Pr\!\left( a \succeq b \text{ and } n_1(a) = \tfrac{n}{2} + x \right) \cdot \Pr\!\left( f(a) = b \,\middle|\, a \succeq b \text{ and } n_1(a) = \tfrac{n}{2} + x \right) \tag{32}$$
$$= \sum_{x=-n/2}^{n/2} \binom{n/2}{|x|} 2^{-n} \cdot \frac{1}{\binom{n/2+|x|}{|x|}} \tag{33}$$
$$= \frac{1}{\binom{n}{n/2}} \sum_{x=-n/2}^{n/2} \binom{n}{n/2 - x} 2^{-n} \tag{34}$$
$$= \frac{1}{\binom{n}{n/2}} = \frac{1}{|B^n_{n/2}|}. \tag{35}$$

Therefore, $f(a) \sim \mathcal{U}(B^n_{n/2})$. Thus, $f$ allows us to simulate the uniform distribution on $B^n_{n/2}$ starting from the uniform distribution on $B^n$.
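The chain of equalities (32)-(35) can be confirmed exactly with rational arithmetic for small even $n$ (the function below is an illustrative check introduced here, not code from the paper):

```python
from fractions import Fraction
from math import comb

def prob_f_maps_to_balanced(n):
    """Exact evaluation of Eqs. (32)-(35): the probability that the
    rebalancing map f sends a uniform a in B^n to one fixed
    balanced string b (n even)."""
    h = n // 2
    total = Fraction(0)
    for x in range(-h, h + 1):
        reach = Fraction(comb(h, abs(x)), 2 ** n)     # Pr(a >= b, n_1(a) = h + x)
        pick = Fraction(1, comb(h + abs(x), abs(x)))  # Pr(f picks the right flips)
        total += reach * pick
    return total

# Matches 1 / C(n, n/2), i.e. the uniform probability on B^n_{n/2}.
for n in (2, 4, 8, 12):
    assert prob_f_maps_to_balanced(n) == Fraction(1, comb(n, n // 2))
```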

Now we bound the runtime of TBS on $f(a)$ in terms of the runtime on a fixed $a$. Fix some $\alpha \in (1/2, 1)$. We know that $n_1(f(a)) = n/2$; suppose that $|n_1(a) - n/2| \le n^\alpha$. Since $f(a)$ then differs from $a$ in at most $n^\alpha$ places, at level $r$ of the TBS recursion (see Eq. (30)) the runtimes of $a$ and $f(a)$ differ by at most $\frac{1}{3} \min\{2n/3^r, n^\alpha\}$. This is because the runtimes can differ by at most twice the length of the subsequence at that level. Therefore, the total runtime difference is bounded by

$$\Delta T \le \frac{1}{3} \sum_{r=1}^{\log_3(n)} \min\!\left\{ \frac{2n}{3^r}, n^\alpha \right\} \tag{36}$$
$$= \frac{1}{3} \left( \sum_{r=1}^{\log_3(2n^{1-\alpha})} n^\alpha + 2 \sum_{r=\log_3(2n^{1-\alpha})+1}^{\log_3(n)} \frac{n}{3^r} \right) \tag{37}$$
$$= \frac{1}{3} \left( n^\alpha \log_3(2n^{1-\alpha}) + 2 \sum_{s=0}^{\log_3(n^\alpha/2)-1} \frac{n^\alpha/2}{3^{s+1}} \right) \tag{38}$$
$$\le \frac{1}{3} \left( n^\alpha \log_3(2n^{1-\alpha}) + \frac{n^\alpha}{2} \right) = \tilde{O}(n^\alpha). \tag{39}$$

On the other hand, if $|n_1(a) - n/2| > n^\alpha$, we simply bound the runtime difference by the runtime of OES, which is at most $n$.
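The split in Eqs. (36)-(39) can also be checked numerically: evaluating the sum directly shows that it stays bounded by a constant multiple of $n^\alpha \log n$, i.e., it is $\tilde{O}(n^\alpha)$. The helper below is illustrative:

```python
from math import ceil, log

def delta_t_bound(n, alpha):
    """Directly evaluate (1/3) * sum_r min{2n/3^r, n^alpha} of Eq. (36)."""
    levels = ceil(log(n) / log(3))
    return sum(min(2 * n / 3 ** r, n ** alpha)
               for r in range(1, levels + 1)) / 3

# The ratio to n^alpha * log(n) stays bounded as n grows.
ratios = [delta_t_bound(3 ** k, 0.75) / ((3 ** k) ** 0.75 * log(3 ** k))
          for k in (5, 7, 9)]
print(ratios)
```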

Now consider $a \sim \mathcal{U}(B^n)$ and $b = f(a) \sim \mathcal{U}(B^n_{n/2})$. Since $n_1(a)$ has the binomial distribution $\mathcal{B}(n, 1/2)$, where $\mathcal{B}(k, p)$ is the sum of $k$ Bernoulli random variables with success probability $p$, the Chernoff bound shows that deviation from the mean is exponentially suppressed, i.e.,

$$\Pr\!\left( |n_1(a) - n/2| \ge n^\alpha \right) = \exp\!\left( -O(n^{2\alpha-1}) \right). \tag{40}$$

Therefore, the deviation in the expectation values is bounded by

$$\left| \mathbb{E}[T(f(a))] - \mathbb{E}[T(a)] \right| \le n \exp\!\left( -O(n^{2\alpha-1}) \right) + c \left( 1 - \exp\!\left( -O(n^{2\alpha-1}) \right) \right) n^\alpha \log(n) = \tilde{O}(n^\alpha), \tag{41}$$

where c is a constant. Finally, we conclude that

$$\mathbb{E}[T(b)] \le \mathbb{E}[T(a)] + \tilde{O}(n^\alpha) \tag{42}$$

as claimed. □

Next, we prove the main result of this section, namely, that the expected runtime of GDC(TBS) is at most $2n/3$ up to additive subleading terms.

Proof of Theorem 5.2.

We first prove properties for sorting a random $n$-bit string $a \sim \mathcal{U}(B^n)$ and then apply this to the problem of sorting $b \sim \mathcal{U}(B^n_{n/2})$ using Lemmas B.1 and B.2.

The expected runtime for TBS can be calculated using the recursive formula in Eq. (30):

$$\mathbb{E}[T(a)] \le \frac{1}{3} \sum_{r=1}^{\log_3(n)} \left( \mathbb{E}\!\left[ \max_{i \in \{0,1,2\}^{r-1}} \left\{ n_1(a_{0.i0}) + n_1(\overline{a_{0.i2}}) \right\} \right] + \frac{n}{3^r} + 1 \right). \tag{43}$$

The summand contains an expectation of a maximum over Hamming weights of i.i.d. uniformly random substrings of length $n/3^r$; each such Hamming weight follows the binomial distribution $\mathcal{B}(n/3^r, 1/2)$, i.e., $n/3^r$ Bernoulli trials with success probability $1/2$. Because of independence, if we sample $X_1, X_2 \sim \mathcal{B}(n/3^r, 1/2)$, then $X_1 + X_2 \sim \mathcal{B}(2n/3^r, 1/2)$. Using Lemma B.3 with $m = 3^{r-1}$, the expected maximum can be bounded by

$$\frac{n}{3^r} + O\!\left( \sqrt{ (n/3^r) \log\!\left( 3^{r-1} \cdot n/3^r \right) } \right) = \frac{n}{3^r} + \tilde{O}(n^{1/2}) \tag{44}$$

since the second term is largest when r=O(1). Therefore,

$$\mathbb{E}[T(a)] \le \frac{1}{3} \left( \sum_{r=1}^{\log_3(n)} \frac{2n}{3^r} \right) + \tilde{O}(n^{1/2}) = \frac{n}{3} + \tilde{O}(n^{1/2}). \tag{45}$$

Lemma B.2 then gives $\mathbb{E}[T(b)] \le \frac{n}{3} + \tilde{O}(n^\alpha)$.

The routing algorithm GDC(TBS) proceeds by calling TBS on the full path and then, in parallel, on the two disjoint sub-paths of length $n/2$. We show that the distributions of the left and right halves are uniform if the input permutation is sampled uniformly as $\pi \sim \mathcal{U}(\mathcal{S}_n)$. There exists a bijective mapping $g$ such that $g(\pi) = (b, \pi_L, \pi_R) \in B^n_{n/2} \times \mathcal{S}_{n/2} \times \mathcal{S}_{n/2}$ for any $\pi \in \mathcal{S}_n$, since

$$|\mathcal{S}_n| = n! = \binom{n}{n/2} \left( \frac{n}{2} \right)! \left( \frac{n}{2} \right)! = \left| B^n_{n/2} \times \mathcal{S}_{n/2} \times \mathcal{S}_{n/2} \right|. \tag{46}$$

In particular, $g$ can be defined so that $b$ specifies which entries are taken to the first $n/2$ positions (say, without changing the relative ordering of the entries mapped to the first $n/2$ positions or of those mapped to the last $n/2$ positions), and $\pi_L$ and $\pi_R$ specify the residual permutations on the first and last $n/2$ positions, respectively. Given $g(\pi) = (b, \pi_L, \pi_R)$, TBS only has access to $b$. After sorting, TBS can only perform deterministic permutations $\mu_L(b), \mu_R(b) \in \mathcal{S}_{n/2}$ on the left and right halves, respectively, that depend only on $b$. Thus TBS performs the mappings $\pi_L \mapsto \pi_L \mu_L(b)$ and $\pi_R \mapsto \pi_R \mu_R(b)$ on the output. Now it is easy to see that when $\pi_L, \pi_R \sim \mathcal{U}(\mathcal{S}_{n/2})$, the output is also uniform because the TBS mapping is independent of the relative permutations on the left and right halves.

More generally, we see that the uniform distribution $\mathcal{U}(\mathcal{S}_n)$ over permutations is mapped to two uniform permutations on the left and right halves, respectively. Symbolically, for $\pi \sim \mathcal{U}(\mathcal{S}_n)$, we have

$$g(\pi) = (b, \pi_L, \pi_R) \sim \mathcal{U}\!\left( B^n_{n/2} \times \mathcal{S}_{n/2} \times \mathcal{S}_{n/2} \right) = \mathcal{U}(B^n_{n/2}) \times \mathcal{U}(\mathcal{S}_{n/2}) \times \mathcal{U}(\mathcal{S}_{n/2}). \tag{47}$$

As shown earlier, given uniform distributions over left and right permutations, the output is also uniform. By induction, all permutations in the recursive steps are uniform.

We therefore get a sum of expected TBS runtimes on bit strings of lengths $n/2^{r-1}$, i.e.,

$$\sum_{r=1}^{\log_2 n} \mathbb{E}[T(b_r)] \le \sum_{r=1}^{\log_2 n} \left( \mathbb{E}[T(a_r)] + \tilde{O}\!\left( \left( \frac{n}{2^{r-1}} \right)^{\!\alpha} \right) \right) \le \frac{2n}{3} + \tilde{O}(n^\alpha), \tag{48}$$

where, by Lemma B.1 and the uniformity of permutations in recursive calls, we need only consider $b_r \sim \mathcal{U}(B^{n/2^{r-1}}_{n/2^r})$, and we bound the expected runtime using Lemma B.2 with $a_r \sim \mathcal{U}(B^{n/2^{r-1}})$. □

We end with a lemma about the order statistics of binomial random variables used in the proof of the main theorem.

Lemma B.3.

Given $m$ i.i.d. samples from the binomial distribution, $X_i \sim \mathcal{B}(n, p)$ with $i \in [m]$ and $p \in [0, 1]$, the maximum $Y = \max_i X_i$ satisfies

$$\mathbb{E}[Y] < pn + O\!\left( \sqrt{n \log(mn)} \right). \tag{49}$$

Proof.

We use Hoeffding's inequality for the binomial random variable $X \sim \mathcal{B}(n, p)$, which states that

$$\Pr\!\left( X \ge (p + \epsilon)n \right) \le \exp\!\left( -2n\epsilon^2 \right) \quad \text{for all } \epsilon \ge 0. \tag{50}$$

Pick $\epsilon = \sqrt{\frac{c}{2n} \log(mn)}$, where $c > 0$ is a constant. For this choice, we have

$$\Pr\!\left( X_i \ge (p + \epsilon)n \right) \le \frac{1}{(mn)^c} \tag{51}$$

for every $i = 1, \ldots, m$. Then the probability that $Y < (p + \epsilon)n$ is identical to the probability that $X_i < (p + \epsilon)n$ for every $i$, which for i.i.d. $X_i$ is given by

$$\Pr\!\left( Y < (p + \epsilon)n \right) = \Pr\!\left( X < (p + \epsilon)n \right)^m > \left( 1 - \frac{1}{(mn)^c} \right)^{\!m}. \tag{52}$$

Using Bernoulli's inequality ($(1 + x)^r \ge 1 + rx$ for $x \ge -1$), we can simplify the above bound to

$$\Pr\!\left( Y < (p + \epsilon)n \right) > 1 - m^{1-c} n^{-c}. \tag{53}$$

Finally, we bound the expected value of Y by an explicit weighted sum over its range:

$$\mathbb{E}[Y] = \sum_{k=0}^{n} \Pr(Y = k) \cdot k \tag{54}$$
$$= \sum_{k=0}^{(p+\epsilon)n} \Pr(Y = k) \cdot k + \sum_{k=(p+\epsilon)n+1}^{n} \Pr(Y = k) \cdot k \tag{55}$$
$$\le \sum_{k=0}^{(p+\epsilon)n} \Pr(Y = k) \cdot k + n \sum_{k=(p+\epsilon)n+1}^{n} \Pr(Y = k) \tag{56}$$
$$\le \sum_{k=0}^{(p+\epsilon)n} \Pr(Y = k) \cdot k + (mn)^{1-c} \tag{57}$$
$$\le (p + \epsilon)n + (mn)^{1-c}. \tag{58}$$

Since $(mn)^{1-c} < 1$ for $c > 1$,

$$\mathbb{E}[Y] < pn + 1 + \sqrt{\frac{cn}{2} \log(mn)} = pn + O\!\left( \sqrt{n \log(mn)} \right) \tag{59}$$

as claimed. □
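Lemma B.3 can be sanity-checked by Monte Carlo simulation: the empirical mean of $Y = \max_i X_i$ sits above $pn$ but below a bound of the flavor of Eq. (49). The constant choice $c = 2$ and the sample sizes below are illustrative assumptions:

```python
import random
from math import log, sqrt

def max_binomial_sample(m, n, p, rng):
    """One sample of Y = max of m i.i.d. Binomial(n, p) draws."""
    return max(sum(rng.random() < p for _ in range(n)) for _ in range(m))

rng = random.Random(0)  # fixed seed for reproducibility
m, n, p = 20, 300, 0.5
samples = [max_binomial_sample(m, n, p, rng) for _ in range(200)]
mean_y = sum(samples) / len(samples)
# Bound of Eq. (49) flavor with c = 2: eps = sqrt((c/(2n)) log(mn)),
# so the bound is p*n + 1 + sqrt(n * log(m*n)).
bound = p * n + 1 + sqrt(n * log(m * n))
print(mean_y, bound)
```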

References

  • [ACG94] Alon N, Chung FRK, and Graham RL. "Routing Permutations on Graphs via Matchings". In: SIAM Journal on Discrete Mathematics 7.3 (1994), pp. 513–530. DOI: 10.1137/s0895480192236628.
  • [Aru+19] Arute F et al. "Quantum supremacy using a programmable superconducting processor". In: Nature 574.7779 (2019), pp. 505–510. DOI: 10.1038/s41586-019-1666-5.
  • [BP93] Bafna V and Pevzner PA. "Genome rearrangements and sorting by reversals". In: Proceedings of 1993 IEEE 34th Annual Foundations of Computer Science. 1993, pp. 148–157. DOI: 10.1137/S0097539793250627.
  • [BR17] Banerjee I and Richards D. "New Results on Routing via Matchings on Graphs". In: Fundamentals of Computation Theory. Lecture Notes in Computer Science 10472. Springer, 2017, pp. 69–81. DOI: 10.1007/978-3-662-55751-8_7.
  • [Bap+20] Bapat A, Schoute E, Gorshkov AV, and Childs AM. Nearly optimal time-independent reversal of a spin chain. 2020. arXiv: 2003.02843v1 [quant-ph].
  • [Ben+08] Bender MA, Ge D, He S, Hu H, Pinter RY, Skiena S, and Swidan F. "Improved bounds on sorting by length-weighted reversals". In: Journal of Computer and System Sciences 74.5 (2008), pp. 744–774. DOI: 10.1016/j.jcss.2007.08.008.
  • [Ben+02] Bennett CH, Cirac JI, Leifer MS, Leung DW, Linden N, Popescu S, and Vidal G. "Optimal simulation of two-qubit Hamiltonians using general local operations". In: Physical Review A 66.1 (2002). DOI: 10.1103/physreva.66.012305.
  • [Bre+16] Brecht T, Pfaff W, Wang C, Chu Y, Frunzio L, Devoret MH, and Schoelkopf RJ. "Multilayer microwave integrated quantum circuits for scalable quantum computing". In: npj Quantum Information 2.16002 (2016). DOI: 10.1038/npjqi.2016.2.
  • [CSU19] Childs AM, Schoute E, and Unsal CM. "Circuit Transformations for Quantum Architectures". In: 14th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2019). Vol. 135. Leibniz International Proceedings in Informatics (LIPIcs). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2019, 3:1–3:24. DOI: 10.4230/LIPIcs.TQC.2019.3.
  • [KS95] Kececioglu J and Sankoff D. "Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement". In: Algorithmica 13.1–2 (1995), pp. 180–210. DOI: 10.1007/BF01188586.
  • [KSS21] King S, Schoute E, and Shastri H. reversal-sort. 2021. URL: https://gitlab.umiacs.umd.edu/amchilds/reversal-sort.
  • [Kja+20] Kjaergaard M, Schwartz ME, Braumüller J, Krantz P, Wang JI-J, Gustavsson S, and Oliver WD. "Superconducting qubits: Current state of play". In: Annual Review of Condensed Matter Physics 11 (2020), pp. 369–395. DOI: 10.1146/annurev-conmatphys-031119-050605.
  • [Klø08] Kløve T. Spheres of Permutations under the Infinity Norm–Permutations with limited displacement. Research rep. 376. Department of Informatics, University of Bergen, Norway, 2008. URL: http://www.ii.uib.no/publikasjoner/texrap/pdf/2008-376.pdf.
  • [LDM84] Lakshmivarahan S, Dhall SK, and Miller LL. "Parallel Sorting Algorithms". In: Vol. 23. Advances in Computers. Elsevier, 1984, pp. 321–323. DOI: 10.1016/S0065-2458(08)60467-2.
  • [LWD15] Lye A, Wille R, and Drechsler R. "Determining the minimal number of swap gates for multi-dimensional nearest neighbor quantum circuits". In: The 20th Asia and South Pacific Design Automation Conference. IEEE, 2015, pp. 178–183. DOI: 10.1109/aspdac.2015.7059001.
  • [MG19] McClure D and Gambetta J. Quantum computation center opens. Tech. rep. IBM, 2019. URL: https://www.ibm.com/blogs/research/2019/09/quantum-computation-center/ (visited on 03/30/2020).
  • [MK13] Monroe C and Kim J. "Scaling the Ion Trap Quantum Processor". In: Science 339.6124 (2013), pp. 1164–1169. DOI: 10.1126/science.1231298.
  • [Mon+14] Monroe C, Raussendorf R, Ruthven A, Brown KR, Maunz P, Duan L-M, and Kim J. "Large-scale modular quantum-computer architecture with atomic memory and photonic interconnects". In: Physical Review A 89.2 (2014). DOI: 10.1103/physreva.89.022317.
  • [Mur+19] Murali P, Baker JM, Abhari AJ, Chong FT, and Martonosi M. "Noise-Adaptive Compiler Mappings for Noisy Intermediate-Scale Quantum Computers". In: ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. The Association for Computing Machinery, 2019, pp. 1015–1029. DOI: 10.1145/3297858.3304075.
  • [NNN05] Nguyen TC, Ngo HT, and Nguyen NB. "Sorting by Restricted-Length-Weighted Reversals". In: Genomics, Proteomics & Bioinformatics 3.2 (2005), pp. 120–127. DOI: 10.1016/S1672-0229(05)03016-0.
  • [PS16] Pedram M and Shafaei A. "Layout Optimization for Quantum Circuits with Linear Nearest Neighbor Architectures". In: IEEE Circuits and Systems Magazine 16.2 (2016), pp. 62–74. DOI: 10.1109/MCAS.2016.2549950.
  • [PS02] Pinter R and Skiena S. "Genomic sorting with length-weighted reversals". In: Genome Informatics 13 (2002), pp. 103–11. DOI: 10.11234/gi1990.13.103.
  • [Rau05] Raussendorf R. "Quantum computation via translation-invariant operations on a chain of qubits". In: Physical Review A 72.5 (2005). DOI: 10.1103/physreva.72.052301.
  • [Rob55] Robbins H. "A remark on Stirling's formula". In: The American Mathematical Monthly 62.1 (1955), pp. 26–29. DOI: 10.2307/2315957.
  • [SWD11] Saeedi M, Wille R, and Drechsler R. "Synthesis of quantum circuits for linear nearest neighbor architectures". In: Quantum Information Processing 10.3 (2011), pp. 355–377. DOI: 10.1007/s11128-010-0201-2.
  • [SV17] Schwartz M and Vontobel PO. "Improved Lower Bounds on the Size of Balls Over Permutations With the Infinity Metric". In: IEEE Transactions on Information Theory 63.10 (2017), pp. 6227–6239. DOI: 10.1109/TIT.2017.2697423.
  • [SSP13] Shafaei A, Saeedi M, and Pedram M. "Optimization of Quantum Circuits for Interaction Distance in Linear Nearest Neighbor Architectures". In: Proceedings of the 50th Annual Design Automation Conference (DAC '13). ACM, 2013, 41:1–41:6. DOI: 10.1145/2463209.2488785.
  • [SSP14] Shafaei A, Saeedi M, and Pedram M. "Qubit placement to minimize communication overhead in 2D quantum architectures". In: 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 2014. DOI: 10.1109/aspdac.2014.6742940.
  • [TS10] Tamo I and Schwartz M. "Correcting Limited-Magnitude Errors in the Rank-Modulation Scheme". In: IEEE Transactions on Information Theory 56.6 (2010), pp. 2551–2560. DOI: 10.1109/TIT.2010.2046241.
  • [VHC02] Vidal G, Hammerer K, and Cirac JI. "Interaction Cost of Nonlocal Gates". In: Physical Review Letters 88.23 (2002), p. 237902. DOI: 10.1103/PhysRevLett.88.237902.
  • [Zha99] Zhang L. "Optimal Bounds for Matching Routing on Trees". In: SIAM Journal on Discrete Mathematics 12.1 (1999), pp. 64–77. DOI: 10.1137/s0895480197323159.
  • [ZW19] Zulehner A and Wille R. "Compiling SU(4) quantum circuits to IBM QX architectures". In: ASP-DAC '19: Proceedings of the 24th Asia and South Pacific Design Automation Conference. ACM Press, 2019, pp. 185–190. DOI: 10.1145/3287624.3287704.
