2020 Jun 6;12097:482–492. doi: 10.1007/978-3-030-52200-1_48

Implementing the Tangent Graeffe Root Finding Method

Joris van der Hoeven, Michael Monagan
Editors: Anna Maria Bigatti, Jacques Carette, James H. Davenport, Michael Joswig, Timo de Wolff
PMCID: PMC7340920

Abstract

The tangent Graeffe method has been developed for the efficient computation of single roots of polynomials over finite fields with multiplicative groups of smooth order. It is a key ingredient of sparse interpolation using geometric progressions, in the case when blackbox evaluations are comparatively cheap. In this paper, we improve the complexity of the method by a constant factor and we report on a new implementation of the method and a first parallel implementation.

Introduction

Consider a polynomial function Inline graphic over a field Inline graphic given through a black box capable of evaluating f at points in Inline graphic. The problem of sparse interpolation is to recover the representation of Inline graphic in its usual form, as a linear combination

graphic file with name M5.gif 1

of monomials Inline graphic. One popular approach to sparse interpolation is to evaluate f at points in a geometric progression. This approach goes back to the work of Prony in the eighteenth century [15] and became well known after Ben-Or and Tiwari’s seminal paper [2]. It has been widely used in computer algebra, both in theory and in practice; see [16] for a nice survey.

More precisely, if a bound T for the number of terms t is known, then we first evaluate f at Inline graphic pairwise distinct points Inline graphic, where Inline graphic and Inline graphic for all Inline graphic. The generating function of the evaluations at Inline graphic satisfies the identity

graphic file with name M13.gif

where Inline graphic and Inline graphic is of degree Inline graphic. The rational function Inline graphic can be recovered from Inline graphic using fast Padé approximation [4]. For well-chosen points Inline graphic, it is often possible to recover the exponents Inline graphic from the values Inline graphic. If the exponents Inline graphic are known, then the coefficients Inline graphic can also be recovered using fast structured linear algebra [5]. This leaves us with the question of how to compute the roots Inline graphic of Inline graphic in an efficient way.
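To make the role of the roots concrete, here is a minimal Python sketch under stated assumptions. The prime p = 97, the element omega = 5, and the sparse polynomial f are illustrative choices that do not come from the paper. The sketch uses the convention Lambda(z) = prod (z - omega^e) and checks that the evaluations of f along the geometric progression satisfy the linear recurrence with characteristic polynomial Lambda, so that the roots of Lambda encode the exponents of f.

```python
# Illustrative values only: p, omega and f are assumptions, not taken from the paper.
p = 97          # prime with p - 1 = 3 * 2^5
omega = 5       # primitive root modulo 97

# f(x) = 3*x^2 + 10*x^17 + 42*x^40, stored as {exponent: coefficient}
f = {2: 3, 17: 10, 40: 42}
t = len(f)

def order(a):
    """Multiplicative order of a modulo p (brute force)."""
    k, x = 1, a % p
    while x != 1:
        x = x * a % p
        k += 1
    return k

def evaluate(x):
    """Black box: value of f at x modulo p."""
    return sum(c * pow(x, e, p) for e, c in f.items()) % p

def times_linear(poly, beta):
    """Multiply poly (coefficients low to high) by (z - beta) modulo p."""
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - beta * c) % p
        out[i + 1] = (out[i + 1] + c) % p
    return out

# Evaluations a_k = f(omega^k) for k = 0, ..., 2t - 1
a = [evaluate(pow(omega, k, p)) for k in range(2 * t)]

# Lambda(z) = prod over the exponents e of (z - omega^e)
lam = [1]
for e in f:
    lam = times_linear(lam, pow(omega, e, p))

# sum_j lam[j] * a[k + j] vanishes for every k, since Lambda(omega^e) = 0
residuals = [sum(lam[j] * a[k + j] for j in range(t + 1)) % p for k in range(t)]
```

In an actual interpolation the polynomial Lambda is unknown and is obtained from the evaluations; the sketch only verifies the relation that makes root finding the remaining step.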

For practical applications in computer algebra, we usually have Inline graphic, in which case it is most efficient to use a multi-modular strategy, and reduce to coefficients in a finite field Inline graphic, where p is a prime number that we are free to choose. It is well known that polynomial arithmetic over Inline graphic can be implemented most efficiently using FFTs when the order Inline graphic of the multiplicative group is smooth. In practice, this prompts us to choose p of the form Inline graphic for some small s and such that p fits into a machine word.
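As an illustration of this choice of prime, the following Python sketch searches for primes of the form s * 2^k + 1 with small odd s. The function names, the bound `max_s` and the sample primes in the usage note are assumptions made for the example; they are not the prime used in the paper. Primality is tested with Miller-Rabin using the first twelve primes as bases, which is deterministic for all 63-bit inputs.

```python
def is_prime(n):
    """Miller-Rabin with the first 12 primes as bases (deterministic below 3.3e24)."""
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    if n < 2:
        return False
    for q in bases:
        if n % q == 0:
            return n == q
    d, e = n - 1, 0
    while d % 2 == 0:
        d //= 2
        e += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(e - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False          # a is a witness of compositeness
    return True

def fft_primes(k, max_s=64, bits=63):
    """Pairs (s, p) with p = s * 2^k + 1 prime, s odd, s <= max_s, p within `bits` bits."""
    found = []
    for s in range(1, max_s + 1, 2):
        p = s * 2**k + 1
        if p.bit_length() <= bits and is_prime(p):
            found.append((s, p))
    return found
```

For example, `fft_primes(26)` contains the pair `(7, 469762049)`, a prime that is commonly used for number-theoretic transforms.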

The traditional way to compute roots of polynomials over finite fields is using Cantor and Zassenhaus’ method [6]. In [10, 11], alternative algorithms were proposed for our case of interest when Inline graphic is smooth. The fastest algorithm was based on the tangent Graeffe transform and it gains a factor Inline graphic with respect to Cantor–Zassenhaus’ method. The aim of the present paper is to report on a parallel implementation of this new algorithm and on a few improvements that allow for a further constant speed-up.

In Sect. 2, we recall the Graeffe transform and the heuristic root finding method based on the tangent Graeffe transform from [10]. In Sect. 3, we present the main new theoretical improvements, which all rely on optimizations in the FFT-model for fast polynomial arithmetic. Our contributions are twofold. In the FFT-model, one backward transform out of four can be saved for Graeffe transforms of order two (see Sect. 3.2). When composing a large number of Graeffe transforms of order two, FFT caching can be used to gain another factor of 3/2 (see Sect. 3.3). In the longer preprint version of the paper [12], we also show how to generalize our methods to Graeffe transforms of general orders and how to use them in combination with the truncated Fourier transform.

Section 4 is devoted to our new sequential and parallel implementations of the algorithm in C and Cilk C. Our sequential implementation confirms the gain of a new factor of two when using the new optimizations. So far, we have achieved a parallel speed-up by a factor of 4.6 on an 8-core machine. Our implementation is freely available at http://www.cecm.sfu.ca/CAG/code/TangentGraeffe.

Root Finding Using the Tangent Graeffe Transform

Graeffe Transforms

The traditional Graeffe transform of a monic polynomial Inline graphic of degree d is the unique monic polynomial Inline graphic of degree d such that

graphic file with name M35.gif 2

If P splits over Inline graphic into linear factors Inline graphic, then one has

graphic file with name M38.gif

More generally, given Inline graphic, we define the Graeffe transform of order r to be the unique monic polynomial Inline graphic of degree d such that Inline graphic. If Inline graphic, then

graphic file with name M43.gif

If Inline graphic, then we have

graphic file with name M45.gif 3
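The definitions of this subsection can be checked on a small example. The Python sketch below uses the standard identity G(P)(z^2) = (-1)^d P(z) P(-z) for a monic polynomial P of degree d, with naive quadratic-time multiplication. The prime p = 97 and all function names are illustrative assumptions; polynomials are stored as coefficient lists from low to high degree.

```python
p = 97  # small illustrative prime, not the one used in the paper

def poly_mul(a, b):
    """Naive product of two coefficient lists modulo p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out

def from_roots(roots):
    """Monic polynomial prod (z - r) modulo p."""
    poly = [1]
    for r in roots:
        poly = poly_mul(poly, [-r % p, 1])
    return poly

def graeffe(P):
    """Order-two Graeffe transform via G(P)(z^2) = (-1)^d P(z) P(-z)."""
    d = len(P) - 1
    P_neg = [(-c if i % 2 else c) % p for i, c in enumerate(P)]   # P(-z)
    prod = poly_mul(P, P_neg)           # an even polynomial in z
    sign = -1 if d % 2 else 1
    return [sign * c % p for c in prod[0::2]]

def graeffe_power_of_two(P, m):
    """Graeffe transform of order 2^m as m composed order-two transforms."""
    for _ in range(m):
        P = graeffe(P)
    return P
```

With `P = from_roots([3, 10, 42, 77])`, the call `graeffe(P)` returns the same coefficients as `from_roots([9, 3, 18, 12])`, the monic polynomial whose roots are the squares of the roots of P modulo 97.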

Root Finding Using Tangent Graeffe Transforms

Let Inline graphic be a formal indeterminate with Inline graphic. Elements in Inline graphic are called tangent numbers. Now let Inline graphic be of the form Inline graphic where Inline graphic are pairwise distinct. Then the tangent deformation Inline graphic satisfies

graphic file with name M53.gif

The definitions from the previous subsection readily extend to coefficients in Inline graphic instead of Inline graphic. Given Inline graphic, we call Inline graphic the tangent Graeffe transform of P of order r. We have

graphic file with name M58.gif

where

graphic file with name M59.gif

Now assume that we have an efficient way to determine the roots Inline graphic of Inline graphic. For some polynomial Inline graphic, we may decompose Inline graphic. For any root Inline graphic of Inline graphic, we then have

graphic file with name M66.gif

Whenever Inline graphic happens to be a single root of Inline graphic, it follows that

graphic file with name M69.gif

If Inline graphic, this finally allows us to recover Inline graphic as Inline graphic.
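The recovery of a root from a single root of the transformed polynomial can be illustrated as follows. In this Python sketch a polynomial over the tangent numbers is stored as a pair (A, B) standing for A + B*eps with eps^2 = 0, and the root alpha is recovered from a single root beta of A through alpha = r * beta * A'(beta) / B(beta), which follows from expanding (alpha - eps)^r. The prime p = 97, the function names, and the brute-force search for the roots of A are simplifying assumptions for the example.

```python
p = 97  # small illustrative prime

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out

def poly_eval(P, x):
    acc = 0
    for c in reversed(P):
        acc = (acc * x + c) % p
    return acc

def derivative(P):
    return [i * c % p for i, c in enumerate(P)][1:]

def from_roots(roots):
    poly = [1]
    for r in roots:
        poly = poly_mul(poly, [-r % p, 1])
    return poly

def flip(P):
    """Coefficients of P(-z)."""
    return [(-c if i % 2 else c) % p for i, c in enumerate(P)]

def tangent_graeffe(A, B):
    """One order-two Graeffe step applied to A + B*eps, computed modulo eps^2."""
    d = len(A) - 1
    An, Bn = flip(A), flip(B)
    real = poly_mul(A, An)
    tang = [(x + y) % p for x, y in zip(poly_mul(A, Bn), poly_mul(An, B))]
    sign = -1 if d % 2 else 1
    return ([sign * c % p for c in real[0::2]],
            [sign * c % p for c in tang[0::2]])

def recover_roots(P, m):
    """Roots alpha of P for which alpha^(2^m) is a single root of the transform."""
    r = 2**m
    A, B = P, derivative(P)            # P(z + eps) = P(z) + P'(z) * eps
    for _ in range(m):
        A, B = tangent_graeffe(A, B)
    dA = derivative(A)
    found = []
    for beta in range(1, p):           # brute force, standing in for a DFT
        slope = poly_eval(dA, beta)
        if poly_eval(A, beta) == 0 and slope != 0:
            inv_b = pow(poly_eval(B, beta), -1, p)
            found.append(r * beta * slope * inv_b % p)
    return sorted(found)
```

For instance, `recover_roots(from_roots([3, 10, 42, 77]), 2)` returns all four roots, whereas `recover_roots(from_roots([3, 94, 10]), 1)` returns only `[10]`: the roots 3 and 94 have the same square modulo 97, so their common image is a double root and is skipped.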

Heuristic Root Finding over Smooth Finite Fields

Assume now that Inline graphic is a finite field, where p is a prime number of the form Inline graphic for some small Inline graphic. Assume also that Inline graphic is a primitive element of order Inline graphic for the multiplicative group of Inline graphic.

Let Inline graphic be as in the previous subsection. The tangent Graeffe method can be used to efficiently compute those Inline graphic of P for which Inline graphic is a single root of Inline graphic. In order to guarantee that there are a sufficient number of such roots, we first replace P(z) by Inline graphic for a random shift Inline graphic, and use the following heuristic:

  • H For any subset Inline graphic of cardinality d and any Inline graphic, there exist at least p/2 elements Inline graphic such that Inline graphic contains at least 2d/3 elements.

For a random shift Inline graphic and any Inline graphic, the assumption ensures with probability at least 1/2 that Inline graphic has at least d/3 single roots.

Now take r to be the largest power of two such that Inline graphic and let Inline graphic. By construction, note that Inline graphic. The roots Inline graphic of Inline graphic are all s-th roots of unity in the set Inline graphic. We may thus determine them by evaluating Inline graphic at Inline graphic for Inline graphic. Since Inline graphic, this can be done efficiently using a discrete Fourier transform. Combined with the tangent Graeffe method from the previous subsection, this leads to the following probabilistic algorithm for root finding:

[Algorithm 1: listing given as an image in the original]
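The probabilistic method described above can be sketched end to end in Python. This is a reconstruction from the prose and not the authors' listing: the prime p = 7 * 2^26 + 1, the primitive root omega = 3, the threshold s >= 4d, naive polynomial arithmetic, naive evaluation in place of the DFTs, and division by one linear factor at a time are all assumptions chosen to keep the example short. The input is assumed to be monic with distinct roots in the field.

```python
import random

p = 469762049   # 7 * 2^26 + 1, an illustrative FFT prime (not the paper's prime)
omega = 3       # assumed primitive root modulo p

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out

def poly_eval(P, x):
    acc = 0
    for c in reversed(P):
        acc = (acc * x + c) % p
    return acc

def derivative(P):
    return [i * c % p for i, c in enumerate(P)][1:]

def from_roots(roots):
    poly = [1]
    for r in roots:
        poly = poly_mul(poly, [-r % p, 1])
    return poly

def taylor_shift(P, tau):
    """Coefficients of P(z + tau), by Horner's rule (quadratic time)."""
    out = [P[-1]]
    for c in reversed(P[:-1]):
        out = poly_mul(out, [tau, 1])
        out[0] = (out[0] + c) % p
    return out

def divide_linear(P, a):
    """Quotient of P by (z - a), assuming that a is a root of P."""
    q, carry = [0] * (len(P) - 1), 0
    for i in range(len(P) - 1, 0, -1):
        carry = (P[i] + carry * a) % p
        q[i - 1] = carry
    return q

def flip(P):
    return [(-c if i % 2 else c) % p for i, c in enumerate(P)]

def tangent_graeffe(A, B):
    """One order-two Graeffe step on A + B*eps with eps^2 = 0."""
    sign = -1 if (len(A) - 1) % 2 else 1
    real = poly_mul(A, flip(A))
    tang = [(x + y) % p for x, y in zip(poly_mul(A, flip(B)), poly_mul(flip(A), B))]
    return [sign * c % p for c in real[0::2]], [sign * c % p for c in tang[0::2]]

def tg_roots(P, threshold=4, seed=0):
    """All roots of a monic P with distinct roots in F_p (hypothetical helper)."""
    rng = random.Random(seed)
    roots = []
    while len(P) > 1:
        d = len(P) - 1
        r, m = 1, 0      # largest r = 2^m dividing p - 1 with (p - 1) / r >= threshold * d
        while (p - 1) % (2 * r) == 0 and (p - 1) // (2 * r) >= threshold * d:
            r, m = 2 * r, m + 1
        s = (p - 1) // r
        tau = rng.randrange(p)                       # random shift
        A = taylor_shift(P, tau)
        B = derivative(A)                            # tangent deformation A + A' * eps
        for _ in range(m):
            A, B = tangent_graeffe(A, B)
        dA = derivative(A)
        found = {tau} if poly_eval(P, tau) == 0 else set()
        w, beta = pow(omega, r, p), 1
        for _ in range(s):                           # all s-th roots of unity
            slope = poly_eval(dA, beta)
            if poly_eval(A, beta) == 0 and slope != 0:
                inv_b = pow(poly_eval(B, beta), -1, p)
                found.add((r * beta * slope * inv_b + tau) % p)
            beta = beta * w % p
        for alpha in found:                          # remove the roots found so far
            P = divide_linear(P, alpha)
        roots.extend(found)
    return sorted(roots)
```

A round that finds no single root simply retries with a new shift, so the loop plays the role of the recursive call on the quotient. A call such as `tg_roots(from_roots([0, 1, 5, 123456, 99999999, 469762048]))` returns the six roots in increasing order.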

Remark 1

To compute Inline graphic we may use Inline graphic, which requires three polynomial multiplications in Inline graphic of degree d. In total, step 5 thus performs Inline graphic such multiplications. We discuss how to perform step 5 efficiently in the FFT model in Sect. 3.

Remark 2

For practical implementations, one may vary the threshold Inline graphic for r and the resulting threshold Inline graphic for s. For larger values of s, the computations of the DFTs in step 6 get more expensive, but the proportion of single roots goes up, so more roots are determined at each iteration. From an asymptotic complexity perspective, it would be best to take Inline graphic. In practice, we actually preferred to take the lower threshold Inline graphic, because the constant factor of our implementation of step 6 (based on Bluestein’s algorithm [3]) is significant with respect to our highly optimized implementation of the tangent Graeffe method. A second reason we prefer s of size O(d) instead of Inline graphic is that the total space used by the algorithm is linear in s. In the future, it would be interesting to further speed up step 6 by investing more time in the implementation of high performance DFTs of general orders s.

Computing Graeffe Transforms

Reminders About Discrete Fourier Transforms

Assume Inline graphic is invertible in Inline graphic and let Inline graphic be a primitive n-th root of unity. Consider a polynomial Inline graphic. Then the discrete Fourier transform (DFT) of order n of the sequence Inline graphic is defined by

graphic file with name M116.gif

We will write Inline graphic for the cost of one discrete Fourier transform in terms of the number of operations in Inline graphic and assume that Inline graphic. For any Inline graphic, we have

graphic file with name M121.gif 4

If n is invertible in Inline graphic, then it follows that Inline graphic. The costs of direct and inverse transforms therefore coincide up to O(n) additional operations.

If Inline graphic is composite, Inline graphic, and Inline graphic, then it is well known [7] that

graphic file with name M127.gif 5

This means that a DFT of length n reduces to Inline graphic transforms of length Inline graphic plus Inline graphic transforms of length Inline graphic plus n multiplications in Inline graphic:

graphic file with name M133.gif

In particular, if Inline graphic, then Inline graphic.

It is sometimes convenient to apply DFTs directly to polynomials as well; for this reason, we also define Inline graphic. Given two polynomials Inline graphic with Inline graphic, we may then compute the product AB using

graphic file with name M139.gif

In particular, if Inline graphic denotes the cost of multiplying two polynomials of degree Inline graphic, then we obtain Inline graphic.
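The following Python sketch illustrates these reminders with a recursive radix-2 transform over a prime field and a product computed from two forward transforms and one inverse transform. The prime p = 7 * 2^26 + 1, its primitive root g = 3 and the function names are assumptions for the example; an optimized implementation would work in place and avoid recursion.

```python
p = 469762049   # 7 * 2^26 + 1 (illustrative FFT prime)
g = 3           # assumed primitive root modulo p

def fft(a, w):
    """DFT of a (length a power of two) with respect to a primitive len(a)-th root w."""
    n = len(a)
    if n == 1:
        return a[:]
    w2 = w * w % p
    even, odd = fft(a[0::2], w2), fft(a[1::2], w2)
    out, x = [0] * n, 1
    for k in range(n // 2):
        t = x * odd[k] % p
        out[k] = (even[k] + t) % p
        out[k + n // 2] = (even[k] - t) % p       # uses w^(n/2) = -1
        x = x * w % p
    return out

def inverse_fft(v, w):
    """Inverse DFT: a DFT with respect to w^(-1), followed by division by n."""
    n_inv = pow(len(v), -1, p)
    return [c * n_inv % p for c in fft(v, pow(w, -1, p))]

def fft_multiply(A, B):
    """Product of two coefficient lists using three transforms."""
    size = len(A) + len(B) - 1
    n = 1
    while n < size:
        n *= 2
    w = pow(g, (p - 1) // n, p)                   # primitive n-th root of unity
    fa = fft(A + [0] * (n - len(A)), w)
    fb = fft(B + [0] * (n - len(B)), w)
    return inverse_fft([x * y % p for x, y in zip(fa, fb)], w)[:size]
```

For example, `fft_multiply([1, 2, 3], [4, 5, 6, 7])` returns `[4, 13, 28, 34, 32, 21]`.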

Remark 3

In Algorithm 1, we note that step 6 comes down to the computation of three DFTs of length s. Since r is a power of two, this length is of the form Inline graphic for some Inline graphic. In view of (5), we may therefore reduce step 6 to Inline graphic DFTs of length Inline graphic plus Inline graphic DFTs of length Inline graphic. If Inline graphic is very small, then we may use a naive implementation for DFTs of length Inline graphic. In general, one may use Bluestein’s algorithm [3] to reduce the computation of a DFT of length Inline graphic into the computation of a product in Inline graphic, which can in turn be computed using FFT-multiplication and three DFTs of length a larger power of two.
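The reduction of a DFT of arbitrary length to one polynomial product can be sketched as follows. Over a finite field, one convenient formulation uses the identity i*j = T(i+j) - T(i) - T(j) with T(k) = k(k-1)/2, which avoids square roots of the root of unity. The prime p = 97, the primitive root g = 5, the function names, and the naive product that stands in for FFT multiplication are assumptions for this Python example.

```python
p = 97   # p - 1 = 3 * 2^5 (illustrative)
g = 5    # primitive root modulo 97

def naive_dft(a, w):
    """Reference DFT of length len(a) with respect to w, in quadratic time."""
    s = len(a)
    return [sum(a[j] * pow(w, i * j, p) for j in range(s)) % p for i in range(s)]

def poly_mul(a, b):
    """Naive product; an FFT-based product would be used in practice."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out

def triangular(k):
    return k * (k - 1) // 2

def bluestein_dft(a, w):
    """DFT of length s = len(a) with respect to w via a single polynomial product."""
    s = len(a)
    w_inv = pow(w, -1, p)
    # u_j = a_j * w^(-T(j)) and chirp_k = w^(T(k)) for 0 <= k <= 2s - 2
    u = [a[j] * pow(w_inv, triangular(j), p) % p for j in range(s)]
    chirp = [pow(w, triangular(k), p) for k in range(2 * s - 1)]
    # coefficient s - 1 + i of the product equals sum_j u_j * chirp_(i + j)
    prod = poly_mul(u[::-1], chirp)
    return [prod[s - 1 + i] * pow(w_inv, triangular(i), p) % p for i in range(s)]
```

With `w = pow(g, 8, p)`, an element of order 12, `bluestein_dft` and `naive_dft` agree on any input of length 12.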

Graeffe Transforms of Order Two

Let Inline graphic be a field with a primitive (2n)-th root of unity Inline graphic. Let Inline graphic be a polynomial of degree Inline graphic. Then the relation (2) yields

graphic file with name M157.gif 6

For any Inline graphic, we further note that

graphic file with name M159.gif 7

so Inline graphic can be obtained from Inline graphic using n transpositions of elements in Inline graphic. Concerning the inverse transform, we also note that

graphic file with name M163.gif

for Inline graphic. Plugging this into (6), we conclude that

graphic file with name M165.gif

This leads to the following algorithm for the computation of G(P):

[Algorithm 2: listing given as an image in the original]

Proposition 1

Let Inline graphic be a primitive 2n-th root of unity in Inline graphic and assume that 2 is invertible in Inline graphic. Given a monic polynomial Inline graphic with Inline graphic, we can compute G(P) in time Inline graphic.

Proof

We have already explained the correctness of Algorithm 2. Step 1 requires one forward DFT of length 2n and cost Inline graphic. Step 2 can be done in O(n). Step 3 requires one inverse DFT of length n and cost Inline graphic. The total cost of Algorithm 2 is therefore Inline graphic.
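The three steps counted in this proof can be sketched in Python as follows. The sketch is reconstructed from the prose: one forward transform of length 2n, n pointwise products that pair the value at index k with the value at index k + n, and one inverse transform of length n with respect to the square of the root of unity. Naive quadratic-time transforms stand in for FFTs, and p = 97, g = 5 and the function names are illustrative assumptions. The degree d of P is assumed to be less than n.

```python
p = 97   # p - 1 = 96 = 3 * 2^5 (illustrative)
g = 5    # primitive root modulo 97

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out

def from_roots(roots):
    poly = [1]
    for r in roots:
        poly = poly_mul(poly, [-r % p, 1])
    return poly

def naive_dft(a, w, n):
    """Values of the polynomial a at w^0, ..., w^(n-1)."""
    return [sum(c * pow(w, j * k, p) for j, c in enumerate(a)) % p for k in range(n)]

def inverse_dft(v, w):
    n = len(v)
    n_inv = pow(n, -1, p)
    return [c * n_inv % p for c in naive_dft(v, pow(w, -1, p), n)]

def graeffe_fft_model(P, n):
    """G(P) for a monic P of degree d < n, where 2n divides p - 1."""
    d = len(P) - 1
    w = pow(g, (p - 1) // (2 * n), p)        # primitive (2n)-th root of unity
    vals = naive_dft(P, w, 2 * n)            # step 1: P(w^k) for 0 <= k < 2n
    sign = -1 if d % 2 else 1
    # step 2: P(-w^k) = P(w^(k+n)), hence G(P)(w^(2k)) = (-1)^d P(w^k) P(w^(k+n))
    gvals = [sign * vals[k] * vals[k + n] % p for k in range(n)]
    # step 3: inverse transform of length n with respect to w^2
    return inverse_dft(gvals, w * w % p)[:d + 1]
```

For `P = from_roots([3, 10, 42, 77])` and `n = 8`, the result equals `from_roots([9, 3, 18, 12])`, the polynomial whose roots are the squares of the roots of P.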

Remark 4

In terms of the complexity of multiplication, we obtain Inline graphic. This gives a Inline graphic improvement over the previously best known bound Inline graphic that was used in [10]. Note that the best known bound for squaring polynomials of degree Inline graphic is Inline graphic. It would be interesting to know whether squares can also be computed in time Inline graphic.

Graeffe Transforms of Power of Two Orders

In view of (3), Graeffe transforms of power of two orders Inline graphic can be computed using

graphic file with name M182.gif 8

Now assume that we computed the first Graeffe transform G(P) using Algorithm 2 and that we wish to apply a second Graeffe transform to the result. Then we note that

graphic file with name M183.gif 9

is already known for Inline graphic. We can use this to accelerate step 1 of the second application of Algorithm 2. Indeed, in view of (5) for Inline graphic and Inline graphic, we have

graphic file with name M187.gif 10

for Inline graphic. In order to exploit this idea in a recursive fashion, it is useful to modify Algorithm 2 so as to include Inline graphic in the input and Inline graphic in the output. This leads to the following algorithm:

[Algorithm 3: listing given as an image in the original]

Proposition 2

Let Inline graphic be a primitive 2n-th root of unity in Inline graphic and assume that 2 is invertible in Inline graphic. Given a monic polynomial Inline graphic with Inline graphic and Inline graphic, we can compute Inline graphic in time Inline graphic.

Proof

It suffices to compute Inline graphic and then to apply Algorithm 3 recursively, m times. Every application of Algorithm 3 now takes Inline graphic operations in Inline graphic, whence the claimed complexity bound.
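The caching idea behind this proof can be sketched in Python. The sketch is a reconstruction from the prose under stated assumptions: the values of P at the even powers of the root of unity are carried along as a cache, only the values at the odd powers are computed (one transform of length n applied to the twisted polynomial P(wz)), and the pointwise products are at the same time the cache for the next step. Naive transforms stand in for FFTs; p = 97, g = 5 and the function names are illustrative, and the degree of P is assumed to be less than n.

```python
p = 97   # p - 1 = 96 = 3 * 2^5 (illustrative)
g = 5    # primitive root modulo 97

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out

def from_roots(roots):
    poly = [1]
    for r in roots:
        poly = poly_mul(poly, [-r % p, 1])
    return poly

def naive_dft(a, w, n):
    return [sum(c * pow(w, j * k, p) for j, c in enumerate(a)) % p for k in range(n)]

def inverse_dft(v, w):
    n = len(v)
    n_inv = pow(n, -1, p)
    return [c * n_inv % p for c in naive_dft(v, pow(w, -1, p), n)]

def graeffe_step_cached(P, cached, n, w):
    """Return G(P) and its values at the powers of w^2, given those of P in `cached`."""
    d = len(P) - 1
    w2 = w * w % p
    twisted = [c * pow(w, j, p) % p for j, c in enumerate(P)]    # P(w z)
    odd = naive_dft(twisted, w2, n)                              # P(w^(2k+1))
    vals = [0] * (2 * n)
    vals[0::2] = cached                                          # P(w^(2k)), reused
    vals[1::2] = odd
    sign = -1 if d % 2 else 1
    gvals = [sign * vals[k] * vals[k + n] % p for k in range(n)]  # G(P)(w^(2k))
    return inverse_dft(gvals, w2)[:d + 1], gvals

def graeffe_power_of_two(P, m, n):
    """Graeffe transform of order 2^m with one initial transform and two per step."""
    w = pow(g, (p - 1) // (2 * n), p)
    cached = naive_dft(P, w * w % p, n)
    for _ in range(m):
        P, cached = graeffe_step_cached(P, cached, n, w)
    return P
```

Each step then costs one forward and one inverse transform of length n, instead of one forward transform of length 2n and one inverse transform of length n.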

Remark 5

In [10], Graeffe transforms of order Inline graphic were directly computed using the formula (8), using Inline graphic operations in Inline graphic, which is twice as slow as the new algorithm.

Implementation and Benchmarks

We have implemented the tangent Graeffe root finding algorithm (Algorithm 1) in C with the optimizations presented in Sect. 3. Our C implementation supports primes of size up to 63 bits. In what follows all complexities count arithmetic operations in Inline graphic.

In Tables 1 and 2 the input polynomial P(z) of degree d is constructed by choosing d distinct values Inline graphic for Inline graphic at random and creating Inline graphic. We will use Inline graphic, a smooth 63 bit prime. For this prime Inline graphic is Inline graphic.

Table 1.

Sequential timings in CPU seconds for Inline graphic and using Inline graphic.

Columns Total to Step 9: our sequential TG implementation in C. Columns V2.25-3 and V2.25-5: Magma CZ timings.

d               Total    First    %roots  Step 5  Step 6  Step 9  V2.25-3    V2.25-5
Inline graphic  0.11 s   0.07 s   69.8%   0.04 s  0.02 s  0.01 s  23.22 s    8.43 s
Inline graphic  0.22 s   0.14 s   69.8%   0.09 s  0.03 s  0.01 s  56.58 s    18.94 s
Inline graphic  0.48 s   0.31 s   68.8%   0.18 s  0.07 s  0.02 s  140.76 s   44.07 s
Inline graphic  1.00 s   0.64 s   69.2%   0.38 s  0.16 s  0.04 s  372.22 s   103.5 s
Inline graphic  2.11 s   1.36 s   68.9%   0.78 s  0.35 s  0.10 s  1494.0 s   234.2 s
Inline graphic  4.40 s   2.85 s   69.2%   1.62 s  0.74 s  0.23 s  6108.8 s   534.5 s
Inline graphic  9.16 s   5.91 s   69.2%   3.33 s  1.53 s  0.51 s  NA         1219 s
Inline graphic  19.2 s   12.4 s   69.2%   6.86 s  3.25 s  1.13 s  NA         2809 s

Table 2.

Real times in seconds for 1 core (8 cores) and Inline graphic.

All columns: our parallel tangent Graeffe implementation in Cilk C. Each entry gives the time for 1 core, followed by the time for 8 cores in parentheses.

d               Total               First               Step 5            Step 6            Step 9
Inline graphic  18.30 s (9.616 s)   11.98 s (2.938 s)   6.64 s (1.56 s)   3.13 s (0.49 s)   1.09 s (0.29 s)
Inline graphic  38.69 s (12.40 s)   25.02 s (5.638 s)   13.7 s (3.03 s)   6.62 s (1.04 s)   2.40 s (0.36 s)
Inline graphic  79.63 s (20.16 s)   52.00 s (11.52 s)   28.1 s (5.99 s)   13.9 s (2.15 s)   5.32 s (0.85 s)
Inline graphic  166.9 s (41.62 s)   107.8 s (23.25 s)   57.6 s (11.8 s)   28.9 s (4.57 s)   11.7 s (1.71 s)
Inline graphic  346.0 s (76.64 s)   223.4 s (46.94 s)   117 s (23.2 s)    60.3 s (9.45 s)   25.6 s (3.54 s)
Inline graphic  712.7 s (155.0 s)   459.8 s (95.93 s)   238 s (46.7 s)    125 s (19.17 s)   55.8 s (7.88 s)
Inline graphic  1465 s (307.7 s)    945.0 s (194.6 s)   481 s (92.9 s)    259 s (39.2 s)    121 s (16.9 s)

One of our goals is to determine how much faster the tangent Graeffe (TG) root finding algorithm is in practice than the Cantor–Zassenhaus (CZ) algorithm, which is implemented in many computer algebra systems. In Table 1 we present timings comparing our sequential implementation of the TG algorithm with Magma’s implementation of the CZ algorithm. For polynomials in Inline graphic, Magma uses Shoup’s factorization algorithm from [17]. For our input P(z), with d distinct linear factors, Shoup uses the Cantor–Zassenhaus equal degree factorization method. The average complexity of TG is Inline graphic and of CZ is Inline graphic.

The timings in Table 1 are sequential timings obtained on a Linux server with an Intel Xeon E5-2660 CPU with 8 cores. In Table 1 the time in column “first” is for the first application of the TG algorithm (steps 1–9 of Algorithm 1), which obtains about 69% of the roots. The time in column “total” is the total time for the TG algorithm. Columns step 5, step 6, and step 9 report the time spent in steps 5, 6, and 9 in Algorithm 1 and do not count time in the recursive call in step 10.

The Magma timings are for Magma’s Factorization command. The timings for Magma version V2.25-3 suggest that Magma’s CZ implementation involves a subalgorithm with quadratic asymptotic complexity. Indeed it turns out that the author of the code implemented all of the sub-quadratic polynomial arithmetic correctly, as demonstrated by the second set of timings for Magma in column V2.25-5, but inserted the d linear factors found into a list using linear insertion! Allan Steel of the Magma group identified and fixed the offending subroutine for Magma version V2.25-5. The timings show that TG is faster than CZ by a factor of 76.6 (=8.43/0.11) to 146.3 (=2809/19.2).

We also wanted to attempt a parallel implementation. To do this we used the MIT Cilk C compiler from [8]. Cilk provides a simple fork-join model of parallelism. Unlike the CZ algorithm, TG has no gcd computations that are hard to parallelize. We present some initial parallel timing data in Table 2. The timings in parentheses are parallel timings for 8 cores.

Implementation Notes

To implement the Taylor shift Inline graphic in step 3, we used the Inline graphic method from [1, Lemma 3]. For step 5 we use Algorithm 3, which has complexity Inline graphic. To evaluate Inline graphic and B(z) in step 6 in Inline graphic, we used the Bluestein transformation [3]. In step 9, to compute the product Inline graphic for Inline graphic roots, we used the Inline graphic product tree multiplication algorithm [9]. The division in step 10 is done in Inline graphic using fast division.
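Two of these building blocks are easy to sketch in Python: the product tree for the product of linear factors, and a Taylor shift written as a single convolution. Both sketches use a naive quadratic-time product where an FFT-based product would be used in practice. Whether the convolution formulation below coincides in detail with the method of [1, Lemma 3] is an assumption; p = 97 and the function names are illustrative, and the degree must be smaller than p so that the factorials are invertible.

```python
p = 97  # small illustrative prime

def poly_mul(a, b):
    """Naive product; stands in for FFT-based multiplication."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out

def product_tree(roots):
    """prod (z - alpha) over the list, by recursive splitting into two halves."""
    if not roots:
        return [1]
    if len(roots) == 1:
        return [-roots[0] % p, 1]
    mid = len(roots) // 2
    return poly_mul(product_tree(roots[:mid]), product_tree(roots[mid:]))

def taylor_shift(P, tau):
    """Coefficients of P(z + tau) from one product of polynomials of degree d."""
    d = len(P) - 1
    fact = [1] * (d + 1)
    for i in range(1, d + 1):
        fact[i] = fact[i - 1] * i % p
    inv_fact = [pow(f, -1, p) for f in fact]
    u = [P[i] * fact[i] % p for i in range(d + 1)]                   # p_i * i!
    v = [pow(tau, j, p) * inv_fact[j] % p for j in range(d + 1)]     # tau^j / j!
    prod = poly_mul(u[::-1], v)
    # coefficient d - k of the product equals sum_i u_i * v_(i-k) = k! * q_k
    return [prod[d - k] * inv_fact[k] % p for k in range(d + 1)]
```

Since P(z + tau) has the roots alpha - tau, the call `taylor_shift(product_tree([3, 10, 42]), 7)` returns the same coefficients as `product_tree([93, 3, 35])`.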

The sequential timings in Tables 1 and 2 show that steps 5, 6 and 9 account for about 90% of the total time. We parallelized these three steps as follows. For step 5, the two forward and two inverse FFTs are done in parallel. We also parallelized our radix 2 FFT by parallelizing recursive calls for size Inline graphic and the main loop in blocks of size Inline graphic as done in [14]. For step 6 there are three applications of Bluestein to compute Inline graphic, Inline graphic and Inline graphic. We parallelized these (thereby doubling the overall space used by our implementation). The main computation in the Bluestein transformation is a polynomial multiplication of two polynomials of degree s. The two forward FFTs are done in parallel and the FFTs themselves are parallelized as for step 5. For the product in step 9 we parallelize the two recursive calls in the tree multiplication for large sizes and again, the FFTs are parallelized as for step 5.

To improve parallel speedup, we also parallelized the polynomial multiplication in step 3 and the computation of the roots in step 8. Although step 8 is O(|S|), it is relatively expensive because of two inverse computations in Inline graphic. Because we have not parallelized about 5% of the computation, the maximum parallel speedup we can obtain is a factor of Inline graphic. The best overall parallel speedup we obtained is a factor of 4.6 = 1465/307.7 for Inline graphic.
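The bound on the parallel speedup can be reproduced with Amdahl's law. The short Python computation below assumes that the bound in the text is obtained in this way, with a sequential fraction of 5% and 8 cores; the function name is hypothetical.

```python
def amdahl_bound(serial_fraction, cores):
    """Maximum speedup when `serial_fraction` of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# 5% sequential work on 8 cores: 1 / (0.05 + 0.95 / 8) = 160 / 27
bound = amdahl_bound(0.05, 8)
```

Under these assumptions the bound is slightly below 6, whatever the efficiency of the parallelized steps.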

Footnotes

Note: This paper received funding from NSERC (Canada) and “Agence de l’innovation de défense” (France).

Note: This document has been written using GNU TeXmacs [13].

Contributor Information

Anna Maria Bigatti, Email: bigatti@dima.unige.it.

Jacques Carette, Email: carette@mcmaster.ca.

James H. Davenport, Email: j.h.davenport@bath.ac.uk.

Michael Joswig, Email: joswig@math.tu-berlin.de.

Timo de Wolff, Email: t.de-wolff@tu-braunschweig.de.

Michael Monagan, Email: mmonagan@sfu.ca.

References

  • 1. Aho, A.V., Steiglitz, K., Ullman, J.D.: Evaluating polynomials on a fixed set of points. SIAM J. Comput. 4, 533–539 (1975). doi: 10.1137/0204045
  • 2. Ben-Or, M., Tiwari, P.: A deterministic algorithm for sparse multivariate polynomial interpolation. In: STOC 1988: Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing, pp. 301–309. ACM Press (1988)
  • 3. Bluestein, L.I.: A linear filtering approach to the computation of discrete Fourier transform. IEEE Trans. Audio Electroacoust. 18(4), 451–455 (1970). doi: 10.1109/TAU.1970.1162132
  • 4. Brent, R.P., Gustavson, F.G., Yun, D.Y.Y.: Fast solution of Toeplitz systems of equations and computation of Padé approximants. J. Algorithms 1(3), 259–295 (1980). doi: 10.1016/0196-6774(80)90013-9
  • 5. Canny, J., Kaltofen, E., Lakshman, Y.: Solving systems of non-linear polynomial equations faster. In: Proceedings of the ACM-SIGSAM 1989 International Symposium on Symbolic and Algebraic Computation, pp. 121–128. ACM Press (1989)
  • 6. Cantor, D.G., Zassenhaus, H.: A new algorithm for factoring polynomials over finite fields. Math. Comput. 36(154), 587–592 (1981). doi: 10.1090/S0025-5718-1981-0606517-5
  • 7. Cooley, J.W., Tukey, J.W.: An algorithm for the machine calculation of complex Fourier series. Math. Comput. 19, 297–301 (1965). doi: 10.1090/S0025-5718-1965-0178586-1
  • 8. Frigo, M., Leiserson, C.E., Randall, R.K.: The implementation of the Cilk-5 multithreaded language. In: Proceedings of PLDI 1998, pp. 212–223. ACM (1998)
  • 9. von zur Gathen, J., Gerhard, J.: Modern Computer Algebra, 3rd edn. Cambridge University Press, New York (2013)
  • 10. Grenet, B., van der Hoeven, J., Lecerf, G.: Randomized root finding over finite fields using tangent Graeffe transforms. In: Proceedings of the ISSAC 2015, pp. 197–204. ACM, New York (2015)
  • 11. Grenet, B., van der Hoeven, J., Lecerf, G.: Deterministic root finding over finite fields using Graeffe transforms. Appl. Algebra Eng. Commun. Comput. 27(3), 237–257 (2015). doi: 10.1007/s00200-015-0280-5
  • 12. van der Hoeven, J., Monagan, M.: Implementing the tangent Graeffe root finding method. Technical report, HAL (2020). http://hal.archives-ouvertes.fr/hal-02525408
  • 13. van der Hoeven, J., et al.: GNU TeXmacs (1998). http://www.texmacs.org
  • 14. Law, M., Monagan, M.: A parallel implementation for polynomial multiplication modulo a prime. In: Proceedings of PASCO 2015, pp. 78–86. ACM (2015)
  • 15. Prony, R.: Essai expérimental et analytique sur les lois de la dilatabilité des fluides élastiques et sur celles de la force expansive de la vapeur de l’eau et de la vapeur de l’alkool, à différentes températures. J. de l’École Polytechnique Floréal et Prairial, an III 1(cahier 22), 24–76 (1795)
  • 16. Roche, D.S.: What can (and can’t) we do with sparse polynomials? In: Arreche, C. (ed.) ISSAC 2018: Proceedings of the 2018 ACM International Symposium on Symbolic and Algebraic Computation, pp. 25–30. ACM Press (2018)
  • 17. Shoup, V.: A new polynomial factorization and its implementation. J. Symb. Comput. 20(4), 363–397 (1995). doi: 10.1006/jsco.1995.1055

Articles from Mathematical Software – ICMS 2020 are provided here courtesy of Nature Publishing Group
