Skip to main content
Proceedings of the National Academy of Sciences of the United States of America logoLink to Proceedings of the National Academy of Sciences of the United States of America
. 1997 Feb 18;94(4):1054–1058. doi: 10.1073/pnas.94.4.1054

Using a parity-sensitive sieve to count prime values of a polynomial

John Friedlander *,, Henryk Iwaniec
PMCID: PMC19742  PMID: 11038598

Abstract

It is expected that any irreducible polynomial with integer coefficients assumes infinitely many prime values provided that it satisfies some obvious local conditions. Moreover, it is expected that the frequency of these primes obeys a simple asymptotic law. This has however been proven for only a few special classes of polynomials. In the most famous unsolved cases the sequence of values is “thin” in the sense that it contains fewer than Nθ integers up to N for some constant θ < 1. Quite generally it seems to be difficult to show the infinitude of primes in a given thin integer sequence and there is no polynomial for which this has hitherto been done. The polynomial x2 + y4 is an example of such a thin sequence; here, specifically, θ = 3/4. We report here the development of new methods that rigorously demonstrate the asymptotic formula in the case of this polynomial and that are applicable to an infinite class of polynomials to which this one belongs. The proof is based partly on a new sieve method that breaks the well-known parity problem of sieve theory and partly on a careful harmonic analysis of the special properties of biquadratic polynomial sequences.


The prime numbers that can be written in the form a2 + b2 were characterized more than 300 years ago by Fermat. It is quite easy to see that no prime of the form 4m + 3 can be a sum of two squares, and Fermat proved the much more difficult result that every prime of the form 4m + 1 can be so written. In the eighteenth and nineteenth centuries, mainly due to the efforts of Lagrange and Gauss, this result was found to be a special case of a more general phenomenon: Given any irreducible binary quadratic form ϕ(a, b) = αa2 + βab + γb2, the primes represented by ϕ are characterized by congruence and class group conditions. This fact made it possible, following the nineteenth century breakthroughs on prime counting by Dirichlet, Hadamard, and Vallée-Poussin, to give asymptotic formulae for the number of primes up to x, which are represented by such a form. Apart from a minority of ϕ that fail to satisfy some local condition and hence cannot represent more than one prime, one finds that a positive density of all primes are represented by such a form.

For more general polynomials one cannot expect such a simple characterization; nevertheless, one may quite generally formulate expected asymptotic formulae for the frequency of primes represented. Success in proving these has however been limited; even proving the representability of infinitely many primes seems no easier. In the case of two variables the result is known for general quadratic polynomials, as given by Iwaniec (1). For polynomials in one variable the problem is harder still and only the case of linear polynomials, that is arithmetic progressions, is settled thanks to Dirichlet.

Here we describe new methods that prove that there are infinitely many primes of the form a2 + b4 and give the asymptotic formula, the first such results for any thin polynomial sequence.

Theorem 1. We have, with Λ the von Mangoldt function,

graphic file with name M1.gif

where κ = Inline graphic(1 − t4)1/2dt.

We remark that by comparing this with the well-known asymptotic formula for the case of a2 + b2 (change x3/4 to x and t4 to t2), we see that the “probability” of an integer a2 + b2 being prime is the same when we are told that b is a square as it is when we are told that b is not a square.

The proof of Theorem 1 is based on a sieve method. In its classical format the sieve is unable to detect primes for a very basic reason known as the parity problem. In the case where the sequence is close to positive density and has extremely good regularity properties the sieve very barely fails, whereas in the case of a thin sequence like the one we consider the failure is by a wider margin. The former case was analyzed very precisely by the asymptotic sieve of Bombieri (2). By adding an additional axiom to those already present, we are able to modify Bombieri’s sieve to the point that we can produce asymptotic formulae for primes even in thin sequences.

Having this sieve at hand, there remains the still more difficult problem of proving that the sequence of integers a2 + b4 satisfies the additional axiom. This occupies the greater part of the work and incidentally gives rise to a number of results of independent interest.

The Sieve

Given a general sequence 𝒜 = (an) of nonnegative reals, we should like to evaluate asymptotically the sum

graphic file with name M3.gif 1

In the case of our application we have an being the number of representations of n = a2 + b4. Thus, we want to allow the sequence 𝒜 to be quite thin. Setting

graphic file with name M4.gif 2

we make the very mild assumption

graphic file with name M5.gif 3

Inserting in Eq. 1 the formula Λ(n) = Inline graphicλd we obtain, after an interchange of order,

graphic file with name M7.gif 4

where λd = −μ(d)log d. This reduces the problem to a sufficiently precise evaluation of the sums

graphic file with name M8.gif 5

We assume that, for some nice function g, Ad(x) is closely approximated by g(d)A(x), say

graphic file with name M9.gif 6

where rd(x) is small. This last vague statement must be made more precise in several respects. There are a number of standard ways to do this and we do not attempt to choose a minimal set of axioms. We assume that, uniformly in dx1/3 we have

graphic file with name M10.gif 7

The function g is assumed to have a number of properties [think of g(d) = d−1 as the prototypical example]. We assume g is multiplicative with 0 ≤ g(p) < 1 for all primes p, that

graphic file with name M11.gif 8

for some constant B, where τ is the divisor function, that for y ≥ 2,

graphic file with name M12.gif 9

for some constants c1 > 0 and c0, and finally that, for y ≥ 2,

graphic file with name M13.gif 10

All of the above axioms are easy to verify for a great many sequences Α to which the sieve has been applied and, in particular, to the sequence we consider here. In our case one has g(p) = 1/p + (χ(p)/p)(1 − 1/p), where χ is the Dirichlet character of conductor 4.

About the remainder terms rd we assume

graphic file with name M14.gif R

for all tx, with some D = D(x) such that x2/3 < D(x) < x. This assumption, or something very much like it, is the final and pivotal assumption of classical sieve theory. In contrast to the earlier assumptions, this one is much more important and usually more difficult to verify. In the case of our application the assumption [R] was recently proven by Fouvry and Iwaniec (3) with D = x3/4−ɛ, for every ɛ > 0. It was this surprising and best-possible result that gave us the courage to attempt this project.

The additional assumption that gives the asymptotic for primes is an estimate for bilinear forms. We assume

graphic file with name M15.gif B

for every N with

graphic file with name M16.gif

for some δ = δ(x) ≥ 2 and Δ = Δ(x) ≥ 2, and where

graphic file with name M17.gif

This is required for every C with 1 ≤ CxD−1.

 Theorem 2. If the an are supported on squarefree integers, then under the above assumptions

graphic file with name M18.gif

where H = Πp(1 − g(p))(1 − 1/p)−1and the implied constant depends only on the function g.

We remark that by modifying slightly the assumptions we can remove the requirement that an be supported on squarefree integers.

For the proof of Theorem 2 we decompose the function Λ by means of a combinatorial identity. Such identities have been used before by many people including Vinogradov, Linnik, Vaughan, and Heath-Brown. In our case, we choose a parameter z and can write, for n > z, Λ(n) essentially as a sum of several sub-sums, the most difficult and important of which are of the type Σde|nμ(d)Λ(e), where d and e are either both small or both large. Inserting this into the formula (Eq. 1), we get a decomposition of that sum. It is not hard to see that the pieces where d and e are small can be handled by [R], roughly when their product is less than D, and that these give the main term in Theorem 2. If one assumed [R] held to a sufficiently high level D this would take care of matters since the remaining case of d, e both large would not happen often, or at all, since de | n and nx. This would however require an unrealistically strong assumption and the resulting theorem would be without application. Any more realistic assumption about D necessitates a more careful treatment and even then still leaves uncovered the contributions where one of d, e come from a mid-sized range. These however can be estimated by the bilinear form occurring in the assumption [B]. The presence of δ and Δ in this assumption is necessary to avoid making it unreasonably strict.

There are a number of technical details we have not yet mentioned. The most essential is the following: In dealing with the contribution where d, e are both large it is difficult to keep control of a certain cancellation coming from the Möbius function and this necessitates attaching sieve weights ϱn to an right from the beginning. This device was used by Bombieri (2) serve the analogous function of allowing one to assume [R] only up to a realistic level.

The Bilinear Form

The various assumptions of the previous section, other than [R] and [B], are easily verified for our sequence and [R] was demonstrated with D = x3/4−ɛ in ref. 3. Our remaining task is to prove that [B] holds for all N with

graphic file with name M19.gif

This is actually stronger than needed for Theorem 1. It would suffice in this lower bound to cover the range N > xα for some α < 3/8.

The double sum occurring in [B] is easily transformed into sums of the form

graphic file with name M20.gif

where βn = γnμ(n), αm replaces the absolute value, and where it is convenient for technical reasons to break up the ranges for m, n into subintervals and restrict to integers free from small prime factors. To attack the above double sum we have little choice but to rid ourselves of the set of coefficients αm by Cauchy’s inequality but to do so at once would be too costly. It is more natural and also more efficient to first write this in terms of Gaussian integers obtaining by the unique factorization in that ring

graphic file with name M21.gif

where w, z run through ℤ[i], the factor 1/4 takes account of the units and q is the characteristic function of the squares. Now applying Cauchy’s inequality, introducing a smooth majorant f(w), and squaring out, we are led to the problem of bounding the sum

graphic file with name M22.gif 11

with

graphic file with name M23.gif

where βz = β|z|2 and the condition (z1, z2) = 1 can be introduced with acceptable error because β lives on integers without small prime factors. We may also, by a splitting argument, restrict z1, z2 to a small (polar coordinate) box.

The Error Term

We give in this section a type of Fourier expansion for the sum of Eq. 11 and then bound all but one term in this expansion. The remaining “main” term is arithmetic in nature and does reduce in size due to the oscillation of sign in the sequence β. The treatment of this main term, which is much more difficult, we defer to future sections.

The problem of obtaining an asymptotic formula for 𝒞(z1, z2) reduces to the counting of lattice points inside the “biquadratic ellipse”

graphic file with name M24.gif

for fixed γ, 0 < γ < 1. We put

graphic file with name M25.gif

and Re w̄z1 = Inline graphic, Re w̄z2 = Inline graphic. These determine w by the equation iΔw = Inline graphicz2Inline graphicz1. As w ranges over ℤ[i], this equation is equivalent to a congruence modulo |Δ| and we have

graphic file with name M30.gif 12

Since |Δ| may be very large there are few lattice points relative to the volume and the problem may be expected to be difficult.

An application of the Poisson summation formula transforms Eq. 12 into

graphic file with name M31.gif

where G(h1, h2) is the sum

graphic file with name M32.gif 13

and F(u1, u2) is the integral

graphic file with name M33.gif 14

The main term comes from h1 = h2 = 0. The integral F turns out not to be a difficult problem. In the case of F(0, 0) it may be explicitly evaluated and for the other (error) terms it suffices to input a bound that is derived by repeated partial integration, together with a crude bound for G in terms of the divisor function τ(Δ); however, the fact that the divisor function is occasionally fairly large forces us to bound this error only on average over z1, z2. Thus, we don’t quite solve the lattice point problem, but solving it in this average sense suffices for our main concern. The final result is that the sum of Eq. 11 is, apart from an admissible error, given by the main term

graphic file with name M34.gif

where

graphic file with name M35.gif

Here we have used the fact that with z1, z2 confined inside a small box we can treat Δ as though it may be replaced by a constant Δ0.

Reduction of the Main Term

Our next task is to simplify and transform the sum T(β) in the main term. This requires us to penetrate the arithmetic nature of the sum G(0, 0) from Eq. 13. The latter is closely related to ϱ(z2/z1; Δ), where ϱ(z; Δ) counts the number of solutions in rational residue classes of ω2z(mod |Δ|) (actually we need to also consider ϱ to moduli dividing Δ to take care of a technical problem involving coprimality). The number ϱ is, as is well known, expressible as a sum of the Jacobi symbol over the divisors d of Δ. Following Dirichlet we separate these divisors into two sets d < Inline graphic and d ≥ Inline graphic, and transform the latter into their complementary divisors d → |Δ|/d. The first set gives rise to sums of the shape

graphic file with name M38.gif

whereas, after some manipulation, the latter give sums

graphic file with name M39.gif

Here, for j = 1, 2, we write zj = rj + isj, r odd, s even, and in case of even integers d, the 2 part is ignored in the Jacobi symbol. In reducing to these sums (and later in estimating them) all of the basic properties of the Jacobi symbol, multiplicativity, periodicity, and quadratic reciprocity, are used in an essential fashion.

We split into segments D < d ≤ 2D, detect the condition d |Δ by additive characters, interchange the order of summation, and apply Cauchy’s inequality. This leads to the problem of giving nontrivial bounds for various D for the sum

graphic file with name M40.gif 15

and also for the sum

graphic file with name M41.gif 16

where z = r + is with s even, r odd, R < r ≤ 2R, S < s ≤ 2S.

Twisted Sums Over Arithmetic Progressions

In case the range D for the modulus is neither too small nor too large we are able to obtain results where the coefficients β are replaced by arbitrary complex numbers αrs. This means in particular that, for such D, the sums SI, SII may be treated simultaneously, since the Jacobi symbol (s/r) may be incorporated into βz and here we name the resulting sum S(D).

In the event that the αrs factorized and the symbol (r/d) were absent this would be a result of Barban type and would follow from the large sieve. Now however new ideas are required.

We proceed in three steps. In the first place, using duality, Poisson summation, and a number of elementary but nontrivial arguments one obtains a bound which covers the range (log RS)A < D < (RS)1/2−ɛ. At the second stage one replaces, in the sum S(D), the congruence r1s2s1r2(mod d) by the equation r1s2r2s1 = dm and then makes the change of variable dm. If the modulus d of the old congruence is in the range (RS)1/2+ɛ < d < RS(log RS)−A, then the new modulus m is in the range treated in the first step. This idea was first used in the context of Barban–Davenport–Halberstam Theorems by Hooley but of course appears already in Dirichlet’s work on the divisor problem.

The first two steps leave uncovered a middle range (RS)1/2−ɛ < D < (RS)1/2+ɛ. Because (r/dp2) = (r/d) for all primes p not dividing r, it turns out to be possible for any P ≥ 2 to bound S(D) in terms of S(DP2) by averaging over the primes in the dyadic interval P < p ≤ 2P. Now, if D is in the middle range, we may, by choosing P appropriately, place DP2 in the range of larger moduli covered by the result of step two.

Combining these results we obtain for each of SI and SII a bound, which, as opposed to the trivial bound, saves an amount (log RS)A, A arbitrary, throughout the whole range

graphic file with name M42.gif

and saves quite a bit more when D is not too close to the boundary of this region. Of course, because of the Dirichlet involution and the fact that we treat general coefficients, we do not really need to cover the range D > (RS)1/2+ɛ. This would not however allow us to skip any of the three steps in the argument.

Siegel–Walfisz

We still need to treat the small moduli where D < (log RS)B, B arbitrary. Now the two cases SI, SII need very different treatments and in both cases the special shape of the coefficient β is heavily used. In the case of SI the cancellation comes from the sign changes of the Möbius function, which is embedded into β and we are able to succeed, even for each individual d, by giving a Siegel–Walfisz Theorem in the Gaussian domain. The fundamentals of such a result are due to Hecke and to Siegel, although we need to do some work to get the result in a form that is applicable to the problem at hand. Grossencharacters enter naturally at this point and as a result of taking them into account our final results on primes a2 + b4 demonstrate also their uniform distribution in sectors in the Gaussian plane.

Jacobi–Twisted Sums

Our final chore, and rather an interesting one, is the treatment of the small moduli D < (log RS)B in the case of sums SII. Recall that these were originally the large moduli before application of the Dirichlet involution. As in the previous case, the special shape of the coefficients is crucial, although now our cancellation will come not from the Möbius function but instead from the oscillation in sign coming from the Jacobi symbol.

We say the Gaussian integer z = r + is is primary if s is even and r is congruent to s + 1 modulo four. The product z = z1z2 of primary numbers is primary. The symbol [z] defined by [z] = i(r − 1)/2 (s/|r|) for primary z enjoys the multiplicativity property

graphic file with name M43.gif 17

where ɛ = ±1 depends only on the quadrants in which z1, z2, and z1z2 are located. This follows from some manipulations, which again make use of all of the basic properties of the Jacobi symbol, and in particular the law of quadratic reciprocity.

In dealing with the sum SII for small moduli we shall, as in the case of SI, be able to treat individual d. Thus, referring to the definition of Eq. 16 and incorporating the factor (r/d) into the β, we are left to estimate sums Inline graphicβz[z]. Because of its resemblance to the Möbius function we are able to decompose β in much the same fashion as we did with Λ in the treatment of the sieve. This reduces the problem to bounding linear forms Inline graphic[wz] and bilinear forms Inline graphicαwγz[wz], where in both cases the variables are constrained by their product being required to belong to a given box. The multiplicativity property of Eq. 17 is basic for the study of such sums. There are a number of alternative treatments. In the case of the linear forms we are able to neglect the summation in all but one variable and reduce the problem to the estimation of the sum of the Jacobi symbol over an interval to which we apply the Polya–Vinogradov inequality. The estimate of Burgess could be used here to improve the result, but this is not necessary for the final goal. For the bilinear forms it is also possible to reduce the problem to a similar one variable sum and here it would suffice to apply the Burgess bound. There is however another method that is more elementary yet, because it takes advantage of a second variable, also gives stronger results.

Combining these bounds we complete the proof of the final step to the main theorem. Because the function Λ can also be decomposed in essentially the same fashion as β (in fact, more simply), we get the following by-product of the work in this section. Define λ(n) = Inline graphic(s/r), where r is odd and positive and s is even. Then for some absolute constant η > 0, for example η = 1/77, we have

graphic file with name M48.gif 18

Discussion

As indicated earlier the methods yield incidentally the uniform distribution of the Gaussian primes a + ib2 in sectors and for that matter within any region having a reasonable boundary. One also obtains the expected distribution on restricting the variables a and b to residue classes having moduli either fixed or growing less quickly than some power of log x.

It seems certain that the methods extend to cover the case of the prime values of ϕ (a, b2) for any binary quadratic form ϕ. We have not checked this in detail. One should certainly expect some complications regarding composition of forms in case of nontrivial class group.

A more interesting extension would be the proof of the asymptotic formula, or even a lower bound of the expected order for primes of the form a2 + b6. The arguments herein carry over to a considerable extent, provided that quadratic reciprocity is replaced by cubic reciprocity. One has, thanks again to the work of Fouvry and Iwaniec (3), the level D(x) = x2/3−ɛ in [R]. Although also best-possible, this is smaller than before. Thus, one faces more effort in the combinatorics to avoid the restriction D > x2/3 in the sieve part and, of greater significance, one has to settle for lesser results in bounding the sums occurring in the bilinear form [B].

In the event that one can produce primes a2 + b6 then this would have an interesting application to elliptic curves, since it is morally certain that trivial modifications would allow treatment of the polynomial 27a2 + b6, which is the negative of the discriminant of the elliptic curve y2 = 4x3 + b2x + a. One would thus deduce the existence of infinitely many elliptic curves over ℚ having prime discriminant. Here we would not get the expected magnitude for the number of such curves since we are not able to directly touch the polynomial x2 + y3. After all, three is not an even number!

Although the proofs of our results are rather lengthy and complicated we are able to avoid much of the high-powered technology frequently used in modern analytic number theory such as the bounds of Weil and Deligne. We also do not appeal to the theory of automorphic functions although experts will, in several places, detect it bubbling just beneath the surface.

Acknowledgments

J.F. is supported in part by Natural Sciences and Engineering Research Council (Canada) Grant A5123. H.I. is supported in part by National Science Foundation Grant DMS-9500797.

References

  • 1.Iwaniec H. Acta Arith. 1973;24:435–459. [Google Scholar]
  • 2.Bombieri E. Mem Acad Naz Dei XL. 1976;1/2:243–269. [Google Scholar]
  • 3.Fouvry, E. & Iwaniec, H. (1997) Acta Arith., in press.
  • 4.Hecke E. Math Z. 1920;6:11–51. [Google Scholar]

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences

RESOURCES