Significance
Prime numbers play a central role in analytic number theory, and are well known to be very well distributed among the reduced residue classes (mod q). Surprisingly, the same does not appear to be true for sequences of consecutive primes, with different patterns occurring with wildly different frequencies. We formulate a precise conjecture, based on the Hardy−Littlewood conjectures, which explains this phenomenon. In particular, we predict that all patterns do occur their fair share of the time in the limit, but that there are secondary terms, tending to zero only very slowly, that create the observed biases.
Keywords: prime numbers, consecutive primes, Hardy–Littlewood conjectures, singular series
Abstract
Although the sequence of primes is very well distributed in the reduced residue classes (mod q), the distribution of pairs of consecutive primes among the permissible ϕ(q)^2 pairs of reduced residue classes (mod q) is surprisingly erratic. This paper proposes a conjectural explanation for this phenomenon, based on the Hardy−Littlewood conjectures. The conjectures are then compared with numerical data, and the observed fit is very good.
1. Introduction
The prime number theorem in arithmetic progressions shows that the sequence of primes is equidistributed among the reduced residue classes (mod q). If the Generalized Riemann Hypothesis is true, then this holds in the more precise form
$$\pi(x; q, a) = \frac{\mathrm{li}(x)}{\phi(q)} + O\left(x^{1/2+\epsilon}\right),$$
where $\mathrm{li}(x) = \int_2^x dt/\log t$ and π(x; q, a) denotes the number of primes up to x lying in the reduced residue class a (mod q). Nevertheless, it was noticed by Chebyshev that certain residue classes seem to be slightly preferred; for example, among the first million primes, we find that
Chebyshev’s bias is beautifully explained by the work of Rubinstein and Sarnak (1) (see ref. 2 for a survey of related work), who showed (in a certain sense and under some natural conjectures) that π(x; 3, 2) > π(x; 3, 1) for 99.9% of all positive x.
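The counting function π(x; q, a) is easy to tabulate for small x, which gives a quick way to see the slight preference Chebyshev noticed. The following is a minimal sketch, not the authors' code; the function names `primes_up_to` and `class_counts` are hypothetical, and a simple sieve of Eratosthenes stands in for any faster prime generator.

```python
from math import gcd, isqrt


def primes_up_to(limit):
    """Return the list of primes <= limit (sieve of Eratosthenes)."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(limit) + 1):
        if sieve[i]:
            # Cross out the multiples i*i, i*i + i, ... of the prime i.
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [n for n in range(2, limit + 1) if sieve[n]]


def class_counts(limit, q):
    """Tabulate pi(limit; q, a) for every reduced residue class a (mod q).

    Primes dividing q are skipped, since they lie in no reduced class.
    """
    counts = {}
    for p in primes_up_to(limit):
        if gcd(p, q) == 1:
            counts[p % q] = counts.get(p % q, 0) + 1
    return counts


if __name__ == "__main__":
    for q in (3, 4):
        print(q, class_counts(10**6, q))
```

Comparing the entries of `class_counts(x, 3)` or `class_counts(x, 4)` for several values of x shows how small this bias is relative to the total number of primes.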
What happens if we consider the patterns of residues among strings of consecutive primes? Let p_n denote the sequence of primes in ascending order. Let r ≥ 1 be an integer, and let 𝐚 = (a_1, a_2, …, a_r) denote an r-tuple of reduced residue classes (mod q). Define
$$\pi(x; q, \mathbf{a}) := \#\{p_n \le x : p_{n+i-1} \equiv a_i \ (\mathrm{mod}\ q) \text{ for each } 1 \le i \le r\},$$
which counts the number of occurrences of the pattern 𝐚 (mod q) among r consecutive primes the least of which is below x. When r ≥ 2, little is known about the distribution of such patterns among the primes. When r = 2 and ϕ(q) = 2 (thus q = 3, 4, or 6), Knapowski and Turán (3) observed that all four possible patterns of length 2 appear infinitely many times. The main significant result in this direction is due to Shiu (4), who established that, for any q ≥ 3, any reduced residue class a (mod q), and any r ≥ 2, the pattern (a, a, …, a) occurs infinitely often. Recent progress in sieve theory has led to a new proof of Shiu’s result (see ref. 5), and, moreover, Maynard (6) has shown that π(x; q, (a, a, …, a)) ≫ π(x).
Despite the lack of understanding of π(x; q, 𝐚), any model based on the randomness of the primes would suggest strongly that every permissible pattern of r consecutive primes appears roughly equally often; that is, if 𝐚 is an r-tuple of reduced residue classes (mod q), then π(x; q, 𝐚) ∼ π(x)/ϕ(q)^r. However, a look at the data might shake that belief! For example, among the first million primes (for convenience, restricting to those greater than 3), we find
These numbers show substantial deviations from the expectation that all four quantities should be roughly 250,000. Further, Chebyshev’s bias might have suggested a slight preference for the pattern (2, 2) over the other possibilities, and this is clearly not the case.
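The pattern counts discussed above can be tabulated directly from a list of primes. The sketch below is an illustration under stated assumptions rather than the computation used for the paper: `pattern_counts` is a hypothetical name, tuples containing a prime that divides q are skipped (which for q = 3 matches the restriction to primes greater than 3), and the sieve is extended to 2^(r−1)·x so that, by Bertrand's postulate, every prime p_n ≤ x has its r − 1 successors available.

```python
from collections import Counter
from math import gcd, isqrt


def primes_up_to(limit):
    """Return the list of primes <= limit (sieve of Eratosthenes)."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [n for n in range(2, limit + 1) if sieve[n]]


def pattern_counts(limit, q, r=2):
    """Count residue patterns (mod q) of r consecutive primes.

    A tuple (p_n, ..., p_{n+r-1}) is counted when p_n <= limit and every
    prime in it is coprime to q.
    """
    # Bertrand's postulate: p_{n+1} < 2 * p_n, so this bound is sufficient.
    primes = primes_up_to(limit * 2 ** (r - 1))
    counts = Counter()
    for n, p in enumerate(primes):
        if p > limit:
            break
        window = primes[n : n + r]
        if all(gcd(t, q) == 1 for t in window):
            counts[tuple(t % q for t in window)] += 1
    return counts


if __name__ == "__main__":
    for pattern, count in sorted(pattern_counts(10**6, 3).items()):
        print(pattern, count)
```

Running this for q = 3 or q = 10 and comparing the diagonal entries (a, a) with the off-diagonal ones gives a direct view of the discrepancy.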
The discrepancy observed above persists for larger x, and also exists for other moduli q. For example, among the first hundred million primes modulo 10, there is substantial deviation from the prediction that each of the 16 pairs should have about 6.25 million occurrences. Specifically, with π(x_0) = 10^8, we find the following.
[Table: π(x_0; 10, (a, b)) for each of the 16 pairs (a, b) of reduced residue classes (mod 10).]
Apart from the fact that the entries vary dramatically (much more than in Chebyshev’s bias), the key feature to be observed in these data is that the diagonal classes (a, a) occur significantly less often than the nondiagonal classes. Chebyshev’s bias states that the residue classes 3 and 7 (mod 10) very often contain slightly more primes than the residue classes 1 and 9 (mod 10), but curiously in our data the patterns (3, 3) and (7, 7) appear less frequently than (1, 1) and (9, 9); this suggests again that a different phenomenon is at play here.
The purpose of this paper is to develop a heuristic, based on the Hardy−Littlewood prime k-tuples conjecture, which explains the biases seen above. We are led to conjecture that although the primes counted by π(x; q, 𝐚) do have density 1/ϕ(q)^r in the limit, there are large secondary terms in the asymptotic formula which create biases toward and against certain patterns. The dominant factor in this bias is determined by the number of i for which a_{i+1} ≡ a_i (mod q), but there are also lower-order terms that do not have an easy description.
Main Conjecture.
With notation as above, we have
where
When r = 2, the constant c_2(q; 𝐚) is given in [2.23]. If r ≥ 3, it is given by
In general, the quantity c_2(q; 𝐚) seems complicated, but there are some situations where it simplifies. For example, if 𝐚 = (a, a) for a reduced residue class a (mod q), then, regardless of the choice of a, we have
| [1.1] |
We can also show that c_2(q; (a, b)) = c_2(q; (−b, −a)) for any two reduced residue classes a and b (mod q). Moreover, although c_2(q; (a, b)) seems involved, the symmetric quantity c_2(q; (a, b)) + c_2(q; (b, a)) simplifies nicely: For distinct reduced residue classes a, b (mod q), we have
| [1.2] |
where Λ denotes the von Mangoldt function. In particular, this expression depends only on the difference a − b (mod q).
Conjecture 1.1. If a and b are distinct reduced residue classes , then equals
whereas equals
We give a few amusing consequences of the Main Conjecture. The famous biases li(x) > π(x), or π(x; 3, 2) > π(x; 3, 1), or π(x; 4, 3) > π(x; 4, 1) are known to be false infinitely often. However, we conjecture that the robust biases in pairs of consecutive primes (mod 3) or (mod 4) may hold always and from the very start!
Conjecture 1.2. Let or 4, and let a be either or . Then, for all , we have
Indeed, for large x, we have
Given a prime q, the product of two consecutive primes prefers to be a quadratic nonresidue rather than a quadratic residue.
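The preference for quadratic nonresidues can be examined numerically by classifying each product p_n p_{n+1} with the Legendre symbol. This is a minimal sketch under stated assumptions: the names `legendre` and `product_residue_counts` are hypothetical, q is assumed to be an odd prime, the symbol is computed with Euler's criterion, and pairs in which either prime equals q are ignored.

```python
from math import isqrt


def primes_up_to(limit):
    """Return the list of primes <= limit (sieve of Eratosthenes)."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [n for n in range(2, limit + 1) if sieve[n]]


def legendre(a, q):
    """Legendre symbol (a/q) for an odd prime q, via Euler's criterion."""
    s = pow(a % q, (q - 1) // 2, q)
    return -1 if s == q - 1 else s


def product_residue_counts(limit, q):
    """Count p_n <= limit by whether p_n * p_{n+1} is a residue (mod q).

    Returns {1: number of quadratic residues, -1: number of nonresidues}.
    """
    # Bertrand's postulate guarantees p_{n+1} < 2 * p_n <= 2 * limit.
    primes = primes_up_to(2 * limit)
    counts = {1: 0, -1: 0}
    for p, nxt in zip(primes, primes[1:]):
        if p > limit:
            break
        symbol = legendre(p * nxt, q)
        if symbol != 0:  # symbol is 0 exactly when q divides the product
            counts[symbol] += 1
    return counts


if __name__ == "__main__":
    for q in (3, 5, 7, 11):
        print(q, product_residue_counts(10**6, q))
```

Since a product of two residues or two nonresidues is a residue, a deficit of residues here reflects, in part, the deficit of diagonal patterns (a, a).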
Conjecture 1.3. Let q be a fixed odd prime. For large x, we have
The constants in the Main Conjecture also simplify dramatically if one only cares about patterns exhibited by p_n and p_{n+k} for k ≥ 2.
Conjecture 1.4. If k ≥ 2 and a and b are distinct reduced residues (mod q), then
while
Form a transition matrix, with rows and columns indexed by the reduced residue classes (mod q), whose (a, b)th entry is the probability that a prime ≡ a (mod q) is followed by a prime ≡ b (mod q). Then Conjecture 1.4 shows that the corresponding transition matrix going from p_n to p_{n+2} is not the square of the transition matrix going from p_n to p_{n+1}. Thus, the primes are not Markovian, and this may also be seen directly from the Main Conjecture by the formula given for c_2(q; 𝐚) when r ≥ 3 (which is used to derive Conjecture 1.4).
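The failure of the Markov property can be explored empirically by estimating both transition matrices from data and comparing the two-step matrix with the square of the one-step matrix. The sketch below is illustrative only: `transition_matrix` and `mat_square` are hypothetical names, the probabilities are plain empirical frequencies among primes up to a finite limit, and finite-sample noise is not separated from the conjectured effect.

```python
from math import gcd, isqrt


def primes_up_to(limit):
    """Return the list of primes <= limit (sieve of Eratosthenes)."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [n for n in range(2, limit + 1) if sieve[n]]


def transition_matrix(limit, q, k=1):
    """Empirical frequencies of p_{n+k} = b (mod q) given p_n = a (mod q).

    Returns (classes, matrix), with rows and columns ordered as in classes.
    """
    classes = [a for a in range(1, q) if gcd(a, q) == 1]
    index = {a: i for i, a in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    # Bertrand's postulate: p_{n+k} < 2**k * p_n, so this bound suffices.
    primes = primes_up_to(limit * 2**k)
    for n, p in enumerate(primes):
        if p > limit:
            break
        target = primes[n + k]
        if gcd(p, q) == 1 and gcd(target, q) == 1:
            counts[index[p % q]][index[target % q]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return classes, matrix


def mat_square(m):
    """Return the matrix product m * m for a square list-of-lists matrix."""
    size = len(m)
    return [
        [sum(m[i][t] * m[t][j] for t in range(size)) for j in range(size)]
        for i in range(size)
    ]


if __name__ == "__main__":
    classes, one_step = transition_matrix(10**6, 10, k=1)
    _, two_step = transition_matrix(10**6, 10, k=2)
    squared = mat_square(one_step)
    for a, row_two, row_sq in zip(classes, two_step, squared):
        print(a, [round(x - y, 4) for x, y in zip(row_two, row_sq)])
```

The printed rows are the entrywise differences between the observed two-step frequencies and the square of the one-step matrix; for a Markov chain these would be zero up to sampling error.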
The ideas that lead to the Main Conjecture imply that there will be symmetries between the number of occurrences of different patterns.
Conjecture 1.5. Given and q as above, define . For large x, we have
Example. We find
and
while the nearest number of occurrences of another pattern is
If the modulus is a prime power, there are additional symmetries.
Conjecture 1.6. Let q be a prime and let . If and are such that and for each , then
In particular, if a is odd, then, up to an error , depends only on .
Example. We find
In the direction of these conjectures, the earliest work we found is the paper of Knapowski and Turán (3), who “guess” that the events and for the four possibilities of a and b are “not equally probable.” However, Knapowski and Turán go on to suggest that , which is now definitively false by Maynard’s work (6). The paper (3) was published after the death of both authors, and perhaps they had something else in mind, maybe along the lines of our Conjecture 1.2 above? More recently, in Ko (7), numerical results observing the biases in the distribution of consecutive primes for small moduli are given. The paper by Ash, Beltis, Gross, and Sinnott (8) again observes these biases in pairs of consecutive primes and initiates an attempt toward understanding them based on the Hardy−Littlewood conjectures. The heuristic expression in ref. 8 is a large sum of singular series, and, as the authors note, it is unclear from that expression whether tends to for large x. They also note symmetries akin to Conjectures 1.5 and 1.6 for pairs of consecutive primes.
In the Main Conjecture, we expect that the remainder term is given by a sum involving the zeros of Dirichlet L-functions . The main terms given in the Main Conjecture are the same for all repeating patterns ; nevertheless, numerically, one observes some deviations in the counts of such patterns, and we expect the lower-order fluctuations to account for these deviations. In addition to the contributions from zeros, which we expect to be oscillating, there also appear to be nonoscillating lower-order terms of size , which may play a bigger role for the computable ranges of x. We hope to understand these lower-order terms in future work.
An initial guess for why there is a bias against the repeating patterns might be that, after a prime occurs that is , all other classes have a chance to represent a prime before a occurs again. However, a straightforward application of the Selberg sieve shows that the number of primes for which is , which is of a smaller order of magnitude than the bias predicted by the Main Conjecture.
Although we do not pursue this here, it should be possible to prove unconditional analogs of the Main Conjecture in other settings, for example, to numbers free of small prime factors or for squarefree integers (in the latter case, the biases will be manifested already at the level of the constant in the main term). More generally, analogous biases seem to arise for many other sifted sets, for example, in the sums of two squares. We also mention two other settings in which large biases are seen: the distribution of prime geodesics for compact hyperbolic surfaces into various homology classes (see the discussion at the end of ref. 1) and the recent work of Dummit, Granville, and Kisilevsky (9) concerning the distribution of numbers that are products of two primes.
2. The Heuristic for r = 2
In this section, we develop a heuristic explanation of the Main Conjecture in the case r = 2. The heuristic (like several other conjectures about the primes; see, for example, refs. 10–14) is based upon the Hardy−Littlewood prime k-tuples conjecture. We begin by reviewing quickly the Hardy−Littlewood conjectures and some related results, before proceeding to develop an analog suitable for understanding π(x; q, (a, b)).
2.1. The Hardy−Littlewood Conjectures.
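Let $\mathcal{H}$ be a finite subset of $\mathbb{Z}$, and let $\mathbf{1}_{\mathcal{P}}$ denote the characteristic function of the primes. In a strong form, the Hardy−Littlewood conjecture asserts that
$$\sum_{n \le x} \prod_{h \in \mathcal{H}} \mathbf{1}_{\mathcal{P}}(n+h) = \mathfrak{S}(\mathcal{H}) \int_2^x \frac{dy}{(\log y)^{|\mathcal{H}|}} + O\left(x^{1/2+\epsilon}\right),$$
where the singular series $\mathfrak{S}(\mathcal{H})$ is given by
$$\mathfrak{S}(\mathcal{H}) = \prod_{p} \left(1 - \frac{\#(\mathcal{H} \bmod p)}{p}\right)\left(1 - \frac{1}{p}\right)^{-|\mathcal{H}|}.$$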
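In our calculations, it will be important to understand the behavior of the singular series “on average.” Here, Gallagher (10) established that, for any k ≥ 1 and as h → ∞,
$$\sum_{\substack{\mathcal{H} \subset [1,h] \\ |\mathcal{H}| = k}} \mathfrak{S}(\mathcal{H}) \sim \sum_{\substack{\mathcal{H} \subset [1,h] \\ |\mathcal{H}| = k}} 1, \tag{2.1}$$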
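so that the singular series is 1 on average. A refined version of this asymptotic was established by Montgomery and Soundararajan (13), who introduced the modified singular series
$$\mathfrak{S}_0(\mathcal{H}) := \sum_{\mathcal{T} \subseteq \mathcal{H}} (-1)^{|\mathcal{H} \setminus \mathcal{T}|}\, \mathfrak{S}(\mathcal{T}),$$
with $\mathfrak{S}(\emptyset) = 1$. The modified singular series arises naturally in the following version of the Hardy−Littlewood conjecture (thinking of the elements of $\mathcal{H}$ as being small in comparison with x):
$$\sum_{n \le x} \prod_{h \in \mathcal{H}} \left(\mathbf{1}_{\mathcal{P}}(n+h) - \frac{1}{\log n}\right) \sim \mathfrak{S}_0(\mathcal{H}) \int_2^x \frac{dy}{(\log y)^{|\mathcal{H}|}},$$
and the term that is subtracted above arises naturally as the probability that the “random number” n + h is prime. Montgomery and Soundararajan showed that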
| [2.2] |
where μ_k is the kth moment of the standard Gaussian (in particular, μ_k = 0 if k is odd) and A is a constant independent of k. This refines Gallagher’s asymptotic [2.1], and shows that $\mathfrak{S}_0(\mathcal{H})$ exhibits roughly square-root cancellation in each variable.
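The statement that the singular series is 1 on average can be checked numerically for two-element sets {0, h}. The following is a minimal sketch, assuming the standard Euler product for the singular series (the product over primes p of (1 − ν(p)/p)(1 − 1/p)^(−k), where ν(p) is the number of residue classes (mod p) occupied by the set) truncated at a finite prime bound; the names `singular_series` and `mean_pair_singular_series` and the default bound are hypothetical choices.

```python
from math import isqrt


def primes_up_to(limit):
    """Return the list of primes <= limit (sieve of Eratosthenes)."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [n for n in range(2, limit + 1) if sieve[n]]


def singular_series(shifts, primes=None, prime_bound=10_000):
    """Approximate the singular series of a finite set of integer shifts.

    The Euler product is truncated at prime_bound, so the value is only
    an approximation to the infinite product.
    """
    if primes is None:
        primes = primes_up_to(prime_bound)
    k = len(set(shifts))
    product = 1.0
    for p in primes:
        nu = len({h % p for h in shifts})  # classes (mod p) hit by the set
        if nu == p:
            return 0.0  # the set covers every class (mod p)
        product *= (1 - nu / p) / (1 - 1 / p) ** k
    return product


def mean_pair_singular_series(max_shift, prime_bound=10_000):
    """Average the singular series of {0, h} over 1 <= h <= max_shift."""
    primes = primes_up_to(prime_bound)
    total = sum(singular_series([0, h], primes) for h in range(1, max_shift + 1))
    return total / max_shift


if __name__ == "__main__":
    print(singular_series([0, 2]), singular_series([0, 6]))
    print(mean_pair_singular_series(200))
```

Odd h contribute 0 (the set {0, h} covers both classes mod 2) while even h contribute values above 1, and the two effects balance so that the mean is close to 1.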
2.2. Modified Hardy−Littlewood Conjectures.
We need a slight modification of the Hardy−Littlewood conjecture, taking into account congruence conditions . For any integer and a finite subset of the integers, we define the singular series at the primes away from q by
If is such that for all , then we expect that
| [2.3] |
where the factor arises because is conditioned to be coprime to q for all , and the factor arises because we are restricting n to one residue class . In analogy with , it is also useful to define , so that . Once again, the quantity arises naturally in the asymptotic [conditioning for all ]
| [2.4] |
where the term being subtracted arises naturally as the probability that is prime, conditioned on the fact that is coprime to q.
2.3. First Steps Toward the Conjecture.
Let a and b be two reduced residue classes (mod q), and let h be a positive integer with h ≡ b − a (mod q). We now formulate a conjecture for the number of primes n ≤ x with n ≡ a (mod q) and such that the next prime after n is n + h. The gaps between consecutive primes are conjectured to be distributed like a Poisson process with mean log x (and Gallagher showed that this follows from the Hardy−Littlewood conjectures), and so h should be thought of as a parameter on the scale of log x. With this in mind, we are interested in
| [2.5] |
where, for a variable n conditioned to be coprime to q, we set . Write also and similarly for , and then expand out the product in [2.5]; thus we arrive at [ignoring the small differences between , or ]
| [2.6] |
Given reduced residue classes a and b, and a positive , we may write
| [2.7] |
where is independent of h. We also write, for convenience,
| [2.8] |
Appealing now to the conjectured relation [2.4], we are led to hypothesize that the quantity in [2.5] (and [2.6]) is
| [2.9] |
Before proceeding further, a few points are in order. Note that is about , and this exponential decay in h is in keeping with the conjecture that gaps between consecutive primes are distributed like a Poisson process. Secondly, by replacing and above with and , and noting also that , we may see that the quantity [2.9] above does not change if we replace by ; this is an example of the symmetry between and noted in Conjecture 1.5. Similarly, under the hypotheses of Conjecture 1.6, the conditions satisfied by h and are exactly the same for and . Lastly, in arriving at [2.9], we have paid no attention to error terms, and, moreover, have used a uniform version of the Hardy−Littlewood conjecture both in terms of the size of the parameters in the set (this is relatively minor) and in terms of the size of the set . To mitigate the last point, we note that, in expanding out the inclusion−exclusion product in [2.5], we may obtain upper and lower bounds by stopping after an odd or an even number of steps (as in Brun’s sieve, for example); in this manner, only a mildly uniform version of the Hardy−Littlewood conjectures seems needed. For the present, we ignore these details, but it would be desirable to place the conjecture [2.9] on a firmer footing.
With conjecture [2.9] in hand, we have a conjecture for : Namely, we sum the quantity in [2.9] over all positive integers . Thus, we expect that
| [2.10] |
say, where
| [2.11] |
2.4. Discarding Singular Series Involving Sets with Three or More Elements.
We now conjecture that only terms with [which gives rise to the main term of for ] and give significant contributions leading to the Main Conjecture, and that all other terms contribute to an amount . To argue this, we will use as a guide the work of Montgomery and Soundararajan (13), in particular [2.2] above, which shows that sums over singular series exhibit square-root cancellation in each variable.
Suppose, for example, that and in [2.11]. After summing over the variable h, these terms may be thought of as times an average of over element sets whose elements are all of size about . The estimate [2.2] now suggests that this contribution is , and, because , the final contribution to is . If , then the same argument—drawing on [2.2] with there, so that the main term there vanishes and the bound is —indicates that such terms contribute to an amount that is already smaller than the secondary main terms claimed in the Main Conjecture. We believe that, when k is odd, the work of Montgomery and Soundararajan (13) can be refined, and the actual size of the sum in [2.2] is . This expectation suggests that the terms with and also make a contribution of .
When or , then a similar heuristic to the above shows that terms with make a contribution to of . Finally, if and , then the contribution to [2.11] may be roughly thought of as times an average of singular series where (standing for ) runs over element sets with elements of size . Because the singular series is translation-invariant, one can think of this last sum as being times the average over element sets with all elements of size . After making this observation, we can draw on [2.2] (with its proposed refinement for odd k) as earlier, and this leads to the prediction that the contribution to of terms with and any nonempty is .
Thus, discarding all terms with , we now replace the density in [2.11] with
| [2.12] |
where (keeping in mind that is 1 for the empty set and 0 for a singleton)
| [2.13] |
| [2.14] |
and
| [2.15] |
Inserting this in [2.10], we thus conjecture that, up to , there holds
| [2.16] |
2.5. The Main Proposition.
To evaluate the sums over two-term singular series above, we invoke the following proposition whose proof we defer to Section 3, Proof of the Proposition.
Proposition 2.1. Let , and let be any residue class. For any positive real number H, define
Then we may write
where
and, for any , the quantity is described in [3.2] below, satisfies the bound , and which we conjecture to be . Further, if with , then
where
with for and extended periodically for all v, and
2.6. Completing the Heuristic.
Returning to our heuristic calculation, we will apply Proposition 2.1 with
| [2.17] |
We begin by simplifying a bit the expressions for , , and , discarding terms of size , which are negligible for the Main Conjecture. Thus, after summing the geometric series and using [2.17],
| [2.18] |
The definition of involves two singular series, and . Consider the terms arising from the second case. Replace by where also lies in and note that the condition becomes . Thus, ignoring terms of size , the second case in contributes
Arguing similarly with the first case, we conclude that
| [2.19] |
Finally, note that
so that
| [2.20] |
Using Proposition 2.1 to evaluate [2.18], [2.19], and [2.20] and then inserting that in [2.10] leads to the Main Conjecture. The term involving arises from terms involving , which has a leading term of size whereas all other are only of constant size. Thus, isolating the leading contribution to and tracking its appearance in our expressions for , and gives
The term involving is complicated, but follows straightforwardly from our work above. Having already treated the term arising in , the contributions leading to come from the terms in Proposition 2.1. We thus have
| [2.21] |
With (which is zero unless χ is an odd character), we may also derive the following alternative expression:
| [2.22] |
If χ is induced by the primitive character , then, writing for some m coprime to the conductor of , we have
Further, it is helpful to write with odd. If now χ is a character to an odd modulus and q is even, then
Using these facts, it is possible to simplify the formula in [2.22] further, and obtain
| [2.23] |
For example, if q is prime and , then
This completes our discussion of the Main Conjecture in the case , and the other conjectures follow as simple consequences.
3. Proof of the Proposition
The proof follows along standard lines, and the closely related case of evaluating asymptotically is mentioned in ref. 15 and treated in detail in ref. 16. We will therefore be brief. Let χ be a Dirichlet character modulo ; possibly, χ could be imprimitive, or the principal character. Define, for Re,
so that
| [3.1] |
We now note that
which furnishes a meromorphic continuation of to Re with possible poles at or in case χ is principal. We may also express the above as
and now the final product above is analytic in Re, but for which the line Re forms a natural boundary.
If χ is nonprincipal, then, by shifting the line of integration to Re, we find that the quantity in [3.1] is , with the main term coming from the pole of at . Moreover, we may even shift the line of integration to Re at the cost of picking up residues from the zeros of . The contribution from these zeros is
If we suppose that GRH holds for , that its zeros are simple, and that is not too small so that [in view of the exponential decay of ] the sum over residues is absolutely convergent, then we would expect that is an oscillating term of size .
If χ is principal, but , then has a pole at with residue , but there is no pole of at because for s near 0. Therefore, in this situation, we find
Finally, if (and χ is naturally principal), the corresponding has a simple pole at in addition to the pole at . Thus, there is a double pole of the integrand in [3.1], and, computing residues, we obtain that
Because
our proposition follows, with
| [3.2] |
4. Modifications to the Heuristic When r ≥ 3
The ideas leading to the general case of the Main Conjecture are similar to those for r = 2, and so we just give a brief sketch. For r ≥ 3 and 𝐚 = (a_1, …, a_r), we start by writing π(x; q, 𝐚) as
As before, we expand this out, invoke the Hardy−Littlewood conjectures, and then discard all singular series terms except for the empty set and sets with two elements. This leads to
where and , , and are certain smooth sums of singular series. For , we have [with as before]
Notice that, if in the inner summation, the resulting expression is times the analogous term in our calculation for . If , we will need to consider sums of the form
where . This can be understood via contour integration as in Proposition 2.1; a key difference is that, for , we have unless , in which case . Using this to evaluate , we find that it is [up to ]
and it is this last term that creates the additional bias [in ] against patterns with a nonimmediate repetition.
For , up to , we obtain a contribution of times
Finally, from , we obtain times
Assembling these contributions yields the Main Conjecture.
5. Comparison of the Conjecture with Numerical Data
We begin by comparing the Main Conjecture with the data for and or 4. In each of these cases, our conjecture is that
| [5.1] |
with the sign being negative if and positive if not. However, to obtain [5.1] in such a clean form, a number of asymptotic approximations were used throughout Section 2, The Heuristic for r = 2, and it is reasonable to expect that the unsimplified integral expression [2.16] for would provide a better fit to the data. Indeed, we find the following.
![]() |
Going forward, we will present only the comparison of against [2.16], so we explain briefly how we compute this approximation. In [2.18], [2.19], and [2.20], we determined , , and in terms of and, in the process, replaced geometric progressions in h with suitable approximations. Of course, the geometric progressions could just be computed exactly. We keep the exact but messy expressions so obtained and, for , use the main terms described in Proposition 2.1. This yields an expression for as an explicit integral, which we computed numerically in Sage. The actual values of were computed in C++ using the primesieve library. Code for both computations can be found on the first author’s website.
Next we consider . Here too the constants simplify, with depending only on the difference (a fact reflected in the data, as predicted by Conjecture 1.6). Explicitly, we have , and. Thus, we should expect that, among the nondiagonal patterns, those with should be the least frequent, and those with and 6 should be rather close. Indeed, we find the following.
![]() |
We now turn to the patterns (). Here, the quadratic character plays a role for those patterns with . In particular, it does not play a role in the diagonal patterns, for which is given by [1.1]. For nondiagonal patterns, we have the following.
![]() |
[The other values of are determined by .]
Here, , so that and are the largest of these. Moreover, as in the case, there are symmetries between patterns with the same difference . We find the following.
![]() |
We close by considering (which amounts to considering the last decimal digit of primes). Essentially, no simplifications can be made for the constants . For any nondiagonal pattern , we find
where χ is either of the complex characters . Apart from the understood symmetry , the value of determines the pattern. Thus, we might expect significant variation between the various patterns and, in particular, no additional symmetries like we saw and . We find the following, presenting only the first of and ,
![]() |
An interesting feature to be observed here is that, initially, is larger than , despite our conjecture predicting the opposite ordering. In fact, this is true for all x between and . However, at about , becomes consistently larger, seemingly forever, exactly as our conjecture would predict. We take this as reasonable evidence for our speculation that there are even more lower-order terms [e.g., on the order of ], which, in this case, apparently conspire to point in the opposite direction than the bias in the Main Conjecture.
Acknowledgments
We thank Tadashi Tokieda, whose lecture on “Rock, paper, scissors in probability” inspired the present work; James Maynard for drawing our attention to ref. 3; Paul Abbott for pointing us to ref. 7; and Alexandra Florea, Andrew Granville, and Peter Sarnak for helpful comments. The first author is partially supported by National Science Foundation (NSF) postdoctoral fellowship Division of Mathematical Sciences 1303913. The second author is partially supported by the NSF, and by a Simons Investigator Award from the Simons Foundation.
Footnotes
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
References
- 1. Rubinstein M, Sarnak P. Chebyshev’s bias. Exp Math. 1994;3(3):173–197.
- 2. Granville A, Martin G. Prime number races. Am Math Mon. 2006;113(1):1–33.
- 3. Knapowski S, Turán P. Number Theory and Algebra. Academic; New York: 1977. On prime numbers ≡ 1 resp. 3 mod 4; pp. 157–165.
- 4. Shiu DKL. Strings of congruent primes. J Lond Math Soc. 2000;61(2):359–373.
- 5. Banks WD, Freiberg T, Turnage-Butterbaugh CL. Consecutive primes in tuples. Acta Arith. 2015;167(3):261–266.
- 6. Maynard J. 2014. Dense clusters of primes in subsets. arXiv:14052953.
- 7. Ko C-M. Distribution of the units digit of primes. Chaos Solitons Fractals. 2002;13(6):1295–1302.
- 8. Ash A, Beltis L, Gross R, Sinnott W. Frequencies of successive pairs of prime residues. Exp Math. 2011;20(4):400–411.
- 9. Dummit D, Granville A, Kisilevsky H. Big biases amongst products of two primes. Mathematika. 2016;62(2):502–507.
- 10. Gallagher PX. On the distribution of primes in short intervals. Mathematika. 1976;23(1):4–9.
- 11. Goldston DA, Ledoan AH. The jumping champion conjecture. Mathematika. 2015;61(3):719–740.
- 12. Granville A, van de Lune J, te Riele HJJ. Checking the Goldbach conjecture on a vector computer. In: Mollin RA, editor. Number Theory and Applications. Kluwer; Dordrecht, The Netherlands: 1989. pp. 423–433.
- 13. Montgomery HL, Soundararajan K. Primes in short intervals. Commun Math Phys. 2004;252(1-3):589–617.
- 14. Odlyzko A, Rubinstein M, Wolf M. Jumping champions. Exp Math. 1999;8(2):107–118.
- 15. Goldston DA. Linnik’s theorem on Goldbach numbers in short intervals. Glasg Math J. 1990;32(3):285–297.
- 16. Montgomery HL, Soundararajan K. 2002. Beyond pair correlation. Paul Erdős and His Mathematics, I (Budapest, 1999), Bolyai Society Mathematical Studies (János Bolyai Math Soc, Budapest), Vol 11, pp 507–514.






