2021.04.07 Math 9520 Goldbach approximations


In a 1742 letter to Euler, Goldbach conjectured that every even integer greater than or equal to 4 was the sum of two primes.

This has become known as the Goldbach Conjecture.

In this worksheet we'll explore this conjecture.


Task 1: Fix N=1000 and create a list of all the primes less than N.

What is the largest prime on the list?  How many primes are there in the list?

N = 10000
prime_list = []
for n in srange(N):
    if is_prime(n):
        prime_list.append(n)
       
len(prime_list) 
       
1229
prime_list[-1] 
       
9973

Suppose we want to check Goldbach's conjecture: how might we proceed?

Set up a list filled with 0's.  Step through prime_list and, for each prime, step through prime_list again; add the two primes to get m, and increment the value of Goldbach_reps[m].

Goldbach_reps = [0 for n in srange(N)]
for p1 in prime_list:
    for p2 in prime_list:
        if p1 + p2 < N:
            Goldbach_reps[p1+p2] += 1
       
list_plot(Goldbach_reps) 
       
gr_evens = []
for i in srange(1, N/2):
    gr_evens.append([2*i, Goldbach_reps[2*i]])
       
list_plot(gr_evens[:500]) 
       

Observation: it appears that the number of representations of even n as a sum of two primes is growing.

It appears that there are several (at least two) striations.  Can we identify either of them?

gr_evens[:5] 
       
[[2, 0], [4, 1], [6, 1], [8, 2], [10, 3]]
gr_evens[100:110] 
       
[[202, 17],
 [204, 28],
 [206, 13],
 [208, 14],
 [210, 38],
 [212, 12],
 [214, 15],
 [216, 26],
 [218, 13],
 [220, 18]]

Call a value of n a "champion" if the number of representations is bigger than any value we've seen so far.

Compute a list of champions.

What are the factorizations of champions?


current_max = 0
champions = []
for i in gr_evens:
    if i[1] > current_max:
        champions.append(i)
        current_max = i[1]
       
list_plot(champions) 
       
print(champions[:100])
       
[[4, 1], [8, 2], [10, 3], [16, 4], [22, 5], [24, 6], [34, 7], [36, 8],
[48, 10], [60, 12], [78, 14], [84, 16], [90, 18], [114, 20], [120, 24],
[168, 26], [180, 28], [210, 38], [300, 42], [330, 48], [390, 54], [420,
60], [510, 64], [630, 82], [780, 88], [840, 102], [990, 104], [1050,
114], [1140, 116], [1260, 136], [1470, 146], [1650, 152], [1680, 166],
[1890, 182], [2100, 194], [2310, 228], [2730, 256], [3150, 276], [3570,
308], [3990, 326], [4200, 330], [4410, 342], [4620, 380], [5250, 396],
[5460, 436], [6090, 444], [6510, 482], [6930, 536], [7980, 548], [8190,
584], [9030, 606], [9240, 658]]
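One quick way to look at the factorization question above (a sketch, not part of the original worksheet):

# Look at the factorization of each champion value of n
# alongside its number of representations.
for n, reps in champions:
    print(n, factor(n), reps)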

It looks like the champions, once they are large enough, are all divisible by $2\cdot 3\cdot 5\cdot 7$.  Is this also the case for primes larger than 7?

Today: we'll explore the importance of divisibility by 3 in Goldbach's conjecture.

First, let's take the data we've gathered, and create three plots:

Plot A: when n is divisible by 3, i.e. 0 (mod 3)

Plot B: when n is 1 (mod 3)

Plot C: when n is 2 (mod 3)

Suggested method: step through gr_evens and create three new lists: 

gr_3_0 = []
gr_3_1 = []
gr_3_2 = []
for i in gr_evens:
    if i[0] % 3 == 0:
        gr_3_0.append(i)
    elif i[0] % 3 == 1:
        gr_3_1.append(i)
    elif i[0] % 3 == 2:
        gr_3_2.append(i)
       
A = list_plot(gr_3_0, color="red")
B = list_plot(gr_3_1, color="blue")
C = list_plot(gr_3_2, color="green")
show(B+C+A)
       

Lucy's conjecture: factors matter.  Let's test this: create lists according to whether n is divisible by 3, by 5, by both, or by neither.

Call them gr_0_0, gr_0_1, gr_1_0, gr_1_1, where 0 means divisible and 1 means not; the first position refers to 3 and the second to 5.

gr_0_0 = []
gr_0_1 = []
gr_1_0 = []
gr_1_1 = []
for i in gr_evens:
    if i[0] % 3 == 0:
        if i[0] % 5 == 0:
            gr_0_0.append(i)
        else:
            gr_0_1.append(i)
    else:
        if i[0] % 5 == 0:
            gr_1_0.append(i)
        else:
            gr_1_1.append(i)
       
A00 = list_plot(gr_0_0, color='red')
A01 = list_plot(gr_0_1, color='blue')
A10 = list_plot(gr_1_0, color='green')
A11 = list_plot(gr_1_1, color='black')
       
show(A00+A01+A10+A11) 
       

It appears that Lucy's conjecture that divisibility by 3 and 5 matters is confirmed. 

Let's a) see if we can quantify the difference that it makes and

b) come up with (heuristic) reasons for the behavior.

Exercise: plot the number of representations of $n$, divided by 2 when $n$ is divisible by 3 and left unchanged when $n$ is not divisible by 3, in two different colors, twice, once with each color on top.
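Here is one possible way to carry out the exercise (a sketch; the variable names are just for illustration, and the order of addition in show() should determine which color ends up on top):

# Halve the count when n is divisible by 3, leave it unchanged otherwise,
# and overlay the two classes in both orders.
halved = [[n, r/2] for n, r in gr_evens if n % 3 == 0]
unchanged = [[n, r] for n, r in gr_evens if n % 3 != 0]
P1 = list_plot(halved, color='red')
P2 = list_plot(unchanged, color='blue')
show(P1 + P2)   # one color drawn on top
show(P2 + P1)   # the other color drawn on top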


Let's consider the divisibility by 3.

How many primes are there less than N?  About $N/\log N$.

Now consider the primes (mod 3).

There's exactly one prime congruent to 0 (mod 3), namely 3.

Dirichlet proved that the remaining primes are distributed about equally between the other two residue classes.

Let's call the set of primes congruent to 1 (mod 6) $S_1$, and the set congruent to 5 (mod 6) $S_5$.

Notice that (with the exception of 2 and 3), all primes are either 1 or 5 (mod 6).

(Note that if we were considering divisibility by 5 instead we might have sets $S_1, S_3, S_7$ and $S_9$).

What can we say about even numbers that are sums of two primes?

If $n\equiv 0 (mod 3)$ then since $n$ is even, it is actually $0 (mod 6)$.

If $n\equiv 1 (mod 3)$ then $n\equiv 4 (mod 6)$.

If $n\equiv 2 (mod 3)$ then $n\equiv 2 (mod 6)$.

Now, consider how we can obtain $n$ as a sum of primes.

If $n \equiv 0 (mod \ 6)$, then if $n=p+q$, then one of $p,q$ is in $S_1$, and the other is in $S_5$.

If $n\equiv 2 (mod \ 6)$, then if $n=p+q$, both of $p,q$ are in $S_1$.

If $n \equiv 4 (mod \ 6)$ then if $n=p+q$, both of $p, q$ are in $S_5$.

What do we mean by a heuristic?  It's a reason to believe something is true, a plausible explanation for a mechanism for why something happens, as opposed to a formal proof that something is true.

Suppose that $|S_1|\simeq |S_5|=s$.

How many ordered pairs $(p,q)$ are there where $p\in S_1$ and $q\in S_5$?  About $s^2$.

How many ordered pairs $(p,q)$ are there where $p, q \in S_1$?  About $s^2$.

How many ordered pairs $(p,q)$ are there where $p, q \in S_5$?  About $s^2$.

Once we consider the ordering of the primes, we see that we get about $2s^2$ sums from $S_1 + S_5$,

but only $s^2$ from $S_1 +S_1$ and $s^2$ from $S_5 + S_5$.

Let's do an example: $S_1 = \{7, 13,19 \}$ and $S_5 = \{5, 11,17  \}$

$S_1+S_5 = \{7+5, 7+11, 7+17,  13 + 5, 13+11, 13+17, 19+5, 19+11, 19+17 \}$

and in our algorithm, each of these would get counted twice, once as 13+17, for example, and once as 17+13.

$S_1+S_1 = \{ 7+7, 7+13, 7+19, 13+7,13+13, 13+19, 19+7, 19+13, 19+19 \}$ but in this case both orders already appear among these $s^2$ pairs, so there is no extra factor of 2.

So all the representations of even numbers congruent to $0  (mod\ 6)$ add up to $2s^2$, whereas all the representations of even numbers congruent $2 (mod \ 6)$ add up to $s^2$, and the same is true for $4 (mod \ 6)$.
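We can sanity-check this 2:1 prediction against the data we have already computed; a rough sketch (skipping small n, and just averaging within each even residue class mod 6):

# Average number of representations in each even residue class mod 6.
# The heuristic suggests class 0 should be roughly twice the other two.
for r in [0, 2, 4]:
    vals = [reps for n, reps in gr_evens if n % 6 == r and n > 100]
    print(r, float(sum(vals))/len(vals))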

Can we extend this idea to divisibility by 5?

Modulo 10, we have even residue classes 0, 2, 4, 6, 8.

Our primes (except for 2 and 5) all fall in residue classes 1, 3, 7, 9 (mod 10).  Call these sets of primes

$S_1, S_3, S_7, S_9$.

How can we get 0 (mod 10) as a sum of two primes?  $S_1+S_9, S_3+S_7$.

How can we get 2 (mod 10) as a sum of two primes? $S_1+S_1, S_3+S_9$.

How can we get 4 (mod 10) as a sum of two primes? $S_1+S_3, S_7+S_7$.

How can we get 6 (mod 10) as a sum of two primes? $S_3+S_3, S_7+S_9$.

How can we get 8 (mod 10) as a sum of two primes? $S_1+S_7, S_9+S_9$.

Just as with mod 3, when we add two distinct sets of size $s$, we get $2s^2$ representations, but when we add  a set of size $s$ to itself, we get $s^2$ representations.  

So, for 0 (mod 10), we get $4s^2$ representations, and for all the others, we get $3s^2$ representations: in other words, on average we get 4/3 times as many representations for even numbers divisible by 5 as we do for even numbers not divisible by 5.
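Again, a rough sanity check of the 4/3 prediction against the data (a sketch; it ignores the interaction with divisibility by 3, which should roughly average out across both groups):

# Compare average representation counts for even n divisible by 5
# with those not divisible by 5; the heuristic predicts a ratio near 4/3.
div5 = [reps for n, reps in gr_evens if n % 5 == 0 and n > 100]
not_div5 = [reps for n, reps in gr_evens if n % 5 != 0 and n > 100]
print((float(sum(div5))/len(div5)) / (float(sum(not_div5))/len(not_div5)))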

Can we unify things, and calculate a correction factor for all odd primes?   Note that $2=(3-1)/(3-2)$ and $4/3=(5-1)/(5-2)$.  Can we show that the correction factor for a prime $p$ is $(p-1)/(p-2)$?

If we work modulo $p$, we have $p-1$ residue classes (mod $2p$) which contain infinitely many primes.  By Dirichlet's theorem they will all be about the same size.  Note that there is no class $S_p$.

We get 0 (mod $2p$) by adding $S_i +S_{2p-i}$: these are distinct classes.  There are $(p-1)/2$ choices for $i$.

We get every other residue class in two different ways: one as the sum of a class with itself, the others as sums of two distinct classes.  There are $(p-1)/2-1=(p-3)/2$ ways to do this as a sum of two distinct classes.

So, when divisible by $p$, we get $(p-1)2s^2/2=(p-1)s^2$ representations.

When not divisible by $p$, we get $(p-3)2s^2/2+s^2=(p-2)s^2$.

Thus, on average, we get $(p-1)/(p-2)$ times as many representations for numbers divisible by $p$ as we do for numbers not divisible by $p$.

We will build a function $HL(n)$ which contains the correction factor for each prime dividing $n$.  We'll call this the Hardy-Littlewood correction factor.  It will be defined by 

\[  HL(n) = \prod_{p|n} \frac{p-1}{p-2}  \]

where the product is over all odd primes $p$ dividing $n$.

Now, if we take the number of representations of $n$ and divide by this correction factor, we should hope to counter the effects of divisibility by odd primes on the number of representations.

Now the question is: given an integer $n$ how do we get the odd primes dividing it?

def HL(n):
    # want this to work for odd numbers too, but how to know whether to skip the first factor if it is 2?
    g = list(factor(2*n))  # this forces 2 to be a prime factor, which makes it easy to skip
    cf = 1  # correction factor
    for i in g[1:]:  # this skips the prime 2, and goes through all the other (prime, power) pairs
        p = i[0]  # here p is the current odd prime
        cf *= (p-1)/(p-2)
    return(cf)
       
HL_values = []
for i in srange(1, 20000):
    HL_values.append([2*i, HL(2*i)])
       
list_plot(HL_values) 
       

What to do now?  

corrected_gr = []
for i in gr_evens:
    corrected_gr.append([i[0], i[1]/HL(i[0])])
       
list_plot(corrected_gr) 
       

It appears from the plot that the correction factor brings all the striations together, explaining most of the variation in gr_evens.

Now let's ask the question "how fast should we expect the number of representations to grow?"

We'll need to think about how many primes there are up to $N$.  The prime number theorem says

\[ \pi(x) \sim Li(x) \]

In this formula, $\pi(x)$ is the number of primes less than or equal to $x$, 

\[ Li(x)=\int_2^x \frac{1}{\log(t)} dt  \]

and $f(x)\sim g(x)$ means that 

\[ \lim_{x \rightarrow \infty} \frac{f(x)}{g(x)} = 1 \]

Since $Li(x) \sim \frac{x}{\log(x)}$, we can use $x/\log(x)$ as an approximate growth rate for $\pi(x)$.

Primes are most definitely *not* random.  But, in many ways they behave a little bit like they are.

We could try to develop a random model, creating odd numbers at random with the same sort of density as the primes, and examine the structure of the number of representations.  We could then compare the qualitative features of the results with the Goldbach data, both pre- and post-correction factor.


How to design this experiment?

For each odd integer $n$ from 3 up to $N$, toss a biased coin to determine whether it's a pseudo-prime.

The probability that $n$ is a pseudo-prime will be $2/\log(n)$.

Then run the Goldbach code using pseudo-primes instead of primes and see what happens. 
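A minimal sketch of this experiment, assuming the same N and plotting style as above (the names pseudo_primes and pseudo_reps are made up here):

# Declare each odd n >= 3 a pseudo-prime independently with probability
# 2/log(n), then recompute the Goldbach-style representation counts.
import random
pseudo_primes = []
for n in srange(3, N, 2):
    if random.random() < 2/log(float(n)):
        pseudo_primes.append(n)

pseudo_reps = [0 for n in srange(N)]
for p1 in pseudo_primes:
    for p2 in pseudo_primes:
        if p1 + p2 < N:
            pseudo_reps[p1 + p2] += 1

list_plot([[2*i, pseudo_reps[2*i]] for i in srange(1, N/2)])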

This morning, we'll investigate the claim that $Li(x) \sim x/\log(x)$ and that $\pi(x)\sim Li(x)$.

prime_pi(17) 
       
7
A = plot(prime_pi(x), x, 2, 10000)
B = plot(.125*x, x, 2, 10000, color='red')
show(A+B)
       
A = plot(Li(x), x, 2, 10000)
B = plot(.125*x, x, 2, 10000, color='red')
show(A+B)
       
A = plot(x/log(x), x, 2, 10000)
B = plot(.125*x, x, 2, 10000, color='red')
show(A+B)
       

We see that all three functions, $\pi(x)$, $Li(x)$ and $x/\log(x)$, exhibit the same qualitative behaviour: starting out above the line $.125 x$ and trending lower than it.  However, $x/\log(x)$ crosses the straight line much earlier than $\pi(x)$ does, and that in turn crosses it earlier than $Li(x)$ does.

A = plot(prime_pi(x), x, 2, 10^5)
B = plot(Li(x), x, 2, 10^5, color='red')
C = plot(x/log(x), x, 2, 10^5, color='green')
show(B+A+C)
       

At this stage, we have demonstrated experimentally (i.e. by plotting the functions!) that $\pi(x)$ and $Li(x)$ behave very similarly, and are *qualitatively* similar to $x/\log(x)$, though the latter grows more slowly.  Thus we often say that $\pi(x)$ grows like $x/\log(x)$, even though $Li(x)$ is a much better approximation.

A = plot(prime_pi(x), x, 2, 1000000000)
B = plot(Li(x), x, 2, 1000000000, color='red')
C = plot(x/log(x), x, 2, 1000000000, color='green')
show(B+A+C)
       
plot(x/log(x)/prime_pi(x),x,10000,100000) 
       

Plotting  $\frac{x}{\log(x)\pi(x)}$ we see that it appears to approach 1.  This is saying *precisely* that $\pi(x)\sim x/\log(x)$.  Let's compare $Li(x)$ to $x/\log(x)$.

plot(x/log(x)/Li(x),x, 100,1000) 
       

Again, visually it appears that, while $Li(x)$ is usually greater than $x/\log(x)$, the ratio tends to 1, that is, $Li(x)\sim x/\log(x)$.

plot(Li(x)/prime_pi(x), x, 20000, 100000000) 
       
plot(Li(x)-prime_pi(x),x,20000, 10^10) 
       

Let's describe what we see, both qualitatively and quantitatively.  At $10^9$ the difference is about 2500, whereas $\pi(x)$ is around $5\times 10^7$.

plot((Li(x)-prime_pi(x))/sqrt(x),x,20000, 10^11) 
       

The Riemann Hypothesis: conjectured over 150 years ago by Bernhard Riemann, it can take many forms: one equivalent formulation is (roughly) the following: for every $\epsilon>0$,

\[ |Li(x) - \pi(x)| = O(x^{1/2 + \epsilon})  \]

meaning essentially that the difference between $Li(x)$ and $\pi(x)$ can't be too much bigger than $x^{1/2}$.

In our experiments with $Li(x)$ and $\pi(x)$ it appears that $Li(x)$ is bigger than $\pi(x)$.  This shows how deceptive number theory can be --- Littlewood showed a century ago that the difference $Li(x)-\pi(x)$ changes sign infinitely often!  But the first change of sign occurs so far out that we can't compute when it first happens!

Regarding the Riemann Hypothesis, it has been tested extensively, and all the calculations suggest that it is true.

But!  Andrew Odlyzko, who has done more calculations than any other person, thinks we haven't computed anywhere near far enough to have convincing evidence!

RH is one of the 7 problems listed in 2000 as Millennium Prize Problems, each with a \$1,000,000 prize attached.   So far, one, the Poincaré Conjecture, has been resolved.

What does the Prime Number Theorem say in the context of the Goldbach Conjecture and random models of primes?  Can we build a random model and ask "what is the number of representations of $n$ as a sum of two pseudo-primes?"

We know that the small primes (those bigger than 3) are all either 1 or 5 (mod 6).  One way to model primes is to let each number congruent to $\pm 1$ (mod 6) be a pseudo-prime with an appropriate probability.  It turns out that there are problems with this.  So, to determine how many representations we expect to see, perhaps we could do the following:

Take the primes less than $n/2$ as given, and then for each $p<n/2$ let $n-p$ be a pseudo-prime with probability about $1/\log(n)$.  This isn't quite right.  There are about $n/(2\log(n))$ primes in $[n/2,n]$, and they are all of the form $\pm 1$ (mod 6); there are about $n/6$ integers in $[n/2, n]$ which are $\pm 1$ (mod 6).  So, if we pick one of those integers at random, the probability that it is prime is about $3/\log(n)$.

So our model will be the following: for each prime $p$ with $3<p<n/2$, choose $n-p$ to be a pseudo-prime with probability $3/\log(n)$.  How many prime/pseudo-prime pairs are there summing to $n$?
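One way this model might be simulated for a single even n (a sketch; the function name pseudo_goldbach_count is hypothetical):

# Keep the true primes p with 3 < p < n/2, and declare each n - p a
# pseudo-prime with probability 3/log(n), but only if n - p is
# congruent to +-1 (mod 6).
import random

def pseudo_goldbach_count(n):
    count = 0
    for p in primes(5, n//2):
        q = n - p
        if q % 6 in [1, 5] and random.random() < 3/log(float(n)):
            count += 1
    return count

pseudo_goldbach_count(9000)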

Modeling the Goldbach conjecture with pseudo-primes

We will build a pseudo-prime model as follows.  For each $n$, we'll take the primes $p, 3<p<n/2$ as given, and for each $n-p$, we'll choose them to be pseudo-primes with appropriate probabilities.  Note, all primes will thus be congruent to $\pm 1 $ (mod 6), and so we'll rule some values of $n-p$ out as potential pseudo-primes if they are not also $\pm 1 $ (mod 6).

How many primes are there in $(3,n/2)$?  About $\pi(n/2)$  (actually $\pi(n/2-1)-2$, but who's counting?)

What are the appropriate probabilities? There are $\pi(n)-\pi(n/2)$ primes in $(n/2,n]$.  One sixth of the numbers in the interval are 1 (mod 6), and one sixth are 5 (mod 6), so one third are $\pm 1 $ (mod 6): there are $n/2$ numbers in the interval, so there are about $n/6$ integers in the interval which are $\pm 1$ (mod 6).

There are $\pi(n)-\pi(n/2)$ primes among the $n/6$ integers congruent to $\pm 1$ (mod 6).  So the probability that a randomly chosen integer congruent to $\pm 1$ (mod 6) is prime is about 

\[  \frac{\pi(n)-\pi(n/2)}{n/6} \]

If we use $\pi(x) \sim x/\log(x)$, we see that this probability is about

\[ \frac{n/\log(n) - n/(2\log(n/2))}{n/6}\]

\[ \simeq \frac{6}{\log(n)} - \frac{3}{\log(n/2)} \]

\[ \simeq \frac{3}{\log(n)} \]

Hence when $n \equiv 0$ (mod 6) we expect to see about $\pi(n/2) \cdot \frac{3}{\log(n)} \simeq \frac{3n}{2 (\log(n))^2}$ prime-pseudoprime pairs.

When $n \equiv $ 2 or 4 (mod 6), we only have half as many potential primes, so we expect to see about half as many representations, 

\[ \frac{3n}{4 (\log(n))^2} .\]

We see that the value of $n$ (mod 6) changes the number of representations.  Presumably the other correction factors will have similar, smaller effects.  Let's just compare the plot of corrected representations to what we are seeing.  Note that in the above we are counting only prime-pseudoprime pairs, whereas the corrected plot counted both $p+q=n$ and $q+p=n$, so we need to double the estimate.  So, we'll compare corrected representations to

\[  \frac{3n}{2 (\log(n))^2} \]

corrected_plot = list_plot(corrected_gr)
approx_plot = plot(3*x/(2*log(x)^2), x, 2, N)
show(corrected_plot + approx_plot)
       

So, it appears that this toy model does a good job of what we want --- approximating the number of Goldbach representations rather well.  Recall that $x/\log(x)$ undercounts $\pi(x)$, and so $1/\log(x)$ underestimates the probability of pseudo-primes.  Perhaps if we were to replace $x/\log(x)$ by $Li(x)$, we would be able to do better?

A related conjecture: the Twin Prime Conjecture: there are infinitely many pairs of primes $p,q$ so that $q-p=2$.

It is certainly *true*.  But as of 2021 we don't know how to prove it.

Why should it be true?   First observe that if $q-p=2$ and $p, q>4$ are primes, then there is an $n$ so that $p=6n-1, q=6n+1$.  Now let's make a random model, and create pseudoprimes by choosing numbers congruent to $\pm 1 $ (mod 6), and making them a pseudoprime with an appropriate probability.

The number of primes up to $6n+2$ is about $6n/\log(6n)\simeq 6n/\log(n)$.  The number of potential pseudoprimes up to $6n+2$ is $2n$.  So the probability a particular $6n\pm 1$ is prime is about 

\[ \frac{\mbox{number of primes up to $6n+2$}}{\mbox{number of potential pseudoprimes}} \]

\[ \simeq  \frac{6n}{2n\log(n)} =\frac{3}{\log(n)} \]

So, the probability a particular $n$ has two pseudoprimes $6n-1, 6n+1$ should be about 

\[ \frac{9}{\log(n)^2} \]

Now, let $X$ be the number of prime pairs up to $N$, and let $X_n$ be the indicator of the event that $6n-1, 6n+1$ form a prime pair; then

\[  X = \sum_{n = 2}^N  X_n \]

and 

\[ E(X) = \sum_{n = 2}^N \frac{9}{\log(n)^2 } \]


Then as $N \rightarrow \infty$, $E(X) \rightarrow \infty$.  Indeed, for fixed $N$ we expect to see about 

\[ \frac{9N}{\log(N)^2}  \]

twin pseudoprimes in this model.
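We can compare this prediction with the actual count of twin prime pairs; a quick sketch (the cutoff M = 1500 is an arbitrary choice):

# Count the twin prime pairs (6n-1, 6n+1) for n < M and compare with
# the heuristic 9*M/log(M)^2.
M = 1500
actual = len([n for n in srange(2, M) if is_prime(6*n - 1) and is_prime(6*n + 1)])
predicted = 9*M/log(float(M))^2
print(actual, predicted)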

Let's combine the Goldbach and the Twin Prime conjectures.  We'll call $n$ a twin-prime core if $6n \pm 1$ forms a twin prime pair.  Which integers are sums of two twin-prime cores?  How many representations as sums of twin-prime cores are there?  Note: saying that $6n$ is a sum of twin primes $6k\pm 1$ and $6l \mp 1$ is the same as saying that $n=k+l$ is a sum of twin-prime cores.

We can perform a similar investigation to the pseudoprime investigation of Goldbach for this question.
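A sketch of how such an investigation might start (the names cores and core_reps are made up, and the bound M = 2000 is arbitrary):

# Compute the twin-prime cores k (those with 6k-1 and 6k+1 both prime)
# up to M, then count representations of each n < M as a sum of two
# cores, in the same style as Goldbach_reps.
M = 2000
cores = [k for k in srange(1, M) if is_prime(6*k - 1) and is_prime(6*k + 1)]
core_reps = [0 for n in srange(M)]
for k in cores:
    for l in cores:
        if k + l < M:
            core_reps[k + l] += 1
list_plot(core_reps)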

Conjecture: There is an $n_0$ so that for all $n > n_0$, $n$ can be represented as a sum of twin-prime cores.

Conjecture: There is a TPG correction factor akin to the Hardy-Littlewood correction factor $HL(n)$ that we discussed above, playing an analogous role.

Conjecture:  The growth rate of twin-prime core representations grows like the number of twin-pseudoprime core representations.

Some questions to consider computationally:

1) We have a probabilistic heuristic for the number of prime pairs we might expect to see.  Compute how many prime pairs there are up to $x$ as a function of $x$ and compare it to the heuristic.  If they are not similar, can we adjust the heuristic?

2) Is there a role for some sort of HL correction factor? (Hint: Hardy and Littlewood thought so!)

3) We talked about champions in the context of the Goldbach conjecture.  How many champions does the heuristic suggest we should see?  How does this compare to reality?

4) The idea of champions extends to the HL-corrected number of representations: how many champions do we see up to $x$ (as a function of $x$)?  Which numbers are champions?  Do they have any remaining structure to find?

5) We could take the upper convex hull of the plot of corrected Goldbach representations.  The vertices would all be champions, but not all champions will be vertices.  How many vertices do we see (again, as a function of $x$)?

6) We'll define a runt to be a value $n$ so that corrected_gr($n$) is less than corrected_gr($m$) for all $m>n$.  Notice that this implies that the number of representations along the sequence of runts is strictly increasing.  Clearly, we can't compute runts exactly, since we don't know infinitely many values of corrected_gr($n$), but we can compute apparent runts (as they appear from the current data).

    

For the sequence of champions we were able to identify a subsequence, the sequence of vertices of the convex hull: it isn't immediately clear how to do something similar for runts.  Can we think of a way to do this?

One possibility is the following.  If we were to transform our data, by taking an appropriate function $f()$ and plotting f(corrected_gr($n$)) against $n$, we might be able to get a convex hull.  For example, if we were to square the number of representations, we might be able to find the lower convex hull.  Indeed, by varying the function $f$ we might be able to vary which points are f-convex-champions for various $f$ and which points are f-convex-runts for (other) various functions $f$.

7) What is the spread of corrected_gr() in the vicinity of $n$?  How does this compare to the predicted spread given by the pseudoprime model?  How do we measure the spread of corrected_gr()?  How big an interval around $n$ do we need to consider in order to see the bulk of the variation in corrected_gr($n\pm k$)?  Can we answer this for the pseudoprime model?

 
       
len(Goldbach_reps) 
       
10000
list_plot([sqrt(r) for r in Goldbach_reps])  # the square roots of the entries of the original list
       

So, we could compare the vertices of the convex hull for this list compared to the original.

Our lists, unfortunately, have pairs of points.  We can't take the square root of [14,8].

def f(x):
    return([x[0], x[1]^2])
       
f([1,5]) 
       
[1, 25]
list_plot([f(v) for v in corrected_gr])
       

Computationally, how would we compute the (upper or lower) convex hull of the transformed set of points?

If we have already computed the champions or the runts, the appropriate convex hull vertices will be a subset of these, so it is at worst quadratic in the number of champions or runts.

If we are computing champions, it's linear in $N$, and if we compute runts from the top down, it is likewise linear.

Appropriately important questions then: how many runts and champions are there $\le N$?

Description of experiment: fix $N$.  Compute the set of pairs [n, corrected_gr(n)] which are champions (or runts).  From that, compute the vertices of the $f$-convex hull of the points for suitably chosen monotonic functions $f$.

For champions, are there any interesting divisibilities still around?

If we consider $X(N)= $ number of runts + number of champions $\le N$, is $X(N)$ bounded by $\pi(N)$?

For the uncorrected number of representations, we "know" experimentally that the champions are typically divisible by lots of small primes, and the runts are typically twice a big prime or product of big primes.  So, for uncorrected champions and runts we can probably say a lot.

Can we say anything at all for corrected champions and runts?

Can we introduce new runts in corrected data?  If so, how many?  Clearly, old runts, if they are twice a product of big primes, don't get corrected very much at all.  Compare pre-runts and post-runts?

Perhaps the convex hull considerations work better for uncorrected data than for corrected?  For convex hull champions, do we see multiples of 3, then 15, then 105, and so on, for the convex hull of champions?


def sqrtmap(x):
    if x[1] == 0:
        return([x[0], 0])
    else:
        return([x[0], sqrt(x[1])])
       
def g(x):
    if x[1] == 0:
        return([x[0], 0])
    else:
        return([x[0], log(x[1])])
       
list_plot([sqrtmap(v) for v in corrected_gr][:1000])
       
list_plot([g(v) for v in corrected_gr][:1000])
       
[g(v) for v in corrected_gr][:20]
       
[[2, -Infinity],
 [4, 0],
 [6, log(1/2)],
 [8, log(2)],
 [10, log(9/4)],
 [12, 0],
 [14, log(5/2)],
 [16, log(4)],
 [18, log(2)],
 [20, log(3)],
 [22, log(9/2)],
 [24, log(3)],
 [26, log(55/12)],
 [28, log(10/3)],
 [30, log(9/4)],
 [32, log(4)],
 [34, log(105/16)],
 [36, log(4)],
 [38, log(17/6)],
 [40, log(9/2)]]

We are considering mapping functions onto the values of corrected_gr, the number of representations of even $n$ as a sum of two primes, corrected by the HL correction factor.  What sort of functions should we map?

corrected_gr[:10] 
       
[[2, 0],
 [4, 1],
 [6, 1/2],
 [8, 2],
 [10, 9/4],
 [12, 1],
 [14, 5/2],
 [16, 4],
 [18, 2],
 [20, 3]]
def oneoverx(x):
    if x[1] == 0:
        return([x[0], 0])
    else:
        return([x[0], 1/(x[1])])
       
list_plot([oneoverx(v) for v in corrected_gr][100:])
       

If we consider the map $x \rightarrow x^\alpha$, as we change $\alpha$ we may lose or gain champions in the vertices of the convex hull.  For each champion, if there is an $\alpha$ where it changes status, assign $n$ that value of $\alpha$.  What can we notice?

If $\alpha>1$ then our runts will have a convex hull of interest; as we increase $\alpha$ from 1, do we gain or lose points in the vertices of the convex hull?

If $\alpha\leq 1$, then our champions will have a convex hull of interest: as we decrease $\alpha$ from 1, do we gain or lose vertices in the convex hull?

If a point is dropped from the convex hull, is it possible that it will be reintroduced later? 

 
       

If we plot both champions and runts on the same graph, can we use the graphs to estimate the width of the spread?

Algorithms for finding convex hulls in 2 dimensions: the easiest good algorithm seems to be Graham's scan.

It is clear that the upper convex hull of representations is formed by a subset of the champions: so even for lots of points $N$, if we have relatively few champions, we can compute the convex hull quickly.

Similarly for runts, but we need to transform the data by squaring (or some other suitable increasing function) so that the runts have a lower convex hull.  Again, the vertices of the lower convex hull will be a subset of the runts.
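A sketch of computing the upper convex hull vertices from the champions, using the monotone-chain version of the Graham cross-product test (the function name upper_hull is ours, not from the worksheet):

# Upper convex hull of a list of [x, y] points already sorted by x
# (as the champions are).
def upper_hull(points):
    hull = []
    for p in points:
        # pop the last hull point while the turn made with its
        # predecessor and p is not a right (clockwise) turn
        while len(hull) >= 2:
            o, a = hull[-2], hull[-1]
            cross = (a[0]-o[0])*(p[1]-o[1]) - (a[1]-o[1])*(p[0]-o[0])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

upper = upper_hull(champions)
print(len(upper), len(champions))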

Homework: 

1.  Compute the list of champions and runts for the uncorrected number of Goldbach representations.

2.  Compute the vertices of the upper convex hull of the champions, and the lower convex hull of the squared-transformed data runts.  How many vertices are there relative to the number of champions/runts respectively?

3.  Consider the factorizations of the n-values of the vertices of the convex hulls.  Do the champions get progressively more small primes?  Do the vertices of the runts show any pattern you can discern?

4.  Do all the same things for the corrected data.

5. Can we use the vertices of the convex hulls of the corrected data to produce envelopes which enclose the entire set of corrected data, but meet it at interesting points?  Can we use this to measure the spread of the data?

Compare to random models?   

Note: compute the runts from the top of the data back to the beginning, whereas champions are computed from the bottom up.
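A sketch of that computation for the corrected data (the names runts and current_min are just for illustration):

# Apparent runts: scan corrected_gr from the largest n down, keeping a
# running minimum; a point is kept if it is smaller than everything
# that comes after it.
current_min = Infinity
runts = []
for point in reversed(corrected_gr):
    if point[1] < current_min:
        runts.append(point)
        current_min = point[1]
runts.reverse()   # restore increasing order of n
list_plot(runts)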