2022.02.07 MATH 3600 Goldbach

108 days ago by calkin

The Goldbach Conjecture states: for every even number $2n\geq 4$, $2n$ is the sum of two (not necessarily distinct) primes.  For example, 4=2+2, 6=3+3, 8=3+5, etc.

Conjectured in a letter from Christian Goldbach in a letter to Leonard Euler. 

How might we investigate this conjecture?


Donovan's suggestion: for each even number, test combinations of primes to see if they add to that number. Let's do this by hand, and think about how to code it.


n=10: 

Primes less than 10 are: 2, 3, 5, 7.

Loop through the list: 

a=2: b=2 gives 4, b=3 gives 5, b=5 gives 7, b=7 gives 9.  a=2 fails.

a=3: b=2 gives 5, b=3 gives 6, b=5 gives 8, b=7 gives 10.  a=3,b=7 works!

a=5: b=2 gives 7, b=3 gives 8, b=5 gives 10, b=7 gives 12a=5, b=5 works!

a=7: b=2 gives 9, b=3 gives 10, b=5 gives 12, b=7 gives 14.  a=7, b=3 works!

Modify Goldbach's conjecture: for every even number $2n\geq 6$, $2n$ is the sum of two odd primes.

# Create a list prime_list of odd primes up to N N=30 prime_list=[] for i in srange(3,N): if is_prime(i): prime_list.append(i) 
       
print(prime_list) 
       
[3, 5, 7, 11, 13, 17, 19, 23, 29]
[3, 5, 7, 11, 13, 17, 19, 23, 29]
even_nums=[] for p in prime_list: for q in prime_list: even_nums.append(p+q) 
       
print(even_nums) 
       
[6, 8, 10, 14, 16, 20, 22, 26, 32, 8, 10, 12, 16, 18, 22, 24, 28, 34,
10, 12, 14, 18, 20, 24, 26, 30, 36, 14, 16, 18, 22, 24, 28, 30, 34, 40,
16, 18, 20, 24, 26, 30, 32, 36, 42, 20, 22, 24, 28, 30, 34, 36, 40, 46,
22, 24, 26, 30, 32, 36, 38, 42, 48, 26, 28, 30, 34, 36, 40, 42, 46, 52,
32, 34, 36, 40, 42, 46, 48, 52, 58]
[6, 8, 10, 14, 16, 20, 22, 26, 32, 8, 10, 12, 16, 18, 22, 24, 28, 34, 10, 12, 14, 18, 20, 24, 26, 30, 36, 14, 16, 18, 22, 24, 28, 30, 34, 40, 16, 18, 20, 24, 26, 30, 32, 36, 42, 20, 22, 24, 28, 30, 34, 36, 40, 46, 22, 24, 26, 30, 32, 36, 38, 42, 48, 26, 28, 30, 34, 36, 40, 42, 46, 52, 32, 34, 36, 40, 42, 46, 48, 52, 58]
even_nums_seen={} even_nums_in_order=[] for p in prime_list: for q in prime_list: if even_nums_seen.has_key(p+q)==False: even_nums_in_order.append(p+q) even_nums_seen[p+q]=True 
       
print(even_nums_seen) 
       
{6: True, 8: True, 10: True, 12: True, 14: True, 16: True, 18: True, 20:
True, 22: True, 24: True, 26: True, 28: True, 30: True, 32: True, 34:
True, 36: True, 38: True, 40: True, 42: True, 46: True, 48: True, 52:
True, 58: True}
{6: True, 8: True, 10: True, 12: True, 14: True, 16: True, 18: True, 20: True, 22: True, 24: True, 26: True, 28: True, 30: True, 32: True, 34: True, 36: True, 38: True, 40: True, 42: True, 46: True, 48: True, 52: True, 58: True}
print(even_nums_in_order) 
       
[6, 8, 10, 14, 16, 20, 22, 26, 32, 12, 18, 24, 28, 34, 30, 36, 40, 42,
46, 38, 48, 52, 58]
[6, 8, 10, 14, 16, 20, 22, 26, 32, 12, 18, 24, 28, 34, 30, 36, 40, 42, 46, 38, 48, 52, 58]

Let us create a dictionary listing how often even numbers are the sum of two primes and plot the results, to see if there are any interesting things to observe.  We'll call the dictionary gr, for Goldbach representations.

Note: gr[n] will denote the number of prime pairs $p,q$ for which $p+q=n$.  Since we are excluding 2 from our list of primes, $n$ will always be even.

gr={} # Create a list prime_list of odd primes up to N N=20000 prime_list=[] for i in srange(3,N): if is_prime(i): prime_list.append(i) for p in prime_list: for q in prime_list: if gr.has_key(p+q)==False: gr[p+q]=1 else: gr[p+q]+=1 
       
#print(gr) 
       
list_plot(gr) 
       

Clearly the decay in the second half of this picture is because some of the missing representations in that range need primes bigger than $N$.  Let's restrict the plot to the first half.

gr={} # Create a list prime_list of odd primes up to N N=100000 prime_list=[] for i in srange(3,N): if is_prime(i): prime_list.append(i) for p in prime_list: for q in prime_list: if p+q>N: break if gr.has_key(p+q)==False: gr[p+q]=1 else: gr[p+q]+=1 list_plot(gr) 
       

This image has some very noticeable structure.  What do you see?

Bands or striations where the points are dense, and others where the points are sparse.

Can we find anything that appears to correlate to high or low values of gr[n]?

list_plot([ [k,gr[k]] for k in gr.keys()[:50]]) 
       

Let's look at the numbers instead of the picture.  Here are the first fifty pairs $n,gr[n]$.

print([ [k,gr[k]] for k in gr.keys()[:50]]) 
       
[[6, 1], [8, 2], [10, 3], [12, 2], [14, 3], [16, 4], [18, 4], [20, 4],
[22, 5], [24, 6], [26, 5], [28, 4], [30, 6], [32, 4], [34, 7], [36, 8],
[38, 3], [40, 6], [42, 8], [44, 6], [46, 7], [48, 10], [50, 8], [52, 6],
[54, 10], [56, 6], [58, 7], [60, 12], [62, 5], [64, 10], [66, 12], [68,
4], [70, 10], [72, 12], [74, 9], [76, 10], [78, 14], [80, 8], [82, 9],
[84, 16], [86, 9], [88, 8], [90, 18], [92, 8], [94, 9], [96, 14], [98,
6], [100, 12], [102, 16], [104, 10]]
[[6, 1], [8, 2], [10, 3], [12, 2], [14, 3], [16, 4], [18, 4], [20, 4], [22, 5], [24, 6], [26, 5], [28, 4], [30, 6], [32, 4], [34, 7], [36, 8], [38, 3], [40, 6], [42, 8], [44, 6], [46, 7], [48, 10], [50, 8], [52, 6], [54, 10], [56, 6], [58, 7], [60, 12], [62, 5], [64, 10], [66, 12], [68, 4], [70, 10], [72, 12], [74, 9], [76, 10], [78, 14], [80, 8], [82, 9], [84, 16], [86, 9], [88, 8], [90, 18], [92, 8], [94, 9], [96, 14], [98, 6], [100, 12], [102, 16], [104, 10]]

It appears that the largest values of gr[n] appear to correspond to n which are divisible by 3.  

Let's plot three pictures, according to n (mod 3), and see what we see.

gr_0=[] # list of the pairs n,gr(n) for which n is 0 mod 3 gr_1=[] # list of the pairs n,gr(n) for which n is 1 mod 3 gr_2=[] # list of the pairs n,gr(n) for which n is 2 mod 3 for key in gr: if mod(key,3)==0: gr_0.append([key,gr[key]]) elif mod(key,3)==1: gr_1.append([key,gr[key]]) elif mod(key,3)==2: gr_2.append([key,gr[key]]) A=list_plot(gr_0, color='blue') B=list_plot(gr_1, color='green') C=list_plot(gr_2, color='red') show(A+B+C) #show(C+A+B) 
       

Here we observe that the blue points are much higher than the other points.  Divisibility by 3 appears to give n about twice as many representations as a sum of two primes.  Let's take that into account by dividing gr(n) by 3 when n is divisible by 3.

gr_0=[] # list of the pairs n,gr(n) for which n is 0 mod 3 gr_1=[] # list of the pairs n,gr(n) for which n is 1 mod 3 gr_2=[] # list of the pairs n,gr(n) for which n is 2 mod 3 for key in gr: if mod(key,3)==0: gr_0.append([key,gr[key]/2]) elif mod(key,3)==1: gr_1.append([key,gr[key]]) elif mod(key,3)==2: gr_2.append([key,gr[key]]) 
       
A=list_plot(gr_0, color='blue') B=list_plot(gr_1, color='green') C=list_plot(gr_2, color='red') show(A+B+C) 
       

It does appear that the correction factor of 2 is correct.  We've "explained away" all the difference between the blue and the other points.

Let's try something similar for divisibility by 5.

gr_0=[] # list of the pairs n,gr(n) for which n is 0 mod 5 gr_1=[] # list of the pairs n,gr(n) for which n is 1 mod 5 gr_2=[] # list of the pairs n,gr(n) for which n is 2 mod 5 gr_3=[] # list of the pairs n,gr(n) for which n is 3 mod 5 gr_4=[] # list of the pairs n,gr(n) for which n is 4 mod 5 for key in gr: if mod(key,3)==0: if mod(key,5)==0: gr_0.append([key,gr[key]/2]) elif mod(key,5)==1: gr_1.append([key,gr[key]/2]) elif mod(key,5)==2: gr_2.append([key,gr[key]/2]) elif mod(key,5)==2: gr_2.append([key,gr[key]/2]) elif mod(key,5)==2: gr_2.append([key,gr[key]/2]) else: if mod(key,5)==0: gr_0.append([key,gr[key]]) elif mod(key,5)==1: gr_1.append([key,gr[key]]) elif mod(key,5)==2: gr_2.append([key,gr[key]]) elif mod(key,5)==2: gr_2.append([key,gr[key]]) elif mod(key,5)==2: gr_2.append([key,gr[key]]) 
       
A=list_plot(gr_0, color='blue') B=list_plot(gr_1, color='green') C=list_plot(gr_2, color='red') D=list_plot(gr_3, color='yellow') E=list_plot(gr_4, color='cyan') show(A+B+C+D+E) 
       

It appears that the blue values are about 1.5 times the remaining values: let's try dividing the data by 1.5 when n is divisible by 5.

gr_0=[] # list of the pairs n,gr(n) for which n is 0 mod 5 gr_1=[] # list of the pairs n,gr(n) for which n is 1 mod 5 gr_2=[] # list of the pairs n,gr(n) for which n is 2 mod 5 gr_3=[] # list of the pairs n,gr(n) for which n is 3 mod 5 gr_4=[] # list of the pairs n,gr(n) for which n is 4 mod 5 for key in gr: if mod(key,3)==0: if mod(key,5)==0: gr_0.append([key,gr[key]/2/(3/2)]) elif mod(key,5)==1: gr_1.append([key,gr[key]/2]) elif mod(key,5)==2: gr_2.append([key,gr[key]/2]) elif mod(key,5)==2: gr_2.append([key,gr[key]/2]) elif mod(key,5)==2: gr_2.append([key,gr[key]/2]) else: if mod(key,5)==0: gr_0.append([key,gr[key]/(3/2)]) elif mod(key,5)==1: gr_1.append([key,gr[key]]) elif mod(key,5)==2: gr_2.append([key,gr[key]]) elif mod(key,5)==2: gr_2.append([key,gr[key]]) elif mod(key,5)==2: gr_2.append([key,gr[key]]) 
       
A=list_plot(gr_0, color='blue') B=list_plot(gr_1, color='green') C=list_plot(gr_2, color='red') D=list_plot(gr_3, color='yellow') E=list_plot(gr_4, color='cyan') show(A+B+C+D+E) 
       

Now it appears that the blue plot is too low.  1.5 was too high a correction factor.  Let's try a smaller correction factor.  Note, as a rational number, 1.5=3/2.  Let us try Vincent's suggestion of 4/3.

gr_0=[] # list of the pairs n,gr(n) for which n is 0 mod 5 gr_1=[] # list of the pairs n,gr(n) for which n is 1 mod 5 gr_2=[] # list of the pairs n,gr(n) for which n is 2 mod 5 gr_3=[] # list of the pairs n,gr(n) for which n is 3 mod 5 gr_4=[] # list of the pairs n,gr(n) for which n is 4 mod 5 for key in gr: if mod(key,3)==0: if mod(key,5)==0: gr_0.append([key,gr[key]/2/(4/3)]) elif mod(key,5)==1: gr_1.append([key,gr[key]/2]) elif mod(key,5)==2: gr_2.append([key,gr[key]/2]) elif mod(key,5)==2: gr_2.append([key,gr[key]/2]) elif mod(key,5)==2: gr_2.append([key,gr[key]/2]) else: if mod(key,5)==0: gr_0.append([key,gr[key]/(4/3)]) elif mod(key,5)==1: gr_1.append([key,gr[key]]) elif mod(key,5)==2: gr_2.append([key,gr[key]]) elif mod(key,5)==2: gr_2.append([key,gr[key]]) elif mod(key,5)==2: gr_2.append([key,gr[key]]) A=list_plot(gr_0, color='blue') B=list_plot(gr_1, color='green') C=list_plot(gr_2, color='red') D=list_plot(gr_3, color='yellow') E=list_plot(gr_4, color='cyan') show(A+B+C+D+E) 
       

Let's discuss where we are.  We've seen that for even n, it appears that we get growing numbers of representations. How fast is the growth rate?  We haven't investigated that yet!  We should put this question on the list of questions to explore!

We've seen that divisibility by 3 leads to about 2 times as many representations.

We've seen that divisibility by 5 leads to about 4/3 times as many representations.

What do we see for divisibility by 7?  

It appears that for 3 and 5, all that mattered was divisibility by 3 or by 5.  There doesn't appear to be a visible difference for other residue classes mod 3 or mod 5.  


Now we want to 

a) correct for divisibility by 3

b) correct for divisibility by 5

c) separate out two plots: those divisible by 7 and those not, and plot both on the same axes in different colors.

gr7_zero=[] # list of the pairs n,gr(n) for which n is divisible by 7 gr7_non_zero=[] # list of the pairs n,gr(n) for which n is not divisible by 7 for key in gr: if mod(key,7)==0: a=gr[key] if mod(key,3)==0: a=a/2 if mod(key,5)==0: a=a/(3/2) gr7_zero.append([key,a]) else: a=gr[key] if mod(key,3)==0: a=a/2 if mod(key,5)==0: a=a/(3/2) gr7_non_zero.append([key,a]) A=list_plot(gr7_zero, color='blue') B=list_plot(gr7_non_zero, color='green') show(A+B) show(B+A) 
       

From these pictures it is apparent that divisibility by 7 has a similar sort of impact, though smaller, to that of divisibility by 3 or by 5.  

Jack's conjecture: correct by a factor of $\frac{p-1}{p-2}=\frac{6}{5}$.

gr7_zero=[] # list of the pairs n,gr(n) for which n is divisible by 7 gr7_non_zero=[] # list of the pairs n,gr(n) for which n is not divisible by 7 for key in gr: if mod(key,7)==0: a=gr[key]/(6/5) if mod(key,3)==0: a=a/2 if mod(key,5)==0: a=a/(3/2) gr7_zero.append([key,a]) else: a=gr[key] if mod(key,3)==0: a=a/2 if mod(key,5)==0: a=a/(3/2) gr7_non_zero.append([key,a]) A=list_plot(gr7_zero, color='blue') B=list_plot(gr7_non_zero, color='green') show(A+B) show(B+A) 
       

This looks like it might be good.  If Jack's conjecture is correct, then we might expect a correction factor of $\frac{p-1}{p-2}$ for each odd prime dividing $n$.  (Notice that one interpretation of this is that the difference between divisibility by 2 and non-divisibility by 2 should be infinite, i.e. that we get no representations when $n$ is not divisibly by 2 --- which is the case!)

Where to take this next? 

1) Create a function $HL(n)$ which is the correction factor for $n$ taking into account all odd primes dividing $n$.

(named in honor of G. H. Hardy and John E. Littlewood, who discovered it).

Plot the result -- does it get rid of all the striations we saw in the first few plots?

2) Investigate whether there are implications from divisibility by powers of primes?

3) See if we can explain why the factor $HL(n)$ works?  Can we explain?  Can we prove?

4) Once we correct, how fast do things grow? 

5) We can correct by dividing by $HL(n)$ or by multiplying non divisible numbers by it.  Does this make the growth rate look easier to describe?

def HL(n): prime_factor_list = (2*n).prime_factors()[1:] # obtain the odd prime factors of n hl=1 for p in prime_factor_list: # for each odd prime factor p of n, contribute (p-1)/(p-2) hl=hl*(p-1)/(p-2) return(hl) 
       
HL(7) 
       
6/5
6/5
list_plot(gr) 
       
gr_corrected=[] for k in gr.keys(): gr_corrected.append([k,gr[k]/HL(k)]) 
       
gr_corrected[:10] 
       
[[6, 1/2],
 [8, 2],
 [10, 9/4],
 [12, 1],
 [14, 5/2],
 [16, 4],
 [18, 2],
 [20, 3],
 [22, 9/2],
 [24, 3]]
[[6, 1/2],
 [8, 2],
 [10, 9/4],
 [12, 1],
 [14, 5/2],
 [16, 4],
 [18, 2],
 [20, 3],
 [22, 9/2],
 [24, 3]]
list_plot(gr_corrected) 
       

Heuristically we explain HL(n) by observing that for a given odd prime $p$, the sums of primes in $S_i+S_j$ have about $s^2$ values when $i\neq j$, and about $s(s+1)/2$ values when $i=j$.  If $n$ is divisible by $p$, then all the pairs adding to $n$ come from distinct pairs of sets: if $n$ is not divisible by $p$ then one of the ways of representing $n$ comes from summing the same set twice. 

How fast does the corrected $gr(n)/HL(n)$ grow?

It might help to know how many primes there are up to $x$.


Define $\pi(x) =$ the number of primes up to $x$.  Then $\pi(x) \simeq \frac{x}{\log(x)}$.