In a 1742 letter to Euler, Goldbach conjectured that every even integer greater than or equal to 4 was the sum of two primes.
This has become known as the Goldbach Conjecture.
In this worksheet we'll explore this conjecture.
Task 1: Fix N=1000 and create a list of all the primes less than N.
What is the largest prime on the list? How many primes are there in the list.

1229 1229 
9973 9973 
Suppose we want to check Goldbach's conjecture: how might we proceed?
Set up a list filled with 0's. Step through the prime_list and for each prime,
step through the prime_list again, and add the result to get m. Incremenment the value of
Goldbach_reps[m]




Observation: it appears that the number of representations of even n as a sum of two primes is growing.
It appears that there are several (at least two) striations. Can we identify either of them?
[[2, 0], [4, 1], [6, 1], [8, 2], [10, 3]] [[2, 0], [4, 1], [6, 1], [8, 2], [10, 3]] 
[[202, 17], [204, 28], [206, 13], [208, 14], [210, 38], [212, 12], [214, 15], [216, 26], [218, 13], [220, 18]] [[202, 17], [204, 28], [206, 13], [208, 14], [210, 38], [212, 12], [214, 15], [216, 26], [218, 13], [220, 18]] 
Call a value of n a "champion" if the number of representations is bigger than any value we've seen so far.
Compute a list of champions.
What are the factorizations of champions?


[[4, 1], [8, 2], [10, 3], [16, 4], [22, 5], [24, 6], [34, 7], [36, 8], [48, 10], [60, 12], [78, 14], [84, 16], [90, 18], [114, 20], [120, 24], [168, 26], [180, 28], [210, 38], [300, 42], [330, 48], [390, 54], [420, 60], [510, 64], [630, 82], [780, 88], [840, 102], [990, 104], [1050, 114], [1140, 116], [1260, 136], [1470, 146], [1650, 152], [1680, 166], [1890, 182], [2100, 194], [2310, 228], [2730, 256], [3150, 276], [3570, 308], [3990, 326], [4200, 330], [4410, 342], [4620, 380], [5250, 396], [5460, 436], [6090, 444], [6510, 482], [6930, 536], [7980, 548], [8190, 584]] [[4, 1], [8, 2], [10, 3], [16, 4], [22, 5], [24, 6], [34, 7], [36, 8], [48, 10], [60, 12], [78, 14], [84, 16], [90, 18], [114, 20], [120, 24], [168, 26], [180, 28], [210, 38], [300, 42], [330, 48], [390, 54], [420, 60], [510, 64], [630, 82], [780, 88], [840, 102], [990, 104], [1050, 114], [1140, 116], [1260, 136], [1470, 146], [1650, 152], [1680, 166], [1890, 182], [2100, 194], [2310, 228], [2730, 256], [3150, 276], [3570, 308], [3990, 326], [4200, 330], [4410, 342], [4620, 380], [5250, 396], [5460, 436], [6090, 444], [6510, 482], [6930, 536], [7980, 548], [8190, 584]] 
Today: we'll explore the importance of divisibility by 3 in Goldbach's conjecture.
First, let's take the data we've gathered, and create three plots:
Plot A: when n is divisible by 3, i.e. 0 (mod 3)
Plot B: when n is 1 (mod 3)
Plot C: when n is 2 (mod 3)
Suggested method: step through gr_evens and create three new lists:


Lucy's conjecture: factors matter. Let's test this: create lists according to whether n is divisible by exactly 3, 5, both or neither.
Call them gr_00, gr_10, gr_01, gr_11 where 0 means divisible, 1 means not, and the first position means 3, the second means 5



It appears that Lucy's conjecture that divisibility by 3 and 5 matters is confirmed.
Let's a) see if we can quantify the difference that it makes and
b) come up with (heuristic) reasons for the behavior.
Exercise: plot the number of representations of n, dividing by 2 when n is divisible by 3, and unchanged when n is not divisible by 3, in different colors, twice, once with each color on top.
Let's consider the divisibility by 3.
How many primes are there less than N? About $N/\log N$.
Now consider the primes (mod 3).
There's exactly one prime congruent to 0 (mod 3), namely 3.
Dirichlet proved that the remaining primes are distributed about equally into the other two residue classes..
Let's call the set of primes congruent to 1 (mod 6) $S_1$, and the set congruent to 5 (mod 6) $S_5$.
Notice that (with the exception of 2 and 3), all primes are either 1 or 5 (mod 6).
(Note that if we were considering divisibility by 5 instead we might have sets $S_1, S_3, S_7$ and $S_9$).
What can we say about even numbers that are sums of two primes?
If $n\equiv 0 (mod 3)$ then since $n$ is even, it is actually $0 (mod 6)$.
If $n\equiv 1 (mod 3)$ then $n\equiv 4 (mod 6)$.
If $n\equiv 2 (mod 3)$ then $n\equiv 2 (mod 6)$.
Now, considere how we can obtain $n$ as a sum of primes.
If $n \equiv 0 (mod \ 6)$, then if $n=p+q$, then one of $p,q$ is in $S_1$, and the other is in $S_5$.
If $n\equiv 2 (mod \ 6)$, then if $n=p+q$, both of $p,q$ are in $S_1$.
If $n \equiv 4 (mod \ 6)$ then if $n=p+p$, both of $p, q$ are in $S_5$.
What do we mean by a heuristic? It's a reason to believe something is true, a plausible explanation for a mechanism for why something happens, as opposed to a formal proof that something is true.
Suppose that $S_1\simeq S_5=s$.
How many ordered pairs $(p,q)$ are there where $p\in S_1$ and $q\in S_5$? About $s^2$.
How many ordered pairs $(p,q)$ are there where $p, q \in S_1$? About $s^2$.
How many ordered pairs $(p,q)$ are there where $p, q \in S_5$? About $s^2$.
Once we consider the ordering of the primes, we see that we get about $2s^2$ sums from $S_1 + S_5$,
but only $s^2$ from $S_1 +S_1$ and $s^2$ from $S_5 + S_5$.
Let's do an example: $S_1 = \{7, 13,19 \}$ and $S_5 = \{5, 11,17 \}$
$S_1+S_5 = \{7+5, 7+11, 7+17, 13 + 5, 13+11, 13+17, 19+5, 19+11, 19+17 \}$
and in our algorithm, each of these would get counted twice, once as 13+17, for example, and once as 17+13.
$S_1+S_1 = \{ 7+7, 7+13, 7+19, 13+7,13+13, 13+19, 19+7, 19+13, 19+19 \}$ but in this case, all of the sums have appeared in both orders!
So all the representations of even numbers congruent to $0 (mod\ 6)$ add up to $2s^2$, whereas all the representations of even numbers congruent $2 (mod \ 6)$ add up to $s^2$, and the same is true for $4 (mod \ 6)$.
Can we extend this idea to divisibility by 5?
Modulo 10, we have even residue classes 0, 2, 4, 6, 8.
Our primes (except for 2 and 5) all fall in residue classes 1, 3, 7, 9 (mod 10). Call these sets of primes
$S_1, S_3, S_7, S_9$.
How can we get 0 (mod 10) as a sum of two primes? $S_1+S_9, S_3+S_7$.
How can we get 2 (mod 10) as a sum of two primes? $S_1+S_1, S_2+S_9$.
How can we get 4 (mod 10) as a sum of two primes? $S_1+S_3, S_7+S_7$.
How can we get 6 (mod 10) as a sum of two primes? $S_3+S_3, S_7+S_9$.
How can we get 8 (mod 10) as a sum of two primes? $S_1+S_7, S_9+S_9$.
Just as with mod 3, when we add two distinct sets of size $s$, we get $2s^2$ representations, but when we add a set of size $s$ to itself, we get $s^2$ representations.
So, for 0 (mod 10), we get $4s^2$ representations, and for all the others, we get $3s^2$ representations: in other words, we get 4/3 times as many representations for even numbers divisible by 5, on average, than we do for numbers not divisible by 5.
Can we unify things, and calculate a correction factor for all odd primes? Note that $2=(31)/(32)$ and $4/3=(51)/(52)$. Can we show that the correction factor for a prime $p$ is $(p1)/(p2)$.
If we work modulo $p$, we have $p1$ residue classes (mod $2p$) which contain infinitely many primes. By Dirichlet's theorem they will all be about the same size. Note that there is no class $S_p$.
We get 0 (mod $2p$) by adding $S_i +S_{2pi}$: these are distinct classes. There are $(p1)/2$ choices for $i$.
We get every other residue class in two different ways: one as the sum of a class and itself, the other ways as sums of two distinct classes. There are $(p1)/21=(p3)/2$ ways to do this as a sum of two distinct classes.
So, when divisible by $p$, we get $(p1)2s^2/2=(p1)s^2$ representations.
When not divisble by $p$, we get $(p3)2s^2/2+s^2=(p2)s^2$.
Thus, on average, we get $(p1)/(p2)$ times as many representations for numbers divisible by $p$ as we do for numbers not divisible by $p$.
We will build a function $HL(n)$ which contains the correction factor for each prime dividing $n$. We'll call this the HardyLittlewood correction factor. It will be defined by
\[ HL(n) = \prod_{pn} \frac{p1}{p2} \]
where the product is over all odd primes $p$ dividing $n$.
Now, if we take the number of representations of $n$ and divide by this correction factor, we should hope to counter the effects of divisibility by odd primes on the number of representations.
Now the question is: given an integer $n$ how do we get the odd primes dividing it?



What to do now?


It appears from the plot that the correction factor brings all the striations together, explaining most of the variation in gr_evens.
Now let's ask the question "how fast should we expect the number of representations to grow?"
We'll need to think about how many primes there are up to $N$. The prime number theorem says
\[ \pi(x) \sim Li(x) \]
In this formula, $\pi(x)$ is the number of primes less than or equal to $x$,
\[ Li(x)=\int_2^x \frac{1}{\log(t)} dt \]
and $f(x)\sim g(x)$ means that
\[ \lim_{x \rightarrow \infty} \frac{f(x)}{g(x)} = 1 \]
Since $Li(x) \sim \frac{x}{\log(x)}$, we can use $x/\log(x)$ as an approximate growth rate for $\pi(x)$.
Primes are most definitely *not* random. But, in many ways they behave a little bit like they are.
We could try to develop a random model, creating odd numbers at random with the same sort of density as the primes, and examine the structure of the number of representations. We could then compare the qualitative features of the results with the Goldbach data, both pre and postcorrection factor.
How to design this experiment?
For each odd integer $n$ from 3 up to $N$, toss a biased coin to determine whether it's a pseudoprime.
The probability that $n$ is a pseudoprime will be $2/\log(n)$.
Then run the Goldbach code using pseudoprimes instead of primes and see what happens.

This morning, we'll investigate the claim that $\Li(x) \sim x/\log(x)$ and that $\pi(x)\sim \Li(x)$.
7 7 



We see that all three functions, $\pi(x)$, $Li(x)$ and $x/\log(x)$ exhibit the same qualitative behaviour: starting out above the line $.125 x$ and trending lower than it. However, $x/\log(x)$ crosses the straight line much earlier than pi(x) does, and that crosses it earlier than $\Li(x)$ does.

At this stage, we have demonstrated experimentally (i.e. by plotting the functions!) that $\pi(x)$ and $Li(x)$ behave very similarly, and are *qualitatively* similar to $x/\log(x)$, though the latter grows more slowly. Thus we often say that $\pi(x)$ grows like $x/\log(x)$, even though $Li(x)$ is a much better approximation.


Plotting $\frac{x}{\log(x)\pi(x)}$ we see that it appears to approach 1. This is saying *precisely* that $\pi(x)~x/\log(x)$ are asymptotic. Let's compare $Li(x)$ to $x/log(x)$.

Again, visually we see that it appears that, while $Li(x)$ is usually greater than $x/\log(x)$, it appears that the ratio tends to 1, that is $Li(x)\sim x/\log(x)$.


Let's describe what we see, both qualitatively and quantitatively. At $10^9$ the difference is about 2500, whereas $\pi(x)$ is around $5\times 10^7$.




The Riemann Hypothesis: conjectured over 150 years ago by Bernhardt Riemann, it can take many forms: one equivalent formulation is (roughly) the following: for every $\epsilon>0$,
\[ Li(x)  \pi(x) = O(x^{1/2 + \epsilon}) \]
meaning essentially that the difference between $Li(x)$ and $\pi(x)$ can't be too much bigger than $x^{1/2}$.
In our experiments with $Li(x)$ and $\pi(x)$ it appears that $Li(x)$ is bigger than $\pi(x)$. This shows how deceptive number theory can be  Littlewood showed a century ago that the difference $\Li(x)\pi(x)$ changes sign infinitely often! But the first change of sign occurs so far out that we can't compute when it first happens!
Regarding the Riemann Hypothesis, it has been tested extensively, and all the calculations suggest that it is true.
But! Andrew Odlyzko, who has done more calculations than any other person, thinks we haven't computed anywhere near far enough to have convincing evidence!
RH is one of 7 problems listed in 2000 as "Millennium Challenge" problems, each with a \$1,000,000 prize attached. So far, one, the Poincare Conjecture, has been resolved.
What does the Prime Number Theorem say in the context of the Goldbach Conjecture and random models of primes? Can we build a random model and ask "what is the number of representations of $n$ as a sum of two pseudoprimes?"
We know the small primes (bigger than 3). They are all either 1 or 5 (mod 6). One way to model primes is to let each number congruent to $\pm 1$ (mod 6) be a pseudoprime with an appropriate probability. It turns out that there are problems with this. So, to determine the how many representations we expect to see, perhaps we could do the following:
Take the primes less than $n/2$ as given, and then for each $p<n/2$ let $np$ be a pseudoprime with probability about $1/\log(n)$. This isn't quite right. There are about $n/2\log(n)$ primes in $[n/2,n]$, and they are all of the form $\pm 1$ (mod 6): there are about $n/6$ in $[n/2, n]$ which are $\pm 1$ (mod 6). So, if we pick one at random, the probability that it is a prime is about $1/3\log(n)$.
So our model will be the following: for each prime $p$ with $3<p<n/2$, choose $np$ to be a pseudoprime with probability $3/\log(n)$. How many prime/pseudoprime pairs are there summing to $n$?
Building a random model for primes to test Goldbach.
Let's see how to make randomness helpful.
In the interval $[1,n/2]$ we'll take the primes congruent to $\pm 1$ (mod 6).
In the interval $(n/2, n]$ we'll take the integers which are $\pm 1$ (mod 6) and decide randomly whether they are a pseudoprime.
There are about $\Li(n)\Li(n/2) \sim n/\log(n)  n/2\log(n/2) \sim n/2\log(n)$ primes in the interval $(n/2, n]$.
These are all contained in the $n/6$ numbers which are $\pm 1$ (mod 6). So the probability that a random integer congruent to $\pm 1$ (mod 6) is prime is about $(n/2\log(n))/(n/6) \sim 3 /\log(n)$.
Now for each even number $n$, we'll count how many primepseudoprime pairs we are likely to see.
If $n \equiv 0$ (mod 6), then for each prime $3<p < n/2$, $np$ is $\pm 1$ (mod 6), and so with probability $3/\log(n)$, the pair $p, np$ is a primepseudoprime pair summing to $n$.
If $n \equiv 2$ (mod 6), then for primes $p$ congruent to 1 (mod 6), $np$ is also 1 (mod 6) and can be a pseudoprime: but if $p$ is congruent to 5 (mod 6), np is $25 \equiv 3$ (mod 6) and hence *can't* be a pseudoprime.
If $n \equiv $ 4 (mod 6), then when $p \equiv 5$ (mod 6) then $np \equiv 1 \equiv 5 $ (mod 6), and so *can* be a pseudoprime, but if $p \equiv 1 $ (mod 6), then $np\equiv $ 3 (mod 6), and hence cannot be a pseudoprime.
So, let's estimate how many primepseudoprime pairs we expect to see for each even $n$.
If $n \equiv 0$ (mod 6): there are about $n/2\log(n)$ primes $p$ with $3<p<n/2$, and for each of them, $np$ is a pseudoprime with probability $3/\log(n)$. So, the expected number of primepseudoprime pairs adding to $n$ is about
\[
\frac{3n}{2(\log(n))^2}.
\]
So, the expected number of representations for $n$ divisible by 6 is about
\[ \frac{3n}{2 (\log(n))^2}. \]
If $n$ is 2 or 4 (mod 6), then we expect to see half that many representations, about
\[ \frac{3n}{4 (\log(n))^2} \]

We see that the random model for pseudoprimes looks like a reasonably good fit for the corrected number of representations. What happens if we use $Li(x)$ instead of $n/\log(n)$ instead?
Let's see if we can be a little bit more careful. The number of primes up to $n/2$ is about $Li(n/2)$. The number of primes between $n/2$ and $n$ is about $Li(n)Li(n/2)$. So the probability that $np$ is prime is equal to
\[ \frac{\mbox{number of primes between $n/2$ and $n$}}{\mbox{number of numbers between $n/2$ and $n$ congruent to $\pm 1$ mod 6}} \]
\[ = \frac{Li(n)Li(n/2)}{n/6} =\frac{6(Li(n)Li(n/2))}{n} \]
So the number of representations we expect to see is
\[ \frac{6(Li(n)Li(n/2))Li(n/2)}{n} \]
Now, we need to estimate the probability that a number in the range $[3,n]$ is prime as $Li(n)/n$ instead of $1/\log(n)$. So, everywhere we see $1/\log(n)$ we need to replace it by $Li(n)/n$.
So,
\[ \frac{3n}{4 (\log(n))^2} \]
will need to be replaced by
\[ \frac{3n Li(n)^2}{4n^2 } \sim \frac{3 Li(n)^2}{4n}\]

So, clearly Professor Calkin has messed something up! We will need to figure out what! Fortunately some reflection showed that the formula plotted had a $1/n$ in it, instead of a $1/x$, so it was computing the wrong thing!
In the corrected Goldbach representation graph, how big is the spread of the points? How big a spread would we expect to see if we have a random model? Let $X(n)$ be a random variable, giving the number of primepseudoprime pairs $p, np$ summing to $n$. This is a random variable given as a sum of indicator variables
\[ X(n) = \sum_{p < n/2} X_p \]
where $X_p$ is 1 if $np$ is a pseudoprime, and 0 otherwise. We can work out the variance of the variable $X$ as the sum of the variances of the (independent) variables $X_p$. Hence we can figure out the standard deviation we expect to see in $X$, that is the size of the spread of the plot of corrected Goldbach representations.
This has all been prologue. There is another famous conjecture, the "Twin Prime Conjecture", which states that there are infinitely many $m$ so that $m1$ and $m+1$ are both prime.
Why do we expect this conjecture to be true? Again, $m$ must be a multiple of 6, so we are saying infinitely many integers $n$ so that $6n\pm 1$ are both prime. The probability that a number differing from 6n by 1 is about
\[ \frac{\mbox{number of primes up to 6$n$}}{\mbox{number of numbers congruent to $\pm 1$ mod 6 up to 6$n$}} \]
\[ \simeq \frac{6n/\log(6n)}{2n} \simeq \frac{3}{\log(n)} \]
So the probability that both $6n1$ and $6n+1$ are prime should be about
\[ \frac{9}{\log(n)^2} \]
So we expect that the number of prime pairs that we see up to $6N+2$ will be around
\[ \sum_{n} \mbox{Pr($6n\pm 1$ are both prime)} \]
Since
\[ \sum_{n\geq 2} \frac{1}{\log(n)^2} \]
diverges, we expect to see infinitely many twin primes.
Let us combine the Goldbach and Twin Prime conjectures. We'll call $k$ a twinprime core if $6k \pm 1$ are both prime. Which numbers are the sum of twinprime cores? (This is equivalent to asking which multiples of 6 are the sum of two twin prime pairs $6k\pm 1$ and $6l \mp 1$).
Let's ask a question. As $n$ goes to infinity, how does the number of representations of $n$ as a sum of twin prime cores behave? Is it positive for sufficiently large $n$? Does it go to infinity? If so, how quickly?
Now let's discuss the remaining questions for the Goldbach worksheet that we haven't fully resolved yet.
1) What can we say about champions? How many are there up to N? Can we predict something using the random models of pseudoprimes?
2) What is wrong with the $Li(x)$ model for pseudoprimes? Why is the curve above the corrected data, not down the middle of it?
3) How big is the spread of the corrected data? How can we compute this? It's easy to get the upper envelope using the idea of champions. How can we get the lower envelope?
4) How many champions are there for corrected Goldbach?
5) (Naming credit: Sarah) Let's call $n$ a runt if corrected_gr($n$) is less than or equal to corrected_gr($m$) for all $m>n$ and corrected_gr($n$) is bigger than corrected_gr($k$) for all runts $k<n$. How can we compute runts?
6) Generalize the questions about corrected champions to questions about corrected runts.

What is the twin prime Goldbach conjecture?
Let $S$ be the set of twin prime cores, that is, those integers $k$ for which both $6k1$ and $6k+1$ are prime. Then for all integers $n>1$, with the exception of 16, 67, 86, 131, 151, 186, 191, 211, 226, 541, 701 we can find integers $s_1, s_2 \in S$ so that $s_1+s_2=n$.
Further, we can make more conjectures, about the fact that the number of representations tends to infinity, how fast it grows, and which numbers $n$ have more (respectively, fewer) representations than other numbers of about the same size.
Now create a list tpg_reps, in which the $n$th entry is the number of representations of $n$. Note that in the Goldbach investigation we were only interested in even numbers so we wanted pairs $[2n,r(2n)]$. Now, since we are interested in all integers, we can just use a list of the values.


[4, 1] [4, 1] 
[[2, 0], [4, 1], [6, 1/2], [8, 2], [10, 9/4], [12, 1], [14, 5/2], [16, 4], [18, 2], [20, 3]] [[2, 0], [4, 1], [6, 1/2], [8, 2], [10, 9/4], [12, 1], [14, 5/2], [16, 4], [18, 2], [20, 3]] 


