# 2021.09.13 MATH 3600 Thue Morse continued

## 249 days ago by calkin

def TM(n): # Return the first 2^n entries of the Thue-Morse sequence if n.is_integer() and n >=0: S=[0] Sbar=[1] for i in srange(n): T=S+Sbar Tbar=Sbar+S S=T Sbar=Tbar return(S) else: return('Please make sure input is a non-negative integer')
print(TM(6))
 [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0] [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]

We now have a function TM() which takes an input $n$ and outputs the first $2^n$ entries of the Thue-Morse sequence.

1.  Is it binary?  (Duh!)

2.  If $n\geq 1$, does the number of zeroes equal the number of ones in the first $2^n$ entries? Yes!  Because the sequence $S_{n-1}$ has equal numbers of zeroes and ones, so does $\bar{S}_{n-1}$.  Hence $S_{n} = S_{n-1} \bar_{S}_{n-1}$ does too!   So, since $S_{1}$ has an equal number of zeroes and ones, we can prove by mathematical induction that for all $n\geq 1$, $S_n$ has equal numbers of zeroes and ones.

3.  The length doubles when $n$ increases by 1.

4.  We never see three of the same value consecutively.  (This is an example of a more general phenomenon: there is no string $a$ of zeroes and ones so that we see three copies of $a$ consecutively, i.e. without other entries between them).  We call such sequences with this property "cube-free".  Can you prove that we never see three zeroes or three ones in a row?  Can you prove that the sequence is cube-free?

5. The entries come in pairs, either 0,1 or 1,0  (Note that this, if we prove it, shows that the number of zeroes equals the number of ones so long as we take the first $2k$ values of the sequence.  We don't need to take a power of two entries.

6.  If we assume that we know that the sequence has no 000 and no 111, we could ask how often each of the strings 00, 01, 10, 11, 001, 010, 011, 100, 101, 110 appear as consecutive substrings in the sequence.  For example, in the first 32 entries, we see 01 eleven times and 00 five times.   We just did that by eyeballing the sequence: can we write a program to take a string and count how often 00 occurs?

7.  More generally, which substrings of length $k$ occur in the TM sequence?  How many substrings are there of length $k$ which occur?  What is their relative long-term frequency?  Does such a limiting frequency exist?

How many possible cube-free binary strings of length $k$ are there?  Can we write a program to generate all of them?

Can we write a program which checks if a given binary sequence is cube-free?

In an upcoming assignment, write a lab-report style project exploring these questions.

You should have code which, given an input $k$, generates all cube-free strings of length at most $k$.

You should have a function which takes as input, a list, and tests whether it is cube-free.

You should generate data, including a plot of how many cube-free strings there are of length $k$, for, say, $k$ less than 20.

There is another way of creating the Thue-Morse sequence.

At the $n^{th}$ stage, read along the current sequence, replacing each 0 with 01, and each 1 with 10.

So our sequences become

0

01

0110

01101001

Which substrings of length $k$ occur?  What is their relative frequency?

$k=1$: in the first $N$ terms (for $N$ even) 0, 1 each occur $N/2$ times.

$k=2$: 00, 01, 10, 11 all appear.  In the first 64 terms, 00 appears 10 times.  11 appears 11 times. 01 appears 21 times, and 10 appears 21 times.  It appears that 00 and 11 appear about half as often as 01 and 10.

print(TM(6))
 [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0] [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]

Task for homework tonight: write a function

number_of_copies(T,L) which counts the number of copies of the string L in the list T.  Here T will be typically the first $2^n$ terms of the Thue=Morse sequence, and L will be a word of length $k$.  It should return the number of copies of L in T.

def number_of_copies(T,L): l=len(L) t=len(T) counter=0 # find the length l of L using len(L) # determine the possible starting positions in T using len(T) # walk along T at each possible starting position, checking if the # word of length l starting there is equal to L # Keep a counter running # return the value of the counter. return(counter)

Next task: create a dictionary of which strings occur in $T$ and how often.

A dictionary is a data structure which is like a list, but is indexed, rather than from 0 to len(L)-1, by "keys".  Example:  Suppose I want to keep track of my family's SSNs.  I could create a dictionary with keys "Neil", "Sue", "Chloe" and "James", and associated to each key I would have a value equal to the individual's SSN.

SSN_dict={'Neil':'0000000000','Sue':'0000000001'}
SSN_dict['Neil']
 '0000000000' '0000000000'
SSN_dict['Sue']
 '0000000001' '0000000001'
type(SSN_dict)
  
SSN_dict.has_key('Neil')
 True True
SSN_dict.has_key('Chloe')
 False False
SSN_dict.update(
 Traceback (click to the left of this block for traceback) ... SyntaxError: invalid syntax Traceback (most recent call last): File "", line 1, in File "_sage_input_31.py", line 10, in exec compile(u'open("___code___.py","w").write("# -*- coding: utf-8 -*-\\n" + _support_.preparse_worksheet_cell(base64.b64decode("U1NOX2RpY3QudXBkYXRlKCdDaGxvZSc6JzAwMDAwMDAwMDInKQ=="),globals())+"\\n"); execfile(os.path.abspath("___code___.py")) File "", line 1, in File "/tmp/tmp3BQNZi/___code___.py", line 2 SSN_dict.update('Chloe':'0000000002') ^ SyntaxError: invalid syntax
T=TM(4)
print(T)
 [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0] [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
L=[0,0,1] i=6 S=T[i:i+3] print(S==L)
 False False
L==S
 True True
counter=0 for i in srange(14): if L==T[i:i+3]: counter += 1 print(counter)
 2 2
def number_of_copies(T,L): l=len(L) t=len(T) counter=0 # find the length l of L using len(L) # determine the possible starting positions in T using len(T) # walk along T at each possible starting position, checking if the # word of length l starting there is equal to L # Keep a counter running # return the value of the counter. for i in srange(t-l+1): # i is position of first element we're comparing if L==T[i:i+l]: # compare L to a sublist of T counter+=1 # if the are equal, increment the counter return(counter)
number_of_copies(T,L)
 2 2
number_of_copies(TM(10),[0,0,0])
 0 0
number_of_copies(TM(10),[0,0,1])
 170 170
number_of_copies(TM(10),[0,1,0])
 170 170
number_of_copies(TM(10),[0,1,1])
 171 171
number_of_copies(TM(10),[1,0,0])
 170 170
number_of_copies(TM(10),[1,0,1])
 170 170
number_of_copies(TM(10),[1,1,0])
 171 171
number_of_copies(TM(10),[1,1,1])
 0 0
number_of_copies(TM(10),[0,0])
 170 170
number_of_copies(TM(10),[0,1])
 341 341
number_of_copies(TM(10),[1,0])
 341 341
number_of_copies(TM(10),[1,1])
 171 171

How can we use dictionaries to keep track of all substrings of length $k$?

TM_substrings_dict= {}
T=TM(10) k=3
 1024 1024
T[0:3]
 [0, 1, 1] [0, 1, 1]
TM_substrings_dict['[0,1,1]']=1
T[1:1+3]
 [1, 1, 0] [1, 1, 0]
TM_substrings_dict['[1,1,0]']=1
T[2:2+3]
 [1, 0, 1] [1, 0, 1]
TM_substrings_dict['[1,0,1]']=1
T[3:3+3]
 [0, 1, 0] [0, 1, 0]
TM_substrings_dict['[0,1,0]']=1
print(TM_substrings_dict)
 {'[0,1,1]': 1, '[0,1,0]': 1, '[1,0,1]': 1, '[1,1,0]': 1} {'[0,1,1]': 1, '[0,1,0]': 1, '[1,0,1]': 1, '[1,1,0]': 1}
T[4:4+3]
 [1, 0, 0] [1, 0, 0]
TM_substrings_dict['[1,0,0]']=1
T[5:5+3]
 [0, 0, 1] [0, 0, 1]
TM_substrings_dict['[0,0,1]']=1
T[6:6+3]
 [0, 1, 1] [0, 1, 1]
TM_substrings_dict['[0,1,1]']
 1 1
TM_substrings_dict['[0,1,1]']+=1
TM_substrings_dict.has_key('[1,1,1]')
 False False
L=T[7:7+3] print(L)
 [1, 1, 0] [1, 1, 0]
TM_substrings_dict.has_key(str(L))
 False False
type(L)
  
str(L)
 '[1, 1, 0]' '[1, 1, 0]'

Doing it by hand, I omitted the spaces in the lists!!!

This is incompatible with doing it automatically!!!

What we want to do is to do things automatically from the start.

T=TM(10) k=7 TM_substrings_dict={} for i in srange(len(T)-(k-1)): L=T[i:i+k] if TM_substrings_dict.has_key(str(L)): TM_substrings_dict[str(L)]+=1 else: TM_substrings_dict[str(L)]=1
TM_substrings_dict
 {'[0, 0, 1, 0, 1, 1, 0]': 85, '[0, 0, 1, 1, 0, 0, 1]': 43, '[0, 0, 1, 1, 0, 1, 0]': 42, '[0, 1, 0, 0, 1, 0, 1]': 42, '[0, 1, 0, 0, 1, 1, 0]': 43, '[0, 1, 0, 1, 1, 0, 0]': 42, '[0, 1, 0, 1, 1, 0, 1]': 42, '[0, 1, 1, 0, 0, 1, 0]': 43, '[0, 1, 1, 0, 0, 1, 1]': 42, '[0, 1, 1, 0, 1, 0, 0]': 85, '[1, 0, 0, 1, 0, 1, 1]': 85, '[1, 0, 0, 1, 1, 0, 0]': 43, '[1, 0, 0, 1, 1, 0, 1]': 42, '[1, 0, 1, 0, 0, 1, 0]': 42, '[1, 0, 1, 0, 0, 1, 1]': 43, '[1, 0, 1, 1, 0, 0, 1]': 42, '[1, 0, 1, 1, 0, 1, 0]': 42, '[1, 1, 0, 0, 1, 0, 1]': 43, '[1, 1, 0, 0, 1, 1, 0]': 42, '[1, 1, 0, 1, 0, 0, 1]': 85} {'[0, 0, 1, 0, 1, 1, 0]': 85, '[0, 0, 1, 1, 0, 0, 1]': 43, '[0, 0, 1, 1, 0, 1, 0]': 42, '[0, 1, 0, 0, 1, 0, 1]': 42, '[0, 1, 0, 0, 1, 1, 0]': 43, '[0, 1, 0, 1, 1, 0, 0]': 42, '[0, 1, 0, 1, 1, 0, 1]': 42, '[0, 1, 1, 0, 0, 1, 0]': 43, '[0, 1, 1, 0, 0, 1, 1]': 42, '[0, 1, 1, 0, 1, 0, 0]': 85, '[1, 0, 0, 1, 0, 1, 1]': 85, '[1, 0, 0, 1, 1, 0, 0]': 43, '[1, 0, 0, 1, 1, 0, 1]': 42, '[1, 0, 1, 0, 0, 1, 0]': 42, '[1, 0, 1, 0, 0, 1, 1]': 43, '[1, 0, 1, 1, 0, 0, 1]': 42, '[1, 0, 1, 1, 0, 1, 0]': 42, '[1, 1, 0, 0, 1, 0, 1]': 43, '[1, 1, 0, 0, 1, 1, 0]': 42, '[1, 1, 0, 1, 0, 0, 1]': 85}
T=TM(20) number_of_k_sublists=[] for k in srange(1,10): TM_substrings_dict={} for i in srange(len(T)-(k-1)): L=T[i:i+k] if TM_substrings_dict.has_key(str(L)): TM_substrings_dict[str(L)]+=1 else: TM_substrings_dict[str(L)]=1 number_of_k_sublists.append(k,len(TM_substrings_dict))
print(number_of_k_sublists)
 [] []
T=TM(15) number_of_k_sublists=[] for k in srange(1,100): TM_substrings_dict={} for i in srange(len(T)-(k-1)): L=T[i:i+k] if TM_substrings_dict.has_key(str(L)): TM_substrings_dict[str(L)]+=1 else: TM_substrings_dict[str(L)]=1 number_of_k_sublists.append([k,len(TM_substrings_dict)])
print(number_of_k_sublists)
 [[1, 2], [2, 4], [3, 6], [4, 10], [5, 12], [6, 16], [7, 20], [8, 22], [9, 24], [10, 28], [11, 32], [12, 36], [13, 40], [14, 42], [15, 44], [16, 46], [17, 48], [18, 52], [19, 56], [20, 60], [21, 64], [22, 68], [23, 72], [24, 76], [25, 80], [26, 82], [27, 84], [28, 86], [29, 88], [30, 90], [31, 92], [32, 94], [33, 96], [34, 100], [35, 104], [36, 108], [37, 112], [38, 116], [39, 120], [40, 124], [41, 128], [42, 132], [43, 136], [44, 140], [45, 144], [46, 148], [47, 152], [48, 156], [49, 160]] [[1, 2], [2, 4], [3, 6], [4, 10], [5, 12], [6, 16], [7, 20], [8, 22], [9, 24], [10, 28], [11, 32], [12, 36], [13, 40], [14, 42], [15, 44], [16, 46], [17, 48], [18, 52], [19, 56], [20, 60], [21, 64], [22, 68], [23, 72], [24, 76], [25, 80], [26, 82], [27, 84], [28, 86], [29, 88], [30, 90], [31, 92], [32, 94], [33, 96], [34, 100], [35, 104], [36, 108], [37, 112], [38, 116], [39, 120], [40, 124], [41, 128], [42, 132], [43, 136], [44, 140], [45, 144], [46, 148], [47, 152], [48, 156], [49, 160]]
list_plot(number_of_k_sublists)

Project: write a sage lab report on investigations into the number of substrings of the Thue-Morse sequence.  You should include

working code --- well commented so it's clear what it does

connective text, including mathematics formated with dollar signs or display math symbols.

The report should have an introduction that describes the problem (including definitions),

a description of what has been done, and a conclusion section that describes your findings.

Export your worksheet to pdf, and upload it to canvas.

Due Monday at midday.