# the De Bruijn Sequence Generator

*Brian Li*

*2020-06-22*

*MAT388 Card Game Set Mathematics Python*

Today we will be revisiting de Bruijn sequence and providing code for generating them. Furthermore, we will discuss the logic behind our generating method and provide examples. This post is a continuation of "Construction of De Bruijn Sequence and Graph Theory".

Recall that from "Construction of De Bruijn Sequence and Graph Theory", we have created code that can produce a de Bruijn sequence of **generate all de Bruijn sequences for any .** So far, most programs online only generate one de Bruijn sequence.

The following steps illustrates how our de Bruijn sequence generator works:

- Generate all
**vertices**of the de Bruijn grpah based on input values ofand . - Find all possible
**edges**between these vertices. - Find one random
**Eulerian circuit**. - Find the corresponding
**de Bruijn sequence**based on step 3.

Now we will break down the code based on the four steps above:

- The function
*generate_vertex*takes in valueand and outputs all vertices' value. This is done by outputting all possible ways to write a length (n-1) sequence with alphabet .

```
def generate_vertex(k, n):
length = n-1
lst = list(itertools.product(k, repeat=length))
return lst
```

2

3

4

- The function
*generate_edges*takes in valueand as well, but output not only the vertices but also the edges that connect them.

```
def generate_edges(k, n):
v_lst = generate_vertex(k,n)
edge_lst = []
for v1, v2 in itertools.combinations(v_lst, 2):
node1_copy = v1[1:]
node2_copy = v2[:n-2]
label1 = v2[n-2]
node1_copy2 = v1[:n-2]
node2_copy2 = v2[1:]
label2 = v1[n-2]
if node1_copy == node2_copy:
edge_lst.append(turn_tuple_to_str(v1)+ "----" + label1 + "--->" + turn_tuple_to_str(v2))
if node1_copy2 == node2_copy2:
edge_lst.append(turn_tuple_to_str(v2)+ "----" + label2 + "--->" + turn_tuple_to_str(v1))
for element in k:
edge_lst.append((element*(n-1)) + "----" + element + "--->"+ (element*(n-1)))
return edge_lst
```

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

- The function
*eulerian_path*takes in, and it outputs the Eulerian path using the networkX plug in. ['ba', 'aa', 'aa', 'ab', 'bb', 'bb', 'ba', 'ab'] represents the vertices based on the traversing order of the Eulerian path. One interesting thing is that, networkX generates different de Bruijn seqeunces based on different input order of vertices. This is an important property that we will utilize heavily later on.

```
def eulerian_path(k,n):
edge_lst = generate_edges(k,n)
new_lst = []
for item in edge_lst:
start_vertex = item[:n-1]
end_vertex = item[-n+1:]
new_lst.append((start_vertex, end_vertex))
random.shuffle(new_lst)
G = nx.DiGraph()
G.add_edges_from(new_lst)
return [u for u, v in nx.eulerian_circuit(G)]
```

2

3

4

5

6

7

8

9

10

11

- The function
*generate_debruijn*takes in, and output one de Bruijn seqence based on an Eulerian path generated from *eulerian_path*. What we are doing is that we are simply tracing the edges based on our networkX graph, and that gives us a de Bruijn sequence.

```
def generate_debruijn(k, n: int) -> str:
vertex_lst = eulerian_path(k,n)
str = ''
for vertex in vertex_lst:
str += vertex[-1]
return str
```

2

3

4

5

6

Now we can combine the four functions above and output a *summary* of

```
def summary(k, n):
v_lst = generate_vertex(k,n)
e_lst = generate_edges(k,n)
print ("You have entered an alphabet of " + str(k) + ", and you are computing for window length " + str(n))
print ("The number of vertices is " + str(len(v_lst)) + ", they are " + str(v_lst))
print ("The number of edges is " + str(len(e_lst)) + ", they are " + str(e_lst))
print ("Its eulerian circuit is " + str(eulerian_path(k,n)))
print ("And the de Bruijn sequence is " + str(generate_debruijn(k,n)))
```

2

3

4

5

6

7

8

We can test out the function:

```
summary(['a','b'],3)
>>> You have entered an alphabet of ['a', 'b'], and you are computing for window length 3
>>> The number of vertices is 4, they are [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]
>>> The number of edges is 8, they are ['aa----b--->ab', 'ba----a--->aa', 'ab----a--->ba', 'ba----b--->ab', 'ab----b--->bb', 'bb----a--->ba', 'aa----a--->aa', 'bb----b--->bb']
>>> Its eulerian circuit is ['ab', 'bb', 'bb', 'ba', 'aa', 'aa', 'ab', 'ba']
And the de Bruijn sequence is aaabbbab
```

2

3

4

5

6

Now the question is, how do we generate all de Bruijn sequences? As you might realize by now, there should be a lot more than one de Bruijn sequence for large values of

number of | |
---|---|

Recall that networkX generates different de Bruijn seqences based on the order of each vertex input. We can utilize this properties to generate all possible de Bruijn sequences by inputting all possible order of vertices. This is what the function *get_as_many_bruijn* is doing. We randomly shuffled the order in the function *eulerian_path* so that eventually we can cover all de Bruijn sequence. However, this could be further optimized runtime wise by replacing the "random shuffle" with an organized list with all possible permutation. I am unsure how much significance that may have in terms of reducing the runtime, but it is worth trying out in the near future.

```
def get_as_many_bruijn(k,n):
total_num = (math.factorial(len(k))**(len(k)**(n-1))/(len(k)**n))
print ("in total " + str(total_num) + " de Bruijn sequences")
output = []
while len(output) < total_num:
sequence = generate_debruijn(k,n)
if sequence not in output:
output.append(sequence)
return output
```

2

3

4

5

6

7

8

9

Here are two outputs of the above function as examples:

```
get_as_many_bruijn(['a','b'],3)
>>> in total 2.0 de Bruijn sequences
>>> ['aaabbbab', 'baaababb']
```

2

3

```
get_as_many_bruijn(['a','b','c'],4)
>>> in total 1.2635683568857645e+19 de Bruijn sequences
>>> ['bbcbaccaabacaaaabcaaccccababaaacabcbcbbacbabccacaccbaacbbcccbbbbaabbcacbccbcabbab',
'aaabcacaccccacbabcccbacbbababbacabcbaabaccbbcbbbaaaccabaacaabbcabbbbccaacbccbcbca',
'bccbccccbbbbaabbbccaabaacaacccacaccbaaabcbcbbaccabacabcaaaacbcacbbcabbababbcbacba',
'acbabcbbcbcbacbbababbacaacacabbccccbbbcacbccbcabaccbaacccabcaabbbbaabccaccaaabaaa',
...]
```

2

3

4

5

6

7

One can create a SET deck such that the number attribute of the cards is a de Bruijn sequence. Simply pick a de Bruijn sequences from B(3,4), assign these as the numbers of the cards, and then complete the rest of the attributes arbitrarily. This leads to the question: Is it possible arrange the cards such that multiple attributes occur in every possible combination? The following theorem shows that one can arrange the cards such that every attribute is a de Bruijn sequence simultaneously.

**Definition:** For a de Bruijn sequence

**Example:** Let

**Theorem 1:** If

**Proof:** Let

When reading off the first column of

To generalize this, for each

Since

**Example:** Let us generate a de Bruijn sequence

```
generate_debruijn(['a','b','c'],4)
>>> aabccccacacbbabbacbababcbbcbacabbccabcacccbbbaacbcbccbaabbbbcabaaaacaabaccaaccbca
```

2

Now to shift this sequence by

Notice that by reading column by column (the blue letters), we can obtain all possible ways to construct a length-