Matt Parker of Stand Up Maths recently uploaded a video entitled “Can you find: five five-letter words with twenty-five unique letters?,” which I highly recommend you watch to understand the point of this post, the problem we’re trying to solve, and the algorithm we’ll be optimizing.

In the video, he mentions that the Python code he wrote to solve the “five five-letter words with twenty-five unique letters” problem (or the “5-5-25” problem, for short) took 2,760,670.3 seconds (31.95 days) to run.

uses “recursion” and “graphs” (ooh, fancy words) and some other fun optimizations that will probably remind you of your CS classes.

Anagram de-duplicating

Parker mentioned this in his video. What we actually care about in this problem is finding the largest set of mutually-disjoint sets of letters that can be arranged into an English word. (The algorithm for finding such sets is similar to the maximum disjoint set problem, and thus, so is the algorithm for solving the 5-5-25 problem.)

So, for the purposes of this problem, “beard” = “bread” = “bared”.

For the remainder of this post, “word” shall refer to one of these unordered sets of letters.

Bit packing

We could store a word as a hash set, array, or vector of characters, or a string construct, but there is an even more efficient medium: unsigned, 32-bit integers. Or rather, a single, unsigned, 32-bit integer.

Let’s think of our integer as a list of 32 on-off switches. We can assign each letter of the alphabet to a switch (a = 0, b = 1, …), and encode a word by switching on the bits corresponding to the letters in the word. We’ll have 6 bits left over, since the alphabet only contains 26 letters.

If we want to figure out if two words have any letters in common, we simply bitwise-AND them together and see if any bits are set in the output. We can create a bitmask of multiple words by bitwise-ORing them together. This functionality is implemented for the Word struct in the solution.

Not only are bitwise operations useful, they’re also really freaking fast. They’re what CPUs were built for.

To calculate a solution, we’ll progressively create a bitmask of the words in the solution so far.

How do we decide which words to try to add to our solution? If we try every single word, we end up with a brute-force solution that takes a long time (Θ(n⁵)) to complete. Let’s try to be a little more clever about which words we try to add to our solution.

To do this, we’ll create a graph. Specifically, we’ll be creating a directed acyclic graph. Each node in the graph will be a word. Edges will point from a word (A) to a later word (B) when the words (A) and (B) are disjoint. Our words have a well-defined ordering (since they are just numbers), so a “later word” is a word whose 32-bit integer encoding represents a higher number.

Node

Encoding

Edges

"bread"

131099

["chunk", "imply"]

"anger"

139345

["witch", "imply"]

"chunk"

1057924

["imply"]

"witch"

4718980

[]

"imply"

16816384

[]

We only need to have edges pointing to later disjoint words (and not to all disjoint words) because the order of the words in the solution doesn’t matter. If a solution contains word (A) and word (B), our algorithm will discover the solution regardless of which word it encounters first. Having only the single edge from (A) to (B) and not also vice-versa means that the algorithm can still discover the solution containing both (A) and (B) when it encounters (A), but when it traverses (B), it excludes (A) from future (duplicate) traversals.

Our algorithm will traverse the graph and build a solution by recursively visiting the edges of a node in the current solution candidate and returning the longest disjoint sub-solution.

In a vacuum, this algorithm would still be O(n^{5}), but due to the nature of the data and the way we’re constructing the graph (it’s pretty sparse—using the test data, there is no path in the graph that contains more than 15 nodes), traversals should be O(n^{2}) in practice.

Short circuits

The algorithm is complete and correct at this point, but we can still speed it up with some shortcuts.

Let’s add a field to each node in our graph that tells us the length of the longest possible path after this node. If our search needs 3 more words and it encounters a node that says its longest path only contains 1 more node, our search can skip that node early.

We can also maintain a set of word combinations that we have already tried and failed to use in a solution.

Compiler Options

Since we only care about the runtime of this program, we can have the compiler optimize the heck out of it:

$ time ./target/release/jotto-problem
[solutions output]
Found 537 solutions that are 5 words long
________________________________________________________
Executed in 1.23 secs fish external
usr time 1.28 secs 0.14 millis 1.28 secs
sys time 0.04 secs 1.05 millis 0.04 secs