Word Solver
Status: Implemented | Updated | January 2025
Overview
The word solver uses the bettersolve algorithm designed by Sébastien Migniot for GNOME Crosswords. It’s a depth-first search through the grid, designed to fail fast and quickly determine if a given branch either is valid or unsolvable. To do that, the algorithm will choose the best available slot to fill next of available slots. It then gets a list of the best words sorted to fill that slot with, and iterates through them trying each word in the best slot serially. Once a word is chosen, it recurses and picks another best slot
NOTE: for the purposes of this document we use the term
slot
to refer to a given clue in either the across or down direction. This is to align with the upstream bettersolve documentation. In all other docs, the termclue
is used instead.
The algorithm works as follows:
First, pick the best slot to fill in the grid. “Best” is defined by the heuristic defined below, but effectively is the slot that is the most constrained. This will either find a word that works or fail.
Next, generate a list of words that fits the slot, and sort it by a score. This score is also defined below, but is meant to optimize the chances that the word can be used to find more options.
Go through each word from the list serially placing it in in the slot and iterating, picking the next best slot to recurse to until all words are exhausted or a solution is found.
One side effect of this algorithm is that the act of placing a word can and does affect the subsequent “best” slot to choose when iterating, so the algorithm may jump around to different locations of the grid when solving.
We don’t cycle through the slots at a given level. If an empty slot can’t be solved than the grid is unsolvable at that point. There’s no point trying the same grid from a different location.
The solver makes heavy use of the Word List to fill slots. That object is highly-optimized to give a list of possible words for a given grid space.
NOTE: Solving an arbitrary grid is NP-hard. There are grids that this algorithm can’t fill. However we are able to fill many common confirmations quickly in a reasonable time in practice.
Best Slot Heuristic
We use the following scoring technique to pick the best slot to choose for a given iteration.
The more known letters the better, i.e.
EX.UIS.TE
seems easier thanT....D
The fewer unknown letters the better, i.e.
EX.T
seems easier thanEXT.....
The fewer candidates the better, i.e.
...ZZ
s easier thanE...T
The more crossings the better, i.e. choose to fail fast
We compare these heuristics serially, so a slot with more known letters than any other will always be picked first no matter what other characteristics it has.
Note that heuristic (3) requires using the word-list to calculate the number of candidates. That can be a (relatively) expensive operation, so we’re careful to check the first two heuristics first and only calculate that when we have a tie with slots that have identical results.
NOTE: It’s relatively common to end up with a tie for the best slot after all four heuristics are considered. In that case, one slot is arbitrarily picked from the top candidates. In practice, that’s done by
GHashTable
order and seems to not be stable across runs for the same puzzle. That’s actually a nice behavior for the user — they get some variety with the generated grids. But it does make debugging a challenge at times.
Word Fit Heuristics
Words are sorted by a score based on how much they expand the grid. We are calling this word fit. The word-score is a little complex: For each crossing slot, we look at the letter in the word and see what percentage of words have that latter at the crossing index. A word’s score is the sum of all these percentages.
This is confusing so consider this grid:
..?.
CA??
..??
..??
..??
In the example above, the the main slot being solved would be the word
CA??
, and the crossing slots would have the filters ?????
and
????
. The index of the first crossing slot would be 2, and the
crossing index would be 1. The index of the second crossing slot would
be 3, and the crossing index would be 0.
To generate the list of candidates, first we get the list of all words
with the filter CA??
. From this list, we calculate the score. If we
wanted to score the word CARS from this list, we would see what the
frequency of the letter ‘R’ is the second letter of 5 letter words,
and add it to the frequency of ‘S’ in the first letter of 4 letter
words.
Expansion
The Slot Heuristics seem to be working well, but the Word Heuristics needs tweaking. We are able to fill grids fast — maybe too fast. We are currently generating cryptic grids where every crossing letter is either a vowel or extremely common letters (all “S” and “R”s). See this example below:
.F.B.B.B.A.R.A.
BAKEWARE.SEEDS.
.N.D.L.A.T.R.H.
BEES.SARRACENIA
.G...A.I.T.G.N.
BAASES.SCILICET
.D.E...H.N.S...
CAPRESE.DEFTEST
...V.H.A...E.C.
ASPIRINS.SARDAR
.A.C.N.C.A...L.
ARTEMISIAS.SEED
.D.M.E.T.H.A.N.
.ECADS.ECESISES
.L.N.T.S.D.C.S.
The next steps are to try and generate a good grid, not just find the first possible match. To do that, we intend to compose the word ‘fit’ score with a score on how interesting it is.
See Word Scores for more information.