Intersection-based word suggestion algorithm
Status: Approved | Implemented | 2025
Author: Victor Ma
Reviewer: Jonathan Blandford
Introduction
The intersection-based word suggestion algorithm generates
word suggestions that are constrained by the current filter and the intersecting
filter at the current cell. The algorithm is implemented by the function
word_list_find_intersection().
The algorithm
The algorithm works like this, where \(n\) is the offset of the current cell in the current clue, and \(m\) is the offset of the current cell in the intersecting clue:
(Phase 1) Get the set of possible letters for the current cell, constrained by the current filter:
Get the set of words that match the current filter.
Return the set of letters that appear as the \(n^{\text{th}}\) letter, in the set of words.
(Phase 2) Get the set of possible words for the intersecting clue, and the set of possible letters for the current cell—both constrained by the intersecting filter and the current filter:
Get the set of words that satisfy both these constraints:
The word matches the intersecting filter.
The \(m^{\text{th}}\) letter of the word is in the set of letters from Phase 1.
Get the set of letters that appear as the \(m^{\text{th}}\) letter, in the set of words.
Return the set of words and the set of letters.
(Phase 3) (optional) Get the set of possible words for the current clue, constrained by the current filter and intersecting filter:
Get the set of words that satisfy both these constraints:
The word matches the current filter.
The \(n^{\text{th}}\) letter of the word is in the set of letters from Phase 2.
Return the set of words.
(Return) Return the set of words from Phase 2 and (optionally) Phase 3.
Limitations
This algorithm only considers two constraints:
The filter for the current clue.
The filter for the clue that intersects the current clue at the current cell.
This means that the algorithm is unaware of all the other intersecting clues—and the constraints that they impose on the current clue.
Consider the following grid:
+---+---+---+---+
| | | | Z |
+---+---+---+---+
| | | | E |
+---+---+---+---+
| | | | R |
+---+---+---+---+
| W | O | R | | < current clue
+---+---+---+---+
4-Down begins with ZER, so the only word it can be is ZERO. This constrains the bottom-right cell to the letter O.
4-Across starts with WOR. We know that the bottom-right cell must be O, so that means that 4-Across must be WORO. But WORO is not a word. So, 4-Down and 4-Across are both unfillable, because no letter fits in the bottom-right cell that works for both 4-Down and 4-Across.
Now, suppose that the current clue is 4-Across, and the current cell is the bottom-right cell. Then, the intersection algorithm checks the filters for 4-Across and 4-Down. It detects that the two clues are unfillable, and so it correctly returns the empty set.
But what if the current cell is one of the first three cells in 4-Across? Then, the algorithm never checks the filter for 4-Down, and so it doesn’t know about the constraint that it imposes on the current clue. And so, the algorithm returns all the words that match the filter WOR?, like WORD and WORM—even though they do not actually fit in the clue, because they turn 4-Down into a nonexistent word.
Consequences
There are two undesirable consequences of this:
The intersection algorithm can generate dead-end words. These are frustrating for the user. The user might enter a word from the word suggestion list, reach a dead end, and then have to undo all their progress since adding that suggested word.
The intersection algorithm’s output can vary based on the current cell. This is semantically incorrect, because the word suggestions for a clue should not change based on the cursor’s position.