AC-3-based word suggestion algorithm

Status: Approved | 2025

Author: Victor Ma

Reviewer: Jonathan Blandford

Introduction

The AC-3-based word suggestion algorithm generates a complete and perfect set of word suggestions for every clue in the grid, all at once. This algorithm is based on the AC-3 arc consistency algorithm from the field of constraint satisfaction problems.

CSP model for grid fillling

The problem of finding a valid grid fill is a constraint satisfaction problem (CSP). What follows is one possible way of modelling the CSP. However, other models also exist.

The CSP model

Let \(V\) be the set of variables for the CSP, and let

\[ V = \{a_1, a_2, \dots, a_m\} \cup \{d_1, d_2, \dots, d_n\}, \]

where \(a_i\) is the word in the \(i\)-Across clue and \(m\) is the number of across clues in the grid; and \(d_j\) is the word in the \(j\)-Down clue and \(n\) is the number of down clues in the grid. So, there is one variable for each clue, and each variable is set to the chosen word for its respective clue.

Let \(D\) be the set of domains for the CSP, and let

\[ D = \{ D_v \mid v \in V \} \]

where

\[ D_v = \text{the set of word-list words that match the filter for the clue that $v$ represents}. \]

Let \(I\) be the set of intersection constraints for the CSP, and let

\[\begin{split} I = \left\{ \begin{aligned} I_{v_1v_2} \mid v_1, v_2 \in V &\land \text{The clues for $v_1$ and $v_2$ intersect each other} \\ &\land \text{The clue for $v_1$ appears before the clue for $v_2$ in the clue list} \\ \end{aligned} \right\} \end{split}\]

where

\[\begin{split} \begin{aligned} I_{v_1v_2} = \ &\text{The values of $v_1$ and $v_2$ have the same letter} \\ &\text{at the offsets where their clues intersect}. \end{aligned} \end{split}\]

Let \(Q\) be the set of unique word constraints, and let

\[\begin{split} Q = \left\{ \begin{aligned} Q_{v_iv_j} \mid v_i, v_j \in V & \land i \neq j \\ & \land \text{The clues for $v_i$ and $v_j$ have the same length} \\ & \land \text{The clue for $v_i$ appears before the clue for $v_j$} \\ & \hspace{1.2em} \text{in the clue list} \\ \end{aligned} \right\}. \end{split}\]

where

\[ Q_{v_1v_2} = (\text{The value of } v_1 \neq \text{the value of } v_2) \]

Let \(C\) be the set of constraints for the CSP, and let

\[ C = I \cup Q. \]

Solution to the CSP

A solution to this CSP assigns a word to each clue. Suppose that the CSP has a generic Engish word list, and the following grid:

+---+---+---+---+
| ■ | ■ | ■ | N |
+---+---+---+---+
| T | I | M |   |
+---+---+---+---+
| ■ | ■ | ■ | X |
+---+---+---+---+
| W | E | S |   |
+---+---+---+---+

Then, the unique solution to the CSP is

\[\begin{align*} a_1 &= \text{TIME} \\ a_2 &= \text{WEST} \\ d_1 &= \text{NEXT}.\\ \end{align*}\]

The AC-3 algorithm

However, our goal is not to find a solution to the CSP; our goal is to find the set of possible words for a clue. To do this, we use the AC-3 algorithm. The AC-3 algorithm accepts a CSP model and returns arc-consistent domains for every variable in the CSP.

For our grid-filling CSP, this means that the AC-3 algorithm returns a set of possible words for each clue. Furthermore, each set contains every possible word that the clue can be and no word that the clue cannot be. In other words, there are no dead-end words in any set, and no words are “missing” from any set.

Performance

This algorithm must be implemented asynchronously. It is too slow to run synchronously.

Furthermore, if the grid is too sparse, this algorithm will run indefinitely. So, we should revert to a simpler word suggestion algorithm when the grid is sparse.

Caching

It should be possible to cache results, but the way to do so is not immediately obvious. The problem is that the AC-3 algorithm can only reduce the domains. It can never add values back into them.