Boggle is a classic word search game played on a 4 x 4 grid of letter dice. The winner is determined by how many words the players find and by the length of those words. We quite often play this game when visiting my parents. The inspiration for this project came from wanting to know all the possible words on a given board.

A Finnish Boggle board

So I decided to write a program that automates this process. To make the program convenient to use, the input will be just an image of the board.

Subproblems to solve

There are two main tasks:

  • Extract the Boggle board letters from an image
  • Find all the words

For the latter task, there exist pre-made solutions on GitHub (see e.g. here) that match sequences of letters from the game board against a dictionary file. This article considers the first task: extracting the letters and their locations from the image.
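
Although the word search itself is out of scope here, the core idea behind those solvers is simple: a depth-first search over adjacent dice, checking each letter sequence against the dictionary. Below is a minimal sketch of that idea (not any particular pre-made solver); board is assumed to be a 4 x 4 list of letter strings (combined dice like "FG" are just multi-character strings) and dictionary a set of words loaded from a dictionary file.

def find_words(board, dictionary, min_len=3):
    n = len(board)  # 4 for a standard Boggle board
    found = set()

    def dfs(r, c, prefix, visited):
        prefix += board[r][c]
        if len(prefix) >= min_len and prefix in dictionary:
            found.add(prefix)
        # Continue to all unvisited neighbors (including diagonals).
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (0 <= rr < n and 0 <= cc < n
                        and (rr, cc) not in visited):
                    dfs(rr, cc, prefix, visited | {(rr, cc)})

    for r in range(n):
        for c in range(n):
            dfs(r, c, "", {(r, c)})
    return found

Real solvers typically prune this search with a prefix trie so that dead-end letter sequences are abandoned early.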

The biggest problem was acquiring a decent dataset. I had decided to use artificial neural networks (ANNs), and those tend to require a decent amount of labeled training data. I first thought of rendering Boggle board images with a computer to learn to locate the board, but came to the conclusion that making the renderer good enough for the results to generalize to real data would take too much time. Instead, I decided to take a few photos of the board and use them to generate more training images.

The next time I visited my parents, I took eight images of our game board with all the different die faces visible.

Locating the board

Given the shortage of training data, I decided to use traditional image processing techniques to first locate the board in the image. Once located, extracting the individual letter images would be easy thanks to the symmetry of the board. Then I would only need to train a rather simple classifier to recognize the letters on the dice.

I had not done much image processing before, but found a blog entry describing Sudoku Grab quite useful. After some experimentation I arrived at the following approach for locating the Boggle board:

  1. Assume an image of a Boggle board taken from above (possibly with some other objects in it, e.g. a hand).
  2. Assume the dice have a white background. Threshold the image to find all whitish blobs in it.
  3. Run outlier detection to determine which of the white blobs are actually dice. Remove the others.
  4. Dilate the blobs until only one blob remains.
  5. This blob should now essentially be the board. Find its minimal enclosing rectangle.
  6. Apply a perspective transform to get an upright image of the board.
  7. Extract each die from the upright board image.

Below is an example of the results after each of the above steps:

Image processing steps
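
To make steps 2 and 4–7 concrete, below is a rough OpenCV sketch of the pipeline (assuming OpenCV 4; step 3 is covered separately below). The threshold, kernel size, and output resolution are illustrative assumptions rather than the exact values used in the project, and a real implementation would also sort the rectangle corners to match the target points.

import cv2
import numpy as np

img = cv2.imread('board.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Step 2: threshold to find whitish blobs (the die faces).
_, mask = cv2.threshold(gray, 180, 255, cv2.THRESH_BINARY)

# (Step 3, the outlier removal, would clean up `mask` here.)

# Step 4: dilate until the die blobs merge into a single board blob.
merged = cv2.dilate(mask, np.ones((15, 15), np.uint8), iterations=3)

# Step 5: minimal enclosing (rotated) rectangle of the largest blob.
contours, _ = cv2.findContours(merged, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
board = max(contours, key=cv2.contourArea)
corners = cv2.boxPoints(cv2.minAreaRect(board)).astype(np.float32)

# Step 6: perspective transform to an upright square image.
side = 400
target = np.array([[0, 0], [side, 0], [side, side], [0, side]],
                  dtype=np.float32)
M = cv2.getPerspectiveTransform(corners, target)
upright = cv2.warpPerspective(img, M, (side, side))

# Step 7: slice the upright board into the 16 die images.
step = side // 4
dice = [upright[r*step:(r+1)*step, c*step:(c+1)*step]
        for r in range(4) for c in range(4)]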

The outlier detection in step three worked surprisingly well. I added it after noticing that thresholding alone was too unreliable: the image had to contain basically just the Boggle board and not much else. With the outlier detection, however, even a hand, a sheet of paper, or light reflecting from the table didn’t confuse the program. I used three features for the outlier detection: the size of the blob, the blob’s location, and the width of the blob’s minimum bounding rectangle in proportion to its height.

After extracting the features, the program locates the 24 largest white blobs in the image and assumes that 16 of them are dice (as there are 16 dice). The remaining eight blobs are “outliers” to be removed, e.g. a white sheet of paper next to the game board. Because the blobs corresponding to the 16 dice should have very similar feature values, the outlier detection algorithm should be able to label the other white blobs as outliers. For the outlier detection I used EllipticEnvelope from Python’s scikit-learn.
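
A rough sketch of that step with scikit-learn (the blob_* feature helpers here are hypothetical stand-ins for the feature extraction; in particular, how the blob location was encoded as a single feature is an assumption):

import numpy as np
from sklearn.covariance import EllipticEnvelope

# One feature row per blob for the 24 largest white blobs.
features = np.array([[blob_size(b),                  # blob area
                      blob_distance_from_center(b),  # blob location
                      blob_aspect_ratio(b)]          # rect width / height
                     for b in blobs])

# Expect 8 of the 24 blobs (one third) to be outliers; fit_predict
# labels inliers +1 and outliers -1.
detector = EllipticEnvelope(contamination=8 / 24)
labels = detector.fit_predict(features)
dice_blobs = [b for b, label in zip(blobs, labels) if label == 1]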

Below is an illustration of the process for another image:

This process occasionally makes errors in bad lighting conditions because of the thresholding. Most of the time it works well, though, and the solution was sufficient for my purposes. The actual implementation used the scipy.ndimage, OpenCV and NumPy Python libraries.

Generating a training dataset

With the location of the board known, the remaining task is to classify the letters on the dice. I decided to use an artificial neural network (ANN) classifier for this. I first tried to find ready-made letter datasets, and found some, but they were for the English alphabet. That meant I would have to add the Ö and Ä letters to the dataset anyway, as well as the combined FG and BC letter dice from the Finnish Boggle version.

Finnish Boggle dice

Consequently, I decided to use six of the eight previously taken board images to generate more training images through random distortions. Each of the six board images contained 16 dice, which gave me an initial set of 6 * 16 = 96 die images covering all the different sides of the dice in the game. First I labeled the die images by hand. To increase the size of my dataset, I also created a blurred and a sharpened version of each of the 6 original images, bringing the dataset to 96 * 3 = 288 labeled images of Boggle letters.
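
A small sketch of the blur/sharpen step with OpenCV (the kernel sizes and values are illustrative assumptions):

import cv2
import numpy as np

def blur_and_sharpen_variants(img):
    """Return the original image plus a blurred and a sharpened copy,
    tripling the number of labeled images."""
    blurred = cv2.GaussianBlur(img, (5, 5), 0)
    sharpen_kernel = np.array([[0, -1, 0],
                               [-1, 5, -1],
                               [0, -1, 0]], np.float32)
    sharpened = cv2.filter2D(img, -1, sharpen_kernel)
    return [img, blurred, sharpened]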

I decided to convert the die images to black and white 64 x 64 pixel images for classification. After this I made a function that adds random noise (salt and pepper), shifts, and rotations to the images. Below you can see some of the original 288 black and white letter images on the left and the result of the random noising on the right; a sketch of the noising function follows.

Preprocessed letter images Preprocessed and noised letter images
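
The noising function was roughly along these lines (a sketch with scipy.ndimage; the rotation and shift ranges and the noise fraction are illustrative assumptions):

import numpy as np
from scipy import ndimage

def augment(letter, rng=np.random.default_rng()):
    """Randomly distort a 64x64 black-and-white (0/1) letter image."""
    # Random rotation of some degrees around the image center.
    out = ndimage.rotate(letter, rng.uniform(-10, 10),
                         reshape=False, order=0)
    # Random shift of a few pixels in each direction.
    out = ndimage.shift(out, rng.integers(-4, 5, size=2), order=0)
    # Salt-and-pepper noise: flip a small fraction of the pixels.
    flip = rng.random(out.shape) < 0.02
    out[flip] = 1 - out[flip]
    return out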

I would use these randomly noised images to train my classifier. Because the distribution of the letters was not uniform (there were, for example, many more A’s than Y’s), I sampled the images with weights so that each letter appears with uniform probability in a random sample.
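
The weighted sampling is straightforward with NumPy: weighting each image by the inverse frequency of its letter makes every letter equally likely in a batch (labels here is the hypothetical list of letter labels for the 288 images):

import numpy as np

labels = np.array(labels)
counts = {letter: (labels == letter).sum() for letter in set(labels)}
weights = np.array([1.0 / counts[letter] for letter in labels])
weights /= weights.sum()

# Indices of a random batch of 1000 images, uniform over letters.
batch_idx = np.random.default_rng().choice(len(labels), size=1000,
                                           p=weights)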

I was now ready to train the ANN with an effectively unlimited supply of training letter images, and hoped that the above steps would be enough for the classifier to generalize to unseen images given the tiny initial set of training images.

Training the ANN

I used Keras with the TensorFlow backend to implement the ANN model. I decided to start with a basic one-hidden-layer setup with ReLU activation and a categorical cross-entropy loss. Because the training data are rather heavily preprocessed black and white images of letters, I anticipated that this ANN structure would already yield decent performance.

Below is the Keras code of the ANN model with a hidden layer size of 192.


import keras
from keras.layers import Dense, Activation

# 64x64 black-and-white input flattened to a vector, one hidden
# layer of 192 ReLU units, and a softmax over the 22 letter classes.
model = keras.models.Sequential([
    Dense(192, input_shape=(64*64,), activation='relu'),
    Dense(22),
    Activation('softmax')
])

opt = keras.optimizers.SGD(lr=0.01)

# Pass the configured optimizer object; the string 'sgd' would
# ignore the learning rate set above.
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])

I trained the classifier on random batches of 1000 noised letters. Using the original 288-letter training set (without random noising) as a validation set, I reached over 80% accuracy after just some tens of thousands of iterations. After a few hundred thousand more, I reached 100% accuracy, i.e. my model had at least nicely (over)fitted the original training dataset.
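
The training loop was roughly of this shape (a sketch; sample_batch is a hypothetical stand-in for the weighted sampling plus noising described above, returning images flattened to 64*64 vectors with one-hot labels, and the step count is illustrative):

for step in range(100000):
    x_batch, y_batch = sample_batch(size=1000)
    model.train_on_batch(x_batch, y_batch)
    if step % 1000 == 0:
        # Validate on the 288 original (un-noised) letter images.
        loss, acc = model.evaluate(x_val, y_val, verbose=0)
        print('step {}: validation accuracy {:.3f}'.format(step, acc))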

288 original images

I then took a look at the ANN hidden layer weights, which seemed to have formed rather reasonable looking shapes for detecting letters in different 90 degree rotations. The image on the right shows the first 20 of the 192 hidden layer node weight maps, arranged with four maps on each row.

I then tried the classifier on a batch of 1000 randomly noised images. The accuracy was roughly 98%, i.e. around 20 misclassified letters per batch. For the Boggle solver to be practical, the accuracy would have to be higher than that: if even one letter on the board is misclassified, the program will find wrong words, and at 98% per-letter accuracy a full board of 16 dice would be read entirely correctly only about 0.98^16 ≈ 72% of the time. It would be frustrating to have to repeatedly retake images because of misclassifications.

I decided to let it train on 32M letters overnight (using CPU only) and see if the accuracy would improve. The next morning, the performance was mostly 100% for batches of 1000 randomly noised letter images. Since the performance could not really be meaningfully improved, I decided not to develop the model further with e.g. convolutional layers.

Examples of misclassifications

On rare occasions a single letter would be misclassified in the batches of 1000 noised images. I wasn’t too worried about it, because the random shifts and noising can sometimes generate extreme outcomes. To be sure, I nevertheless checked what was going on. Below are a couple of examples of misclassified randomly noised images. The first one was classified as “S” when it should have been an “M”, and the second one was classified as “A” when it should have been an “Ä”. You can’t blame the ANN for the second one though: the random shift pushed the dots outside of the image. The first one is also rather improbable in practice, since the letter is so close to the edge.

A misclassified letter A misclassified letter

Testing the solver

I had left out two images from the training set to serve as a test set. That gave me a total of 2 * 16 = 32 letter images for testing – not too many, but that would have to do.

Evaluating the model on the 32 unseen letter images gave 100% accuracy with a very small loss. This was encouraging, but due to the minuscule size of the test set, the solver would really need to be tested in practice.

When testing in practice for the first time, it turned out that the lighting conditions in the original 8 images were too similar (they were all taken on a sunny day), and the thresholding process failed quite often. This meant that the program could not reliably locate the board or do the black and white transform on new images. I had to tune the thresholding in the image processing part a bit to make it more robust. After a few iterations and switching to adaptive thresholds, I got it working well with images taken in different lighting conditions. The ANN classifier did not need any changes and performed surprisingly well with the black and white images right from the beginning. The problem setup for the classifier was simple enough, and the letter noising sufficient, for the classifier to generalize to unseen letter images.
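
The adaptive thresholding amounts to something like the following: instead of one global cutoff, OpenCV computes a local threshold for each pixel’s neighborhood, which tolerates uneven lighting far better. The block size and offset below are illustrative assumptions, not the tuned values.

import cv2

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# A pixel becomes white only if it is clearly brighter than the
# Gaussian-weighted mean of its 51x51 neighborhood.
mask = cv2.adaptiveThreshold(gray, 255,
                             cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                             cv2.THRESH_BINARY,
                             blockSize=51, C=-10)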

The complete Boggle solver can be tried online here. The web interface would benefit from a bit more work, as it was quickly put together for my family’s Boggle games. Note that for the solver to work, the images must conform to the assumptions made above, such as the image being taken from above the board and the dice having a white background. It has not been tested with other Boggle boards besides the one my parents own (see the image at the beginning of this post), and different typefaces can potentially cause trouble. It won’t recognize letters that are not present in the dice set of the Finnish Boggle version, such as B or C. Also, for now, the words found will be in Finnish ;) However, we have successfully used and enjoyed it in our Boggle games!