The Problem

I enjoy DnD. I own many dice. Sometimes, I want to show people all the cool dice I have. Sometimes people are “busy” or “at work” or “I’m asleep stop asking me to look at dice”.

To fix this I decided to make a machine that could look at them with me!

The Solution

This theoretical dice-looking machine would need to be able to:

  • Recognize a die
  • Figure out where it is
  • Do those last two things in realtime

When it comes to looking at an image, figuring out what’s in it, and then figuring out where those things are, no one beats YOLO (at time of writing). So we’ll be using YOLO for this project, specifically YOLOv8s.
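As a rough idea of what starting from YOLOv8s can look like, here is a minimal sketch that assumes the Ultralytics Python package, which this post does not name. The dataset file `dice.yaml`, the epoch count, and the image size are all placeholder assumptions, not the settings used for this project.

```python
def fine_tune_dice_detector(data_yaml="dice.yaml", epochs=50, image_size=640):
    """Fine-tune pretrained YOLOv8s weights on a custom dataset.

    data_yaml is a hypothetical dataset config listing the train/val image
    folders and class names; epochs and image_size are placeholder values.
    """
    # Imported lazily so this file can be loaded without the package installed.
    from ultralytics import YOLO

    model = YOLO("yolov8s.pt")  # pretrained "small" YOLOv8 checkpoint
    model.train(data=data_yaml, epochs=epochs, imgsz=image_size)
    return model
```

Calling `fine_tune_dice_detector()` would download the pretrained checkpoint if needed and then train on whatever images the dataset file points at.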

The Data Problem

Functionally, to perform the transfer learning this problem requires, I needed a large amount of data to train on. To generate it, I wrote a custom shader to represent a theoretical die and then used the Unity3D game engine to export and tag a slew of synthetic training data, as pictured below:

[Figure: example training data]

There was a good deal of data wrangling to get the data out of Unity and into the YOLO model — but nothing a bit of Python can’t handle.
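For a flavor of what that wrangling can involve, here is a minimal sketch of one common step: turning a pixel-space bounding box into a YOLO label line. The function name, the sample box, and the image size are made up for illustration, and the sketch assumes boxes are already in top-left-origin pixel coordinates.

```python
def to_yolo_label(class_id, x_min, y_min, x_max, y_max, image_w, image_h):
    """Convert a pixel-space box to a YOLO label line.

    YOLO labels are "class x_center y_center width height", with the four
    box values normalized to the 0-1 range by the image size.
    """
    x_center = (x_min + x_max) / 2 / image_w
    y_center = (y_min + y_max) / 2 / image_h
    width = (x_max - x_min) / image_w
    height = (y_max - y_min) / image_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical export: one die spanning pixels (100, 200) to (300, 400)
# in a 640x480 render, with class 0 standing for "die".
print(to_yolo_label(0, 100, 200, 300, 400, 640, 480))
# prints "0 0.312500 0.625000 0.312500 0.416667"
```

If the exporter reports boxes with a bottom-left origin, as Unity screen space does, the y values would need flipping before this conversion.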

Results

Overall the detector worked decently enough. Some results on real imagery are shown below:

[Figures: dice detection generalizes]

It generalized to non-D6 dice, which was exciting to see. However, some tunings of the model produced false positives and false negatives, most commonly on text or irregular numerals on the dice.
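One way to put numbers on those false positives and negatives is to match predicted boxes to hand-labeled boxes by intersection over union (IoU). The sketch below is an illustration only, not the evaluation used for this project. The thresholds and sample boxes are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_errors(predictions, truths, conf_threshold=0.5, iou_threshold=0.5):
    """Return (true_pos, false_pos, false_neg) for one image.

    predictions: list of (box, confidence); truths: list of boxes.
    Each ground-truth box can be claimed by at most one prediction.
    """
    kept = sorted((p for p in predictions if p[1] >= conf_threshold),
                  key=lambda p: p[1], reverse=True)
    unmatched = list(truths)
    true_pos = 0
    for box, _conf in kept:
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            unmatched.remove(best)
            true_pos += 1
    false_pos = len(kept) - true_pos
    return true_pos, false_pos, len(unmatched)


# Made-up example: two real dice, three predictions of varying confidence.
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
predictions = [((0, 0, 10, 10), 0.9), ((50, 50, 60, 60), 0.6), ((21, 21, 30, 30), 0.4)]
print(count_errors(predictions, truths, conf_threshold=0.5))  # (1, 1, 1)
print(count_errors(predictions, truths, conf_threshold=0.3))  # (2, 1, 0)
```

In this toy case, lowering the confidence threshold recovers the missed die while the spurious detection stays, which is the kind of tradeoff different tunings expose.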

I also ended up piping realtime footage from my webcam into the model and found it worked well in realtime at 30 Hz!
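A frame-by-frame loop like that can be sketched as follows. To keep it runnable anywhere, the camera and the model are stand-ins passed in as arguments. The function name and the fake frames are hypothetical, and this is not the exact loop used here.

```python
import time


def run_detector(frames, detect, max_frames=None):
    """Feed frames to `detect` one at a time and return (results, fps).

    frames: any iterable of images; detect: a callable taking one frame.
    Both are injected so the loop does not depend on a specific camera
    or model library.
    """
    results = []
    start = time.perf_counter()
    for count, frame in enumerate(frames, start=1):
        results.append(detect(frame))
        if max_frames is not None and count >= max_frames:
            break
    elapsed = time.perf_counter() - start
    fps = len(results) / elapsed if elapsed > 0 else float("inf")
    return results, fps


# Stand-ins for a webcam and a model, so the sketch runs anywhere.
fake_frames = (f"frame-{i}" for i in range(100))
detections, fps = run_detector(fake_frames, detect=lambda f: [], max_frames=30)
print(f"processed {len(detections)} frames at {fps:.0f} frames per second")
```

In a real setup, `frames` could be a generator reading from OpenCV's `cv2.VideoCapture(0)` and `detect` could be the trained model called on each frame. A measured rate at or above the camera's frame rate would mean the detector keeps up.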

Future Work

In the future I’d like to revisit synthetic data generation for machine learning. It’s a deeply fascinating problem space.

For this project specifically, I would be interested in:

  • More advanced dice shaders (support for a range of fonts and number styles: dots, numerals, symbols, etc.)
    • Could be interesting to spin off a secondary detector specifically for determining the side number!
  • More advanced distractors (metablob-skin-blobs)
  • Experiments with high-fidelity simulations (ray tracing, etc.)
    • Prior research suggests this may not be hugely impactful
  • More probable environments (simulated game tables, etc.)
  • Side prediction using video footage / physics sim