MENACE tic-tac-toe

After some discussion with @Innerpanda and @Crosbyman64 I found myself determined to figure how to implement the “MENACE” (Machine Educable Noughts And Crosses Engine) algorithm for learning tic-tac-toe in HS. I learned about MENACE from this Matthew Parker video; I encourage you to watch it.

I’ve made good progress on a solution for the data representation challenge (how to represent the board, and the possible symmetries of the board, and the matchboxes to represent weights on possible moves). Here’s what I have so far:

Calling it “MENACE” at this point is a bit sus, since the learning stage of MENACE is not implemented, but it indicates my optimism that I’m on the right track.

This topic is for discussing this project, so that others can better learn from what I’ve figured out, and as many people as possible can learn from the feedback that others may have. Think of it like a totally open collab, with the primary goal of educational discussion, and a secondary goal of making a cool project. If your feedback is “this is all very obvious, and already done”, thank you, I would appreciate supporting links to projects and/or documentation.

One of the basic questions one has to answer for this is: how to represent the game board at a given point of play, and how to represent the possible moves with associated weights (adjusting those weights after each game is how MENACE learns)? How to account for all the possible symmetries of the board? Even with the symmetries, 304 boards have to be represented if MENACE goes first, and 470 more are needed if the human goes first.

I had originally thought that lots and lots of strings and string operations would be required to represent all this. With time, I hae better accepted the typeless wonder of HS variables into my life. Given that there are only 9 positions on the board, and nine possible moves, each of those can be a single digit. Using the trick of ensuring a leading non-zero digit, I decided on:

  • a ten-digit number starting with 8 (looks like B) representing the spatial state of the Board, with some numbering of all the squares, where each digit is 0 (no one moved there), 1 (player1 moved there), or 2 (player2 moved there). I’m using square numbering:
    7 8 9
    4 5 6
    1 2 3
  • ten-digit number starting with 6 (looks like G) representing the sequence of moves in a Game, where each digit (start with the first one after 6) records the location of the last move; the alternation between p1 and p2 is implicit (and whether p1 is X or 0 is actually irrelevant for the MENACE algorithm, it only matters for the UI).

These 10-digit numbers are strings too in HS, so it is easy to query a single digit via “character in”, while it is also easy to assemble and manipulate these numbers by adding or subtracting powers of ten. Double-precision floats (numbers in JavaScript) can store all of these numbers exactly.

The symmetries of the board can be represented by 9-digit numbers, recording the new location of each square, after some rotation or reflection. Here are all eight symmetries:
123456789
147258369
321654987
741852963
987654321
963852741
789456123
369258147
(notice that the center square 5 is always unchanged). Nerdy math: this is a Group called D4.

So from one board you can find 7 other boards that are effectively the same? By brute force I’m just applying all the symmetries to get all the reflected/rotated boards. How to choose one board that stands for the all the equivalent ones (the “canonical” one)? Here’s where I embrace that these things are numbers: the canonical board representation (10-digit number starting with 8) is the one with the largest numerical value.

From a given canonical board, how can you organize it into a canonical sequence of moves? This sequence of moves can be imagined as a large branching tree, where each node is a game up to some move, with one outgoing branch for each allowed move. Here too, you need to take into account the board symmetries (only considering moves that are unique even after applying all of D4). Also, the same board configuration could have been arrived at via different sequences of turns. Here too I say that the canonical game representation (10-digit number starting with 6) is the highest-valued one that arrives at a given board.

If you play with my proto-MENACE project, you’ll see the parts I’ve implemented so far. You’ll find the board in the lower right corner, p1=X, p2=O. With each move, you’ll see some Board numbers (BAvail for available squares, and BReal for the current game), as well as BCanon: the canonical board representation, found via which (SymmIdx) element of D4. You’ll also see the corresponding GCanon value (the canonical game for board BCanon).

Much remains to be done; the notes to myself in the second scene are reproduced here:

  • (DONE): transforming 8oard to other symmetries of 8oard, to find maximal (canonical) representation
  • (DONE): transforming canonical 8oard to canonical 6ame
  • discover unique (modulo symmetries) possible next moves from canonical 6ame
  • detect when a win is possible
  • creation of “matchbox” to store game and moves with weights, to be visually laid out on the left side of the screen
  • converting from 6ame to 8oard via inverse symmetry
  • some way of visualization each matchbox
  • some way of exporting/importing all boxes

Let me know if you want to take on some of these tasks, or better yet, just post here a remix that makes progress. The eventual final project should be a group effort.

One immediate question: Finding the canonical board is slow; can you make it faster?
You’ll find it as the “BIn → BOut, Y=SymmIdx” custom rule, currently used in the “TurnManager” text object (left of top-center edge), in the “when I get message [MoveMade]”. There’s a doubly-nested loop (for 8 symmetries, for 9 digits), and this will be a bottleneck. I am still ignorant about what happens in which frames, so how could this be restructured?

Hope to hear from you.

14 Likes

Nice topic

4 Likes

oo wow, thats cool!

3 Likes

Thanks for floating the idea of a tic-tac-toe bot in the forum. Let me know if I can help you help finish this.

2 Likes

Some progress has been made:

What is new:

  • finding the symmetries of the board is no longer a doubly nested loop, but via a small pool of permutor workers, and so is much faster.
  • now it figures out, at a given board configuration, which are the unique moves considering the possible symmetries. The unique moves are now highlighted by colored squares (bug: before the first move only three squares should be highlighted)

Knowing the unique moves is a prerequisite to setting up the “matchboxes”: each box should have beads only for the unique moves.

So my updated task list:

  • (DONE) transforming 8oard to other symmetries of 8oard, to find maximal (canonical) representation
  • (DONE) transforming canonical 8oard to canonical 6ame
  • (DONE) discover unique (modulo symmetries) possible next moves from canonical 6ame
  • detect when a win is possible
  • creation of “matchbox” to store game and moves with weights, to be visually laid out on the left side of the screen
  • (essentially done) converting from 6ame to 8oard via inverse symmetry
  • some way of visualization each matchbox
  • some way of exporting/importing all boxes

@Crosbyman64 I’m curious if this code is legible to you, since you already figured out how to make a tic-tac-toe game.

3 Likes

Some more progress: now the project can “remember” canonical boards it has seen before; each canonical board is the identifier for one “matchbox” in the MENACE method. Also the canonical board and the unique (up to symmetries) next moves are now visually shown on the left.

The “memory” method is that a query about a canonical board is broadcast, and if there is a matching board already seen it answers the query by pulling down a flag variable. Else (with no answer) a new clone takes ownership of that canonical board. I had thought I’d use some particular spatial organization to locate these pieces of memory, but I realized that it isn’t necessary and as well as needlessly complicated.

Next up is detecting when the next move can be a win, and then implementing (and displaying) the “beads” in the boxes, which is the core of MENACE.

5 Likes

A little more progress: now wins (and draws) are detected, and the possible moves from each board are reduced to the single winning move if it exists. In the displays of canonical boards on the left, possible moves are gray dots, and the winning move is a green dot.

Next up: figuring out how to represent the number of beads at each box’s possible moves, and ideally some way of displaying that number.
I’m thinking of hacking this so that the computer plays against itself and can do a breadth-first sweep through all possible games; would be a neat way of confirming the 304 and 470? box numbers for MENACE plays first and second, respectively.

4 Likes

So here is a fun update: my HS project is now able to discover (up to symmetries) all possible tic-tac-toe boards during all states of play:

Besides the tiny thumbnail, here is a screen snapshot showing things at the final stage:

This shows that over 159 minutes, the project found 338 board positions that player one (P1) faces when taking their turn (turn 1, 3, 5, …) and 289 board positions for player two (P2) moves. Counting the numbers for P1 for all but their last move:
1 + 12 + 108 + 183 = 304
Which is the “right” number of boxes for a MENACE implementation :tada:!:tada:! For example this web version shows the 304 boards on the right. In my version X goes first (instead of O), and you can see on the boards (which were arranged starting from the bottom left), a faint number indicating the turn number. As before, the gray circles show unique possible next moves, and the green circle a winning move (though there may be more than one). I’m happy that the ideas I outlined at the beginning of this topic have held up, and that this project has evolved to this point.

More things I’ve learned along the way

  • an earlier version of the project came close to getting the right number, but I was undercounting. I had incorrectly thought that the MENACE method knows the winning move on a board. So if there is a board with two possible moves, and one of them would immediately win and finish the game for MENACE, MENACE could take that win. But no: MENACE will initially take both moves with equal probability, and slowly learn that one of the two moves is the better move to take. Once I stopped pruning the exploration of possible boards at winning moves, I got to the right number.
  • MENACE does not include boxes for the 9th move, which makes sense: if the game has gotten to that point, there is only one move to take, so no need for a matchbox holding beads to make that choice. My project finds and counts these boards anyway, and I think it’s interesting to learn that there are 34 ways a tic-tac-toe game can get to the ninth turn, and to see exactly what those boards look like.
  • The Matthew Parker YT video I link to above includes the Matthew Scroggs who made a web version., and at 9 minutes in, they talk about having MENACE be able to be player 2. They say you’d need 470 additional matchboxes, but my project counted 289. What’s weird is that the way I did a breadth-first sweep through all possible games, the only way I discovered the 1, 12, 108, 183 boards for moves 1,3,5,7 was by knowing all the possible boards for the even moves 0,2,4,6: from an apparently too-small number of even-move boards, I found the correct number of odd-move boards. I wonder if the Matthews are actually wrong about this.

Anyway, it is fun to watch these board configurations being discovered (but I didn’t watch for 159 minutes; I made sure my iPad wouldn’t go to sleep and let the project play while I did other things). I implemented a kind of synthetic player that sweeps through the boards discovered so far, and plays all the available moves at each board, to expand the set of known boards. The move number over the board twirls when it is seen again during the course of play.

However, a Hopscotch MENACE implementation should not start with rediscovering all the boards. So I’m now very much faced with the problem that @CreationsOfaNoob asked about earlier: I need to get some text representation of the boards out of HS so that I can then (in a new project) save that huge string in a variable, so that upon startup the project can initialize all the matchboxes. I’ll figure something out, and record progress here.

5 Likes

Good work!

If you hadn’t found a way already, here’s what I did:

3 Likes

Thanks. Yup, I’ll whip up something to use that mechanism. I first have to design a string format for representing the number of beads in each of the 9 positions.

2 Likes