Can you play chess without knowing the rules? LLMs can!
Build a world model by observing moves only
Most people, upon their first interaction with ChatGPT or other large language models (LLMs), are astonished by the AI's ability to engage in conversation effortlessly. These models consider the context and user intentions, making it appear as if the AI truly understands human interaction and can respond accordingly.
I experienced a similar sense of awe when using the following prompt: "You are a Linux OS system terminal and are responding to user input as if they are entering Linux commands. Show the output without any explanation, just as it would appear in a terminal display." When I typed "mkdir helloworld" followed by "ls," ChatGPT was able to create a pseudo-Linux playground, demonstrating its capability to mimic real system behavior.
Now, researchers are delving deeper into the study of LLMs, focusing on their ability to create their own world models solely by observing actions. This fascinating aspect of AI continues to reveal intriguing potentials and opens up new avenues of exploration in the field.
Extract from DeepLearning.AI:
To me, the work on Othello-GPT is a compelling demonstration that LLMs build world models; that is, they figure out what the world really is like rather than blindly parrot words. Kenneth Li and colleagues trained a variant of the GPT language model on sequences of moves from Othello, a board game in which two players take turns placing game pieces on an 8x8 grid. For example, one sequence of moves might be d3 c5 f6 f5 e6 e3…, where each pair of characters (such as d3) corresponds to placing a game piece at a board location.
During training, the network saw only sequences of moves. It wasn’t explicitly told that these were moves on a square, 8x8 board or the rules of the game. After training on a large dataset of such moves, it did a decent job of predicting what the next move might be.
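To make the setup concrete, here is a toy sketch of how such move sequences can be tokenized into a 64-square vocabulary, with a trivial bigram frequency model standing in for next-move prediction. This is my own illustration, not the authors' code; the actual work trains a GPT-style transformer, and the game data below is made up for demonstration.

```python
from collections import Counter, defaultdict

# Map each board square (a1..h8) to an integer token, as a GPT vocabulary would
squares = [c + str(r) for c in "abcdefgh" for r in range(1, 9)]
vocab = {sq: i for i, sq in enumerate(squares)}  # 64 tokens

# Toy move sequences in Othello notation (illustrative, not real games)
games = [
    ["d3", "c5", "f6", "f5", "e6", "e3"],
    ["d3", "c5", "f6", "f3", "e6", "e3"],
    ["c4", "e3", "f6", "e6", "f5", "c6"],
]

# Count which token tends to follow which (a bigram model, the crudest
# possible stand-in for a transformer's next-token prediction)
bigrams = defaultdict(Counter)
for game in games:
    tokens = [vocab[m] for m in game]
    for prev, nxt in zip(tokens, tokens[1:]):
        bigrams[prev][nxt] += 1

def predict_next(move):
    """Return the most frequent follower of `move` in the toy corpus."""
    counts = bigrams[vocab[move]]
    if not counts:
        return None
    inverse = {i: sq for sq, i in vocab.items()}
    return inverse[counts.most_common(1)[0][0]]
```

For example, `predict_next("d3")` returns `"c5"`, since c5 follows d3 in two of the three toy games. The point of the real experiment is that a transformer trained this way sees nothing but such token sequences, with no notion of a board.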
The key question is: Did the network make these predictions by building a world model? That is, did it discover that there was an 8x8 board and a specific set of rules for placing pieces on it, that underpinned these moves? The authors demonstrate convincingly that the answer is yes. Specifically, given a sequence of moves, the network’s hidden-unit activations appeared to capture a representation of the current board position as well as available legal moves. This shows that, rather than being a “stochastic parrot” that tried only to mimic the statistics of its training data, the network did indeed build a world model.
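The probing technique behind that finding can be sketched as follows. The idea is to train a small classifier (a "probe") that reads the board state out of the network's hidden activations; if the probe succeeds, the board state must be encoded there. This is a minimal stand-in using synthetic "activations" and a least-squares linear probe (the paper probes a real GPT's activations, and uses nonlinear probes); everything below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: pretend each hidden activation vector is a noisy
# linear image of the 64-square board state (values in {-1, 0, 1}).
n_samples, hidden_dim, board_dim = 500, 128, 64
boards = rng.integers(-1, 2, size=(n_samples, board_dim)).astype(float)
W_true = rng.normal(size=(board_dim, hidden_dim))
acts = boards @ W_true + 0.1 * rng.normal(size=(n_samples, hidden_dim))

# Fit a linear probe mapping activations back to board states,
# then round predictions to the nearest square state.
probe, *_ = np.linalg.lstsq(acts, boards, rcond=None)
pred = np.rint(acts @ probe)

# High per-square accuracy here means the board state is linearly
# decodable from the activations -- the essence of the probing argument.
accuracy = (pred == boards).mean()
```

In the actual study, high probe accuracy on Othello-GPT's activations, together with intervention experiments (editing the represented board and watching predicted moves change), is what supports the world-model claim.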
In short, imagine a deaf and illiterate person watching people play 10,000 games of chess and then being able to win!