Microsoft Uses Age of Empires II Goats to Demonstrate Why We Are Anthropomorphizing LLMs Too Much

Adrian de Wynter, chief researcher at Microsoft, has built a functioning neural network inside Age of Empires II to demonstrate that the 'human' attributes assigned to language models depend not on the underlying technology but on the interface that wraps them. The academic paper, published on arXiv on May 29, 2026, is titled "If LLMs Have Human-Like Attributes, Then So Does Age of Empires II". The title is deliberately absurd, but it carries a rigorous methodological thesis.

Working in the game's scenario editor, De Wynter used goats as computational elements: a goat on grass equals 0, a goat on a bridge equals 1. With this encoding, he implemented NAND gates and a perceptron, demonstrating that Age of Empires II is both functionally complete and Turing complete, meaning it can implement any logical function and, in theory, any calculation a computer can perform.
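Why NAND alone is enough can be shown in a few lines. The sketch below is illustrative only and is not taken from the paper: the names GRASS, BRIDGE, and the helper functions are assumptions introduced here, and the gates are ordinary Python functions rather than anything built in the game. It shows how NOT, AND, OR, and XOR can each be composed from NAND, which is what functional completeness means in practice.

```python
# Illustrative encoding borrowed from the article: goat on grass = 0, goat on a bridge = 1.
# All names here are hypothetical; the paper's in-game construction is not reproduced.
GRASS, BRIDGE = 0, 1


def nand(a: int, b: int) -> int:
    """Return 0 only when both inputs are 1, otherwise 1."""
    return GRASS if (a == BRIDGE and b == BRIDGE) else BRIDGE


def not_(a: int) -> int:
    # NOT is a NAND with both inputs tied together.
    return nand(a, a)


def and_(a: int, b: int) -> int:
    # AND is a NAND followed by a NOT.
    return not_(nand(a, b))


def or_(a: int, b: int) -> int:
    # OR follows from De Morgan: a OR b = NAND(NOT a, NOT b).
    return nand(not_(a), not_(b))


def xor_(a: int, b: int) -> int:
    # Standard four-NAND construction of XOR.
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Add two bits using only NAND-derived gates; returns (sum, carry)."""
    return xor_(a, b), and_(a, b)


if __name__ == "__main__":
    for a in (GRASS, BRIDGE):
        for b in (GRASS, BRIDGE):
            print(a, b, "->", half_adder(a, b))
```

Every gate above reduces to calls to nand, so any medium that can realize that one gate and wire copies of it together can, in principle, realize any Boolean circuit.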

A perceptron is the fundamental component of an artificial neural network: a simple mathematical unit that combines several inputs and decides whether to activate based on the result. The logical consequence is that any neural network underlying Claude, ChatGPT, or Copilot could, in theory, be replicated within the game.
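As a rough illustration of that definition, here is a minimal perceptron: a weighted sum of the inputs plus a bias, passed through a step threshold. The weights (-2, -2) and bias 3 are hand-picked assumptions chosen so that the unit behaves as a NAND gate. They are not values from the paper, and the function names are hypothetical.

```python
from typing import Sequence


def perceptron(inputs: Sequence[float], weights: Sequence[float], bias: float) -> int:
    """Fire (return 1) when the weighted sum of the inputs plus the bias is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def nand_perceptron(a: int, b: int) -> int:
    # Hand-picked parameters (an assumption for this sketch):
    # the sum only drops to or below zero when both inputs are 1.
    return perceptron((a, b), weights=(-2, -2), bias=3)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", nand_perceptron(a, b))
```

A single perceptron can therefore act as a NAND gate, and because NAND gates suffice to build any Boolean circuit, networks of such units can compute any Boolean function.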

Behind the technical provocation lies a more serious argument. De Wynter analyzed 315 computer science articles published between 2024 and 2026 and found that 57% began with the implicit assumption that language models possess anthropomorphic traits. The assumption therefore runs through the majority of the articles sampled, a methodological habit that the paper calls into question.

The central thesis is that the attributes perceived in LLMs are not inherent to that technology but depend on representation, that is, on the interface through which they are presented to the public. A chat window creates the perception of an interlocutor. The same network could be implemented with Lego bricks or, as De Wynter explains in his Substack, with a group of people exchanging mathematical operations via messages and moving on the street based on the results: certainly boring for everyone involved, but not impossible.

The paper connects to an argument made by Ted Chiang, who wrote in an essay in The Atlantic on June 3, 2026, that being open to the possibility that LLMs are conscious is akin to being open to the possibility that Microsoft Word is conscious, with distinct forms of dormant awareness in every Word document containing a conversational transcript. De Wynter uses this framework to propose a 'null assumption' for LLM research: start from the premise that the system is not unique, and measure its observable behavior rather than the researcher's a priori beliefs about what the model should be.

Anthropomorphism and Market

The anthropomorphization of LLMs among executives, researchers, and the public is becoming, according to De Wynter, a problem that is difficult to contain. One factor he cites explicitly is that products sell better when users can empathize with them, which gives many parties an interest in keeping the question of the models' 'personality' ambiguous.

The paper's proposal keeps experimental practice separate from philosophical questions about consciousness, without aspiring to resolve them. The methodological request is to measure what the system does and to set aside beliefs about what it should be. The goats of Age of Empires II are the most effective, and certainly the most provocative, means De Wynter has found to make the problem visible.