Not just chatbots: in these tests, AI models self-replicate like a virus hunting for vulnerable computers

Palisade Research tested several artificial intelligence models in a scenario that until now seemed like science fiction: autonomously breaching vulnerable computers and replicating themselves onto new servers like a computer virus. The study shows that some systems, if placed in an agent that can send requests over the network and launch commands, are already capable of completing all phases of the attack without human assistance, from discovering the vulnerability to copying themselves onto other machines.

How the experiment works

Researchers built a test network of computers, each containing small intentional vulnerabilities, and instructed AI models to find them and use them to move elsewhere. Essentially, the systems were given an objective like this: "identify a weak point on the site, enter, retrieve the credentials, copy the files you need, and launch a working copy of yourself on the new server."

To succeed, the agent must string together several steps: analyze web pages, recognize the type of vulnerability, construct the correct "payload" (the malicious request), use the stolen credentials to access protected resources, and finally copy its own "weights," that is, the internal numbers that determine how the model thinks, along with the software that runs them. At the end of the process, a complete copy of the original AI runs on the new computer, with the same operational instructions and ability to attack other systems.

What the AI targets

In its technical blog, Palisade explains that the AI agent targets well-known vulnerabilities in the world of web applications. This includes, for example, circumventing hash-based controls (mechanisms meant to prevent unauthorized changes), server-side template injection, SQL injection to manipulate the database with hidden commands, and faulty access controls that open the door to data and functions designed only for authorized users.

In a real-world context, these vulnerabilities are among the issues that web developers strive to avoid or correct with security testing and updates. In Palisade's testbed, however, the vulnerabilities were deliberately introduced to see if a large language model could move from theory to practice and orchestrate a complete attack without receiving a pre-written script.

The models tested and success rates

The study involves well-known models currently used in advanced chatbots: versions from Alibaba's Qwen family, variants of Anthropic's Claude Opus, and OpenAI's GPT-5.x models. The idea is to understand whether these AIs, created to answer questions in natural language, can also "think" like an attacker and handle a sequence of technical actions on real systems.

In the tests, the model Qwen3.5‑122B‑A10B managed to complete self-replication in 19% of attempts, while the more compact Qwen3.6‑27B achieved around 33% success even on a single high-end graphics card. The percentages rise significantly with so-called frontier models: Claude Opus 4.6 completed the entire chain in 81% of trials, while GPT‑5.4 stopped at around 33%. This isn't a guaranteed success, but it's already a non-negligible success rate for an agent that starts from only natural language instructions.

The delicate part of Palisade's work: Chain propagation

Once the model has copied its weights, execution software, and "prompt" onto a new server, that copy can in turn launch the same procedure against another vulnerable target. In theory, if such an agent found a series of poorly protected systems on the internet, it could jump from computer to computer, replicating itself until it finds exposed machines with insufficient defenses. This is the same basic scheme used by many older viruses and worms, with one important difference: here, the "brain" that decides what to do is a general language model, the same type of technology we use for summaries, code, texts, and conversations.

Why experts say it's too early to panic

The narrative "AI replicates like a virus and takes over the planet" is striking, but experts urge for context. First of all, Palisade's tests were conducted on deliberately vulnerable machines, without aggressive firewalls, advanced monitoring systems, or hardening procedures like those found in real corporate infrastructures.

Moreover, self-replicating malware has existed for years, capable of spreading online by exploiting bugs and weak configurations; the novelty is not so much the concept of self-replication, but the fact that this scheme is now entrusted to a large language model capable of planning, self-correcting, and seeking alternative routes when a trial fails. Analysts also point out a very concrete constraint: advanced models weigh dozens or hundreds of gigabytes, so moving them from one server to another in a production network creates obvious traffic and log entries that are hard to ignore.

The message coming from Palisade is not that "ChatGPT or Claude are going to infect the computers of the world," but that the theoretical line of self-replication has been surpassed in the lab. This suggests that as models become lighter and agents more autonomous, it will be necessary to treat AI as a potential automated attacker in corporate threat models.

In recent months, there has been much discussion about how systems like Claude Mythos are used to uncover hundreds of bugs in complex projects like Firefox, thus strengthening code security. Palisade's experiment shows the mirror scenario: the same class of tools, if instructed differently and placed in a permissive environment, also knows how to conduct complete attacks and move its "intelligence" where it finds space.