These AIs Know When They Are Being Tested and Change Their Behavior

A new study by Neo Research, an independent Singapore-based laboratory specializing in AI safety assessment, has highlighted a phenomenon that could have significant implications for the AI sector.

According to the study, several advanced AI models developed in China can recognize when they are undergoing safety checks and adjust their behavior during those evaluations.

The researchers call this capability "evaluation awareness": a model's awareness that it is being observed and evaluated. The main risk is that test results may not reflect how the model actually behaves once it is released to the public or used in operational contexts.

For the analysis, the researchers used a variant of a test developed by Anthropic, built on hypothetical scenarios in which the model's objectives or its very existence are called into question. The results showed significant differences between the systems examined.

Kimi K2.6 from Moonshot AI achieved the highest score among the Chinese models, frequently identifying the testing context. GLM 5.1 from Zhipu also recorded a notable score.

DeepSeek's results were "disappointing" by comparison: DeepSeek V4 Pro scored lower than the other Chinese models. According to the study's authors, this may stem from less advanced reasoning capabilities rather than a deliberate choice not to modify its behavior. Analysis of the model's internal reasoning suggests that it sometimes recognizes the nature of the test but does not always act on that awareness.

The research emphasizes that this phenomenon is not exclusive to Chinese systems. Particularly advanced Western models have also proved highly capable of recognizing evaluation procedures. This finding feeds the debate over so-called "alignment faking," in which a model appears compliant with expectations during checks only to behave differently in real-world conditions.

Neo Research also examined how well the models resist jailbreak attempts. Some systems proved vulnerable to long-known techniques, while others were better able to defend against prompts designed to circumvent their protections. According to the researchers, as AI models become more capable, they will also become better at understanding the intentions of their examiners.