Technology · Mar 31, 2026 · 2 min read

Microsoft Unites GPT and Claude in Copilot Researcher: A Hybrid System That Surpasses All AI Research Tools

Microsoft has introduced a new approach to AI-based research by combining OpenAI and Anthropic models within its Copilot Researcher tool. The new features, called Critique and Council, represent a significant change from traditional systems based on a single model.

In a landscape marked by fierce competition among Google, OpenAI, and emerging players, each platform has pitched its own model as the definitive solution for advanced research. Microsoft has chosen a different path: rather than relying on a single system, it orchestrates multiple models to achieve better results.

Critique mode divides the research process into two distinct phases. GPT models handle the initial generation: planning the research, gathering sources, and producing a first draft of the report. Claude then acts as an expert reviewer, checking the draft for accuracy, citation quality, and consistency with the original request.
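The two-phase flow described above can be illustrated with a minimal Python sketch. It is not Microsoft's implementation: the `Model` type, `run_critique`, `CritiqueResult`, and the prompt wording are all hypothetical, and the stand-in models are plain functions so the example runs without any API access.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: any function that maps a prompt string to a text reply.
Model = Callable[[str], str]


@dataclass
class CritiqueResult:
    draft: str   # output of the generation phase
    review: str  # output of the review phase


def run_critique(request: str, generator: Model, reviewer: Model) -> CritiqueResult:
    # Phase 1: one model plans, gathers sources, and writes a first draft.
    draft = generator(f"Research and draft a report for: {request}")
    # Phase 2: a different model checks the draft against the original request.
    review = reviewer(
        "Check accuracy, citation quality, and consistency with the request.\n"
        f"Request: {request}\n"
        f"Draft: {draft}"
    )
    return CritiqueResult(draft=draft, review=review)


# Stand-in models (assumptions for illustration only).
def fake_generator(prompt: str) -> str:
    return f"[draft] {prompt}"


def fake_reviewer(prompt: str) -> str:
    return f"[review] {len(prompt)} characters checked"


result = run_critique("State of solid-state batteries", fake_generator, fake_reviewer)
print(result.draft)
print(result.review)
```

The point of the design is that the reviewer receives both the original request and the draft, so it can judge the draft against what was actually asked rather than against the generator's own framing.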

"Introducing Critique, a new multi-model deep research system in M365 Copilot. You can use multiple models together to generate optimal responses and reports."
-- Satya Nadella (@satyanadella), March 30, 2026

With this choice, does Microsoft solve the problem of internal control in AI?

This approach addresses one of the main limitations of traditional AI systems: the lack of an internal check. In single-model systems, the same model handles every phase, so nothing independent catches its errors, hallucinations, or inaccurate sources. By separating generation from review, Critique adds a layer of quality control aimed at improving the final output.

Benchmark data supports this method. On the DRACO benchmark, which evaluates 100 complex tasks in fields such as medicine, law, and technology, the combined system scored 57.4, well ahead of single models such as Claude Opus at 42.7, a gap of 14.7 points. The improvement is most pronounced in breadth of analysis, presentation quality, and factual accuracy.

The second mode, Council, adopts a competitive approach. GPT and Claude work in parallel, producing two separate reports. A third model acts as a "judge," comparing the results and synthesizing common points, differences, and unique contributions from each system. This allows users to gain a more comprehensive view without having to manually compare the responses.

Currently, these features are available for users of Microsoft's Frontier program and require a Microsoft 365 Copilot license.
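The Council flow described above, with models working in parallel followed by a judge, can be sketched in a similar way. Again, this is only an illustration under stated assumptions: `run_council`, the `Model` type, the member names, and the judge prompt are hypothetical and do not reflect how Copilot Researcher is actually built.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical type: any function that maps a prompt string to a text reply.
Model = Callable[[str], str]


def run_council(request: str, members: Dict[str, Model], judge: Model) -> str:
    # Each member writes its own report independently, in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(members))) as pool:
        futures = {name: pool.submit(model, request) for name, model in members.items()}
        reports = {name: future.result() for name, future in futures.items()}

    # The judge sees every report, labelled by author, and synthesizes them.
    labelled = "\n\n".join(
        f"Report from {name}:\n{text}" for name, text in reports.items()
    )
    return judge(
        "Compare these reports. List common points, differences, "
        "and unique contributions.\n\n" + labelled
    )


# Stand-in models (assumptions for illustration only).
members = {
    "model_a": lambda request: f"A's findings on {request}",
    "model_b": lambda request: f"B's findings on {request}",
}
summary = run_council("grid-scale storage", members, judge=lambda prompt: prompt)
print(summary)
```

Unlike the Critique sketch, no member sees another member's output here; only the judge reads all of the reports, which is what lets it surface disagreements between them.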