Technology · Apr 3, 2026 · 6 min read

Goodbye Proprietary License: Gemma 4 Moves to Apache 2.0 and Google Challenges Qwen and Llama

On April 2, 2026, Google DeepMind released Gemma 4, a new family of open models built on the same research foundations that power Gemini 3. Four models, a new license, and a change of direction that could reshape competition in the open-weight segment: for the first time in Gemma's history, Google is distributing everything under Apache 2.0, abandoning the custom license used over the past two years and removing a long-standing source of hesitation for anyone wanting to run Gemma models in production.

Architecture and Sizes

The new range consists of four variants designed for different hardware tiers. Effective 2B (E2B) and Effective 4B (E4B) are intended for edge computing: smartphones, Raspberry Pi boards, NVIDIA Jetson Orin Nano modules. The "E" prefix refers to effective parameters: E2B activates 2.3 billion parameters during inference but stores 5.1 billion on disk, thanks to a technique called Per-Layer Embeddings, which gives each decoder layer its own dedicated embedding table. In other words, the model is "large" in storage but lightweight in execution.
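As a back-of-the-envelope illustration, the gap between stored and effective parameters is the budget that Per-Layer Embeddings can keep out of accelerator memory. The sketch below uses the article's E2B figures; treating the entire gap as offloadable embedding tables is a simplifying assumption:

```python
# Parameter accounting for the E2B model (figures from the article).
# Assumption: the whole gap between stored and effective parameters
# consists of per-layer embedding tables that can be streamed from
# slower storage instead of living in accelerator memory.
total_params = 5.1e9       # parameters stored on disk
effective_params = 2.3e9   # parameters active at each inference step
offloadable = total_params - effective_params
print(f"{offloadable / 1e9:.1f}B params ({offloadable / total_params:.0%}) "
      f"need not reside in accelerator memory")
```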

On the workstation side there are the 26B Mixture of Experts (MoE) and the 31B Dense. The MoE model is built from 128 specialized experts, of which only 8 activate for each processed token, plus a shared expert that is always active. The result is that during inference only 3.8 billion of the 25.2 billion total parameters are used: the intelligence of a 26B model at a computational cost comparable to a 4B. The 31B Dense follows the opposite philosophy: all parameters are always active, maximizing raw quality and making it an ideal candidate for fine-tuning. The unquantized bfloat16 weights of both models fit on a single 80GB NVIDIA H100 GPU, and quantized versions for consumer GPUs are also available.
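A toy sketch of the routing scheme described above, assuming standard top-k gating with a softmax over the selected experts. Only the 128-expert / top-8 / always-on-shared-expert structure comes from the article; the hidden size, random weights, and identity shared expert are made up for illustration:

```python
import numpy as np

def moe_layer(x, experts, shared_expert, gate_w, k=8):
    """Toy Mixture-of-Experts layer: route a token to its top-k experts
    (by gate score) plus one always-active shared expert."""
    logits = x @ gate_w                      # one gate score per expert
    top_k = np.argsort(logits)[-k:]          # indices of the k best experts
    weights = np.exp(logits[top_k])
    weights /= weights.sum()                 # softmax over selected experts only
    out = shared_expert(x)                   # shared expert always contributes
    for w, idx in zip(weights, top_k):
        out = out + w * experts[idx](x)      # only k of 128 experts run
    return out

rng = np.random.default_rng(0)
d, num_experts = 16, 128
experts = [lambda x, W=rng.normal(size=(d, d)) / d: x @ W
           for _ in range(num_experts)]
shared = lambda x: x                         # identity stand-in
gate_w = rng.normal(size=(d, num_experts))
y = moe_layer(rng.normal(size=d), experts, shared, gate_w)
print(y.shape)  # (16,)
```

Because only 8 of the 128 expert matrices multiply each token, compute per token scales with k rather than with the total expert count, which is how a 25.2B-parameter model reaches 4B-class inference cost.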

Multimodality and Context Window

All four models handle mixed text-and-image input with support for variable aspect ratios, a concrete improvement over Gemma 3's visual encoder, which imposed a fixed resolution. Video and audio are natively supported: the large models handle video, while E2B and E4B add native audio input for speech recognition and voice comprehension directly on edge hardware. The audio encoder has been redesigned, shrinking from 681 million to 305 million parameters, with the frame duration cut from 160 to 40 milliseconds: just enough to make real-time transcription feasible on a phone without a connection.
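The latency implication of the shorter frames can be checked with simple arithmetic. The two frame durations come from the article; equating frame duration with the minimum streamable chunk is a simplifying assumption:

```python
# Back-of-the-envelope arithmetic for the audio encoder redesign.
# Assumption: one frame is the smallest chunk the encoder can emit.
old_frame_ms, new_frame_ms = 160, 40
old_fps = 1000 / old_frame_ms   # audio frames per second in Gemma 3
new_fps = 1000 / new_frame_ms   # audio frames per second in Gemma 4
print(f"{old_fps:g} -> {new_fps:g} frames/s; per-frame latency drops "
      f"{old_frame_ms / new_frame_ms:g}x")
```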

Context windows scale with hardware: 128K tokens for E2B and E4B, 256K for the 26B and 31B models. Enough to pass entire code repositories or long documents in a single prompt. Pretraining covers over 140 languages, with Google declaring out-of-the-box operational support for over 35: pretraining coverage and actual production support are two distinct figures, and the one that matters for developers is the latter.

Benchmarks: The Real Picture

Google places the 31B Dense model third globally among open models on Arena AI's text-generation leaderboard, with the 26B MoE in sixth place. The official claim is that Gemma 4 "surpasses models up to 20 times larger", a result Google says it verified internally on public benchmarks and one that points to computational efficiency per parameter as the family's main competitive advantage. On AIME 2026, the 31B scores 89.2%, a result that only a few months ago would have been considered cutting-edge even for proprietary models. For a sense of the generational leap: Gemma 3 27B, without thinking mode, stopped at 20.8%.

The edge model E4B also holds its own: 42.5% on AIME 2026 and 52.0% on LiveCodeBench, numbers far beyond what one would expect from a model intended to run on phones. The 31B's Codeforces Elo rating is 2150, a testament to competitive-level coding ability. On GPQA Diamond, the graduate-level scientific reasoning benchmark, the 31B reaches 85.7% in reasoning mode.

Native Function Calling and Agentic Workflows

One of the less obvious but most relevant aspects for developers is that function calling is no longer a workaround. Previous Gemma models, like most open models, handled calls to external tools through elaborate prompt engineering, with inconsistent results. In Gemma 4, the capability is trained in from the ground up at the architectural level, building on the FunctionGemma research Google published at the end of 2025. Native structured JSON output and native system instructions complete the picture for anyone building autonomous agents that interact with external APIs.
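A minimal sketch of what native function calling means on the agent side, assuming the model emits a structured JSON call instead of free text. The schema layout, the `get_weather` tool, and the output string below are hypothetical illustrations, not Gemma 4's actual wire format:

```python
import json

# Hypothetical tool schema, in the JSON-Schema style common to
# function-calling APIs (illustrative, not Gemma 4's real format).
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city: str) -> str:
    return f"sunny in {city}"  # stub standing in for a real API call

TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse a structured JSON tool call emitted by the model and run it."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# A model trained for native function calling emits parseable JSON,
# so no regex heuristics over free-form text are needed:
result = dispatch('{"name": "get_weather", "arguments": {"city": "Turin"}}')
print(result)  # sunny in Turin
```

The point of training the capability in natively is that `dispatch` can trust the output to parse: the error handling that prompt-engineered tool use requires largely disappears.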

The Apache 2.0 License: End of Two Years of Enterprise Embarrassment

The old Gemma license was technically open-weight but not open-source in the sense that matters for those wishing to use it in production. It contained a prohibited-use policy that Google could unilaterally update, required developers to impose Google’s rules on all derivative projects, and could transfer license obligations even to models trained on synthetic data generated by Gemma. The legal teams of many companies chose Mistral or Qwen simply because those licenses were already known. Apache 2.0 eliminates any custom clauses: free commercial use, unrestricted redistribution, no cascading obligations.

Clement Delangue, CEO of Hugging Face, called the shift to Apache 2.0 "a huge milestone". Demis Hassabis, CEO of Google DeepMind, has positioned Gemma 4 as "the best open models in the world for their respective sizes". The license change is, strategically, the clearest signal that Google wants to regain the trust of the open-source enterprise ecosystem that it had partially lost with the restrictions of previous versions.

Gemini Nano 4 on Pixel: It’s Gemma 4

Google has confirmed that Gemini Nano 4, the on-device model destined for Pixel phones, will be based directly on E2B and E4B. This is the first official confirmation of the next generation of Nano. Android developers can already start prototyping agentic workflows through the AICore Developer Preview, with compatibility guaranteed for the final launch of Nano 4. Integration is also available through the ML Kit GenAI Prompt API for those who want to bring features to production on Android right away.

Ecosystem and Supported Hardware

NVIDIA has announced day-one optimization across the range, from Jetson Orin Nano to Blackwell GPUs. AMD is supported via the open-source ROCm stack. Google Cloud offers serverless deployment via Cloud Run with RTX Pro 6000 GPUs and scaling to zero when idle, as well as Vertex AI, GKE, and TPU Trillium and Ironwood for those scaling massively. Weights are downloadable from Hugging Face, Kaggle, and Ollama. Day-one supported frameworks: Transformers, TRL, vLLM, llama.cpp, MLX, LM Studio, Unsloth, SGLang, Keras, MaxText.

Since Google launched the first generation of Gemma in February 2024, total downloads have surpassed 400 million, with over 100,000 variants developed by the community. Some are taking surprising directions: MedGemma analyzes medical images, DolphinGemma interprets dolphin vocalizations, and SignGemma manages sign language. The breadth of the Gemmaverse is perhaps the strongest argument that Google brings to the table compared to competitors.