Gemma 4: Google's Open-Weight AI Models Under Apache 2.0
Discover Google's Gemma 4, open-weight AI models under the Apache 2.0 license. Explore native multimodality, token efficiency, and unrestricted commercial use.
TL;DR: Google DeepMind released Gemma 4 on April 2, 2026, under the permissive Apache 2.0 license, removing all commercial usage restrictions. The family includes four models ranging from 2B to 31B parameters, featuring native multimodality, 256K context windows, and 2.5x better token efficiency than Qwen3.5 27B.
Key facts
- Gemma 4 was officially released by Google DeepMind on April 2, 2026.
- The entire Gemma 4 family is licensed under Apache 2.0, eliminating previous MAU caps and acceptable use policies.
- The model lineup includes a 31B Dense variant, a 26B MoE variant activating only 3.8B parameters, and E2B/E4B edge models.
- Gemma 4 supports a context window of up to 256K tokens, doubling the limit of previous generations.
- The 31B Reasoning variant scored 39 on the Intelligence Index, trailing Qwen3.5 27B by three points.
- Gemma 4 31B uses approximately 2.5 times fewer output tokens than Qwen3.5 27B to achieve comparable results.
- Smaller E2B and E4B variants are optimized for offline deployment on Raspberry Pi and Google Pixel devices.
Google DeepMind Unveils Gemma 4 with Unprecedented Licensing Freedom
Google DeepMind has officially released Gemma 4, a comprehensive family of open-weight artificial intelligence models that marks a definitive shift in the landscape of open-source machine learning. Announced on April 2, 2026, the release introduces four distinct model sizes, all built upon the same advanced research foundations as the proprietary Gemini 3 system. The most significant development, however, is not merely architectural but legal: Google has released the entire Gemma 4 family under the Apache 2.0 license.
This licensing decision removes the restrictive conditions that have historically accompanied many high-performance open-weight models. Previous iterations of the Gemma series, as well as several competitors, shipped with acceptable use policies (AUPs) or monthly active user (MAU) caps that limited commercial use. By adopting Apache 2.0, Google has eliminated these barriers, enabling unrestricted commercial deployment for enterprises, startups, and individual developers alike. This positions Gemma 4 as an attractive option for organizations that want to deploy AI locally while retaining full data sovereignty and avoiding vendor lock-in.
A Diverse Family: From Server-Grade to Edge Devices
The Gemma 4 lineup is engineered to cover the entire spectrum of inference needs, from high-performance cloud servers to resource-constrained edge devices. The family consists of four specific variants, each tailored to different computational environments.
At the top end of the performance spectrum is the 31B Dense model. Designed for server-grade performance, it targets demanding enterprise workloads and advanced reasoning tasks where maximum accuracy matters more than inference speed. Complementing it is the 26B Mixture-of-Experts (MoE) model, a significant efficiency optimization: although it stores 26 billion parameters in total, it activates only 3.8 billion during inference. This architecture lets the model approach the quality of much larger dense models while sharply reducing computational overhead and latency.
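The sparse-activation idea behind the MoE variant can be sketched in a few lines. Everything below (expert count, hidden size, top-2 routing, linear experts) is illustrative; Google has not published Gemma 4's architecture at this level of detail:

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 64    # hidden size (illustrative, not Gemma 4's)
N_EXPERTS = 8   # total experts held in memory
TOP_K = 2       # experts actually run per token

# Each expert is a single matrix here; in a real model it is a full MLP.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) / np.sqrt(D_MODEL)
           for _ in range(N_EXPERTS)]
router = rng.standard_normal((D_MODEL, N_EXPERTS)) / np.sqrt(D_MODEL)

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route a single token vector to its TOP_K experts and mix the outputs."""
    logits = x @ router
    top = np.argsort(logits)[-TOP_K:]   # indices of the chosen experts
    weights = np.exp(logits[top])
    weights /= weights.sum()            # softmax over the chosen experts only
    # Only TOP_K of the N_EXPERTS matrices are touched per token: this is why
    # a model with a large total parameter count can pay roughly the
    # inference cost of its much smaller "active" parameter count.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)
out = moe_layer(token)
print(out.shape)
```

The memory footprint is still that of all eight experts; the saving is in compute and latency per token, which mirrors the 26B-total / 3.8B-active split described above.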
For mobile and edge applications, Google has introduced two smaller variants: the E2B and E4B models. These models are specifically optimized to run on consumer hardware, including Raspberry Pi devices and Google Pixel smartphones. The inclusion of these edge-optimized models underscores Google’s commitment to making advanced AI accessible outside of data centers, enabling offline functionality and reduced latency for end-users.
Native Multimodality and Extended Context
Gemma 4 introduces native multimodality across its entire lineup, a feature that was previously fragmented or limited to specific model sizes in earlier generations. The models natively support text, images, and video inputs. Additionally, the E2B and E4B variants extend this capability to include audio processing, making them suitable for voice-interactive applications and multimodal edge devices.
Another critical upgrade is the context window capacity. The larger models in the Gemma 4 family support a context window of up to 256K tokens, a doubling of previous limits. This extended context allows the models to process and reason over significantly larger documents, codebases, and conversation histories without losing track of earlier information. For developers building agentic workflows or complex analytical tools, this expansion is crucial for maintaining coherence over long-horizon tasks.
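To get a feel for what a 256K-token window covers, a rough back-of-envelope check helps. The 4-characters-per-token ratio below is a common heuristic for English prose, not a property of Gemma 4's actual tokenizer:

```python
# Rough estimate of whether a document fits in a 256K-token window.
CONTEXT_TOKENS = 256_000
CHARS_PER_TOKEN = 4  # heuristic average for English text, not exact

def fits_in_context(text_chars: int, reserved_for_output: int = 8_000) -> bool:
    """Leave some headroom in the window for the model's own response."""
    est_tokens = text_chars / CHARS_PER_TOKEN
    return est_tokens <= CONTEXT_TOKENS - reserved_for_output

# A ~300-page book at roughly 2,000 characters per page (~150K tokens):
print(fits_in_context(300 * 2_000))       # fits
# Ten such books concatenated (~1.5M tokens):
print(fits_in_context(10 * 300 * 2_000))  # does not fit
```

By this estimate, a single long book or a mid-sized codebase fits comfortably, which is the class of long-horizon task the paragraph above describes.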
Benchmarking Performance and Token Efficiency
Google positions Gemma 4 as “byte for byte, the most capable open models.” Independent benchmarking, particularly of reasoning and token efficiency, largely bears this out. In those evaluations, the 31B Reasoning variant of Gemma 4 scored 39 on the Intelligence Index. While this score trails the Qwen3.5 27B model by three points, the raw number tells only part of the story.
The key differentiator for Gemma 4 is its token efficiency. Despite the slight gap in raw Intelligence Index scores, Gemma 4 31B demonstrates superior efficiency by using approximately 2.5 times fewer output tokens to achieve comparable results. This efficiency has profound implications for deployment costs and latency. Fewer output tokens mean reduced compute requirements, lower cloud inference costs, and faster response times for users. For organizations running models locally, this efficiency translates to less strain on hardware and the ability to handle more complex queries within the same computational budget.
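The cost impact of the ~2.5x output-token gap compounds with request volume. The price and traffic figures below are made-up placeholders for illustration; only the 2.5x ratio comes from the reported benchmarks:

```python
# Hypothetical per-token price; real inference pricing varies by provider.
PRICE_PER_M_OUTPUT_TOKENS = 2.00  # dollars per million output tokens

def monthly_output_cost(requests: int, tokens_per_response: float) -> float:
    """Output-token spend for a month of traffic at a flat per-token price."""
    return requests * tokens_per_response * PRICE_PER_M_OUTPUT_TOKENS / 1_000_000

baseline_tokens = 1_000                    # hypothetical average response length
efficient_tokens = baseline_tokens / 2.5   # ~2.5x fewer tokens, same answer

requests = 5_000_000                       # hypothetical monthly volume
cost_a = monthly_output_cost(requests, baseline_tokens)
cost_b = monthly_output_cost(requests, efficient_tokens)
print(f"baseline: ${cost_a:,.0f}, efficient: ${cost_b:,.0f}")
# The relative saving is independent of price and volume:
print(f"saving: {1 - cost_b / cost_a:.0%}")  # 60%
```

Because the ratio is multiplicative, the 60% reduction in output-token spend holds at any price point or traffic level; the same ratio governs per-response latency on local hardware.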
Strategic Implications for the Open AI Ecosystem
The release of Gemma 4 under the Apache 2.0 license represents a strategic pivot for Google in the open-source AI arena. By removing the MAU caps and acceptable use policies that previously restricted commercial use, Google is lowering the barrier to entry for businesses that want to integrate AI into their products without legal ambiguity. This approach contrasts with the restrictive licensing models of some competitors and aligns with the growing demand for truly open, commercially viable AI models.
Furthermore, the emphasis on local deployment capabilities reinforces the importance of data privacy and sovereignty. Enterprises can now run Gemma 4 on their own infrastructure, ensuring that sensitive data never leaves their premises. This is particularly relevant for industries with strict regulatory requirements, such as healthcare, finance, and government, where data leakage is a critical risk. The ability to run advanced models like the 31B and 26B variants locally, thanks to their optimized architectures, makes this a viable option for many organizations.
Availability and Deployment
Gemma 4 is available for immediate use through multiple channels. The models can be accessed via Google Cloud, providing seamless integration for enterprises already using Google’s infrastructure. Additionally, the open-weight nature of the models allows developers to download and run them locally on their own hardware. This flexibility ensures that users can choose the deployment method that best fits their needs, whether it be cloud-based scalability or local privacy and control.
For developers interested in experimenting with Gemma 4, the models are available through standard AI model repositories and platforms. The documentation and support materials provided by Google aim to facilitate easy integration and deployment, regardless of the user’s technical expertise. The release of Gemma 4 is not just a technical milestone but a statement of intent from Google DeepMind to lead the open-source AI movement with models that are both powerful and accessible.
Conclusion
Gemma 4 represents a significant leap forward in the capabilities and accessibility of open-weight AI models. With its diverse range of sizes, native multimodality, extended context windows, and superior token efficiency, it offers a compelling alternative to proprietary models. The shift to the Apache 2.0 license removes previous restrictions, enabling unrestricted commercial use and fostering broader adoption. As the AI landscape continues to evolve, Gemma 4 stands out as a model that balances performance, efficiency, and openness, making it a valuable tool for developers and enterprises worldwide.