Product Thumbnail

Gemma 3

Build with multimodal AI from Google

Open Source
Artificial Intelligence
Development

Gemma 3 is Google's new family of models for multimodal AI (text, images, video). 1B-27B sizes, 128K context, 140+ languages. Includes ShieldGemma 2 for safety.

Top comment

Hi everyone!

Check out Gemma 3, Google's latest family of models for building multimodal AI applications! This is a big step up from the previous Gemma versions, adding video understanding and a much larger context window.

Key features:

🖼️ Multimodal: Handles text, images, and short videos.
🧠 Multiple Sizes: Available in 1B, 4B, 12B, and 27B parameter versions.
↔️ 128K Context Window: A major increase, allowing for processing much more information.
🌍 Multilingual: Supports over 35 languages out-of-the-box, pretrained on over 140.
🛠️ Integrates with Hugging Face Transformers, Ollama, JAX, Keras, PyTorch, Unsloth, vLLM, and Gemma.cpp.
🛡️ Includes ShieldGemma 2, a separate 4B model for image safety classification.
⚡ Optimized for NVIDIA GPUs, Google Cloud TPUs, and AMD GPUs.
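If you're wondering which of the four sizes you can actually run locally, here's a minimal sketch for estimating that from available GPU memory. The sizes (1B, 4B, 12B, 27B) come from the list above; the "~2 bytes per parameter plus ~20% overhead" figure is a common rule of thumb for bf16 weights, not an official number, and the helper name is my own:

```python
from typing import Optional

# Parameter counts (in billions) from the Gemma 3 announcement.
GEMMA3_SIZES_B = [1, 4, 12, 27]

def largest_fitting_size(vram_gb: float,
                         bytes_per_param: float = 2.0,
                         overhead: float = 1.2) -> Optional[int]:
    """Return the largest Gemma 3 size (in billions of parameters) whose
    estimated footprint fits in vram_gb, or None if none fit.

    Assumes bf16 weights (~2 bytes/param) plus ~20% headroom for
    activations and KV cache -- a rough estimate, not a guarantee.
    """
    fitting = [s for s in GEMMA3_SIZES_B
               if s * bytes_per_param * overhead <= vram_gb]
    return max(fitting) if fitting else None

print(largest_fitting_size(24))  # 4  (12B in bf16 needs ~28.8 GB)
print(largest_fitting_size(8))   # 1
```

Quantized builds (e.g. 4-bit via Ollama or Unsloth) shrink these footprints considerably, so treat this as a conservative upper bound for full-precision weights.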

Gemma 3 is a clear sign of how quickly the multimodal AI space is advancing.

Let's start exploring its capabilities in Google AI Studio!

Comment highlights

I have to say Gemini 2 powered by Gemma 3 is just incredible in its multimodal capabilities! The image generation and editing in the chat interface blew me away - I was able to create and modify images right in the conversation flow without switching between tools or repeatedly uploading and downloading. This kind of seamless integration between text and visual creation is exactly what I've been waiting for in AI assistants. The quality and speed of the image generation are impressive too, much more responsive than other multimodal models I've tried. Congrats on the launch!

Google is not stopping. This is a solid addition to the multimodal space, and it makes me wonder what would be a good starting point to build with it.