Google

Gemma 4 12B

Open Weights • apache-2.0 • Released Jun 2026

Google's medium-sized open-weight model from the Gemma 4 family, with 12 billion parameters. It uses an encoder-free, unified multimodal architecture that natively processes text, image, audio, and video inputs. It has a 256K-token context window and supports 140+ languages. It is the first medium-sized model capable of natively ingesting audio. It is suitable for local deployment on GPUs with 16 GB of VRAM.
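The 16 GB figure is easier to judge with a rough weights-only estimate. The sketch below is illustrative arithmetic, not a published requirement: the precisions listed and the helper name `weightGigabytes` are assumptions, and the estimate ignores the KV cache, activations, and runtime overhead.

```typescript
// Weights-only memory estimate for a 12-billion-parameter model.
// Assumption: 1 GB = 1e9 bytes; KV cache and activations are NOT included.
const PARAMS = 12e9; // parameter count from the model description
const VRAM_GB = 16;  // GPU memory quoted in the description

// Bytes per parameter for some common weight precisions (illustrative choices).
const PRECISIONS: { name: string; bytesPerParam: number }[] = [
  { name: "bf16", bytesPerParam: 2 },
  { name: "int8", bytesPerParam: 1 },
  { name: "int4", bytesPerParam: 0.5 },
];

// Hypothetical helper: size of the weights alone, in gigabytes.
function weightGigabytes(params: number, bytesPerParam: number): number {
  return (params * bytesPerParam) / 1e9;
}

for (const p of PRECISIONS) {
  const gb = weightGigabytes(PARAMS, p.bytesPerParam);
  const verdict = gb <= VRAM_GB ? "fits" : "does not fit";
  console.log(`${p.name}: ${gb} GB of weights, ${verdict} in ${VRAM_GB} GB`);
}
```

Under these assumptions, 16-bit weights alone come to 24 GB, while 8-bit and 4-bit weights come to 12 GB and 6 GB, so a 16 GB card most likely implies a quantized build.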

Capabilities

chat, completion, vision, audio, code-generation, reasoning, function-calling

Modalities

text, image, audio, video, code

Context Window

262K tokens
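The "256K" in the description and the "262K tokens" shown here are most likely the same number, since 256 × 1024 = 262,144. The sketch below makes that arithmetic explicit and shows one way to budget a prompt. It assumes that input and output tokens share the window, which this page does not state; `maxPromptTokens` is a hypothetical helper.

```typescript
// Assumption: "256K" means 256 × 1024 tokens, which the page rounds to "262K".
const CONTEXT_WINDOW_TOKENS = 256 * 1024;

// Hypothetical helper: tokens left for the prompt after reserving room for the reply.
// Assumes prompt and completion share one context window.
function maxPromptTokens(reservedForOutput: number): number {
  if (reservedForOutput < 0 || reservedForOutput > CONTEXT_WINDOW_TOKENS) {
    throw new RangeError("reservedForOutput must be between 0 and the context window");
  }
  return CONTEXT_WINDOW_TOKENS - reservedForOutput;
}

console.log(CONTEXT_WINDOW_TOKENS);  // 262144
console.log(maxPromptTokens(8192));  // 253952
```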

Providers

Available from 3 providers

Cheapest

Google AI Studio

$0.00/1M tokens

Google AI Studio, Hugging Face Inference, NVIDIA NIM

Providers (3)

Sorted by total cost (input + output per 1M tokens).

Provider	Pricing (per 1M tokens)	Rate Limits	Regions	Health	Latency
Google AI Studio	In: Free / Out: Free	15 RPM / 500K TPM	us-east-1, eu-west-1, global	Healthy	0ms
NVIDIA NIM	In: Free / Out: Free	200 RPM / 500K TPM	us-east-1, us-west-2	Healthy	0ms
Hugging Face Inference	In: $0.10 / Out: $0.10	300 RPM / 500K TPM	us-east-1, eu-west-1	Healthy	0ms
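The ordering rule and the rate limits in the table can be sketched in a few lines. Prices and RPM values below are copied from the table; the `Provider` shape and the helpers `sortByTotalCost`, `estimateCostUsd`, and `minIntervalMs` are hypothetical names for illustration, and how the page breaks ties between the two free providers is not stated.

```typescript
interface Provider {
  name: string;
  inputPerMillion: number;   // USD per 1M input tokens
  outputPerMillion: number;  // USD per 1M output tokens
  requestsPerMinute: number; // RPM limit from the table
}

// Values copied from the provider table; deliberately listed out of order.
const providers: Provider[] = [
  { name: "Hugging Face Inference", inputPerMillion: 0.1, outputPerMillion: 0.1, requestsPerMinute: 300 },
  { name: "Google AI Studio", inputPerMillion: 0, outputPerMillion: 0, requestsPerMinute: 15 },
  { name: "NVIDIA NIM", inputPerMillion: 0, outputPerMillion: 0, requestsPerMinute: 200 },
];

// "Total cost" as the page defines it: input price plus output price per 1M tokens.
function totalCost(p: Provider): number {
  return p.inputPerMillion + p.outputPerMillion;
}

// Returns a sorted copy, cheapest first; ties keep their input order.
function sortByTotalCost(list: Provider[]): Provider[] {
  return list.slice().sort((a, b) => totalCost(a) - totalCost(b));
}

// Estimated spend in USD for a given number of input and output tokens.
function estimateCostUsd(p: Provider, inputTokens: number, outputTokens: number): number {
  return (inputTokens * p.inputPerMillion + outputTokens * p.outputPerMillion) / 1e6;
}

// Minimum spacing between evenly paced requests that stays within an RPM limit.
function minIntervalMs(requestsPerMinute: number): number {
  return 60000 / requestsPerMinute;
}

for (const p of sortByTotalCost(providers)) {
  console.log(`${p.name}: $${totalCost(p).toFixed(2)} per 1M in+out, one request every ${minIntervalMs(p.requestsPerMinute)} ms`);
}
```

At 15 RPM, for instance, evenly spaced requests to Google AI Studio would be 4,000 ms apart. The TPM limit is a separate constraint that this sketch does not model.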

Quick Start

Use this model via Google AI Studio with an OpenAI-compatible SDK.

import OpenAI from "openai";

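// Point the OpenAI SDK at Google's OpenAI-compatible endpoint for the Gemini API.
// The API key is read from the GOOGLE_AI_STUDIO_API_KEY environment variable.
const client = new OpenAI({
  baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/",
  apiKey: process.env.GOOGLE_AI_STUDIO_API_KEY,
});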

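// Send a single-turn chat request (top-level await requires an ES module).
const response = await client.chat.completions.create({
  model: "gemma-4-12b-it",
  messages: [
    { role: "user", content: "Hello!" }
  ],
});

// Print the text of the first (and here only) completion choice.
console.log(response.choices[0].message.content);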
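Since the model lists vision among its capabilities, a request can in principle carry an image alongside text. The sketch below builds a user message in the OpenAI Chat Completions content-parts format. Whether this provider accepts image parts for this model over the OpenAI-compatible route is an assumption, and `buildImageQuestion` and the example URL are hypothetical.

```typescript
// Content parts as defined by the OpenAI Chat Completions message format.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface UserMessage {
  role: "user";
  content: ContentPart[];
}

// Hypothetical helper: pairs a text question with one image URL in a single user turn.
function buildImageQuestion(question: string, imageUrl: string): UserMessage {
  return {
    role: "user",
    content: [
      { type: "text", text: question },
      { type: "image_url", image_url: { url: imageUrl } },
    ],
  };
}

// Placeholder URL for illustration only.
const imageMessage = buildImageQuestion(
  "What is in this picture?",
  "https://example.com/cat.png",
);
console.log(JSON.stringify(imageMessage, null, 2));
```

The resulting object would be passed as an element of `messages` in `client.chat.completions.create`, in place of the plain-string "Hello!" message.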
