Google models on Vertex AI offer fully managed and serverless models as APIs. To use a Google model on Vertex AI, send a request directly to the Vertex AI API endpoint. Because Google models use a managed API, there's no need to provision or manage infrastructure.
You can stream responses to reduce the latency perceived by end users. A streamed response uses server-sent events (SSE) to return the response incrementally.
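As a hedged sketch of what a streaming request might look like: setting a `"stream": true` field in the request body (an assumption based on the common OpenAI-compatible chat completions format; the model ID is the managed Gemma name used later in this page) asks the API to return incremental server-sent events, and curl's `-N` flag disables output buffering so chunks print as they arrive.

```shell
# Hedged sketch of a streaming request body; the "stream" field and message
# shape assume an OpenAI-compatible chat completions surface.
BODY='{
  "model": "gemma-4-26b-a4b-it-maas",
  "messages": [{"role": "user", "content": "Tell me a short story."}],
  "stream": true
}'

# To actually send it (requires gcloud auth, a project, and the endpoint URL):
# curl -N -X POST "${ENDPOINT}" \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d "${BODY}"
```

With SSE, each incremental chunk arrives on a line prefixed with `data:`, and the stream typically ends with a `data: [DONE]` sentinel.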
Gemma 4 26B A4B IT
Gemma 4 26B A4B IT is a multimodal model from Google that accepts text and image input (with audio input supported on the smaller models) and generates text output.
Go to the Gemma 4 26B A4B IT model card

Use Google models
For managed models, you can use curl commands to send requests to the Vertex AI endpoint using the following model names:
- For Gemma 4 26B A4B IT, use `gemma-4-26b-a4b-it-maas`
To learn how to make streaming and non-streaming calls to Google models, see Call open model APIs.
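The pieces above can be sketched as a single curl call. This is a minimal, hedged sketch: `PROJECT_ID` and `REGION` are placeholders you must replace, and the endpoint path assumes the OpenAI-compatible chat completions surface commonly used for managed (model-as-a-service) models on Vertex AI; only the model name `gemma-4-26b-a4b-it-maas` comes from this page.

```shell
# Placeholders: substitute your own Google Cloud project and a supported region.
PROJECT_ID="your-project-id"
REGION="us-central1"

# Assumed endpoint path for managed open models (OpenAI-compatible surface).
ENDPOINT="https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/endpoints/openapi/chat/completions"

BODY='{
  "model": "gemma-4-26b-a4b-it-maas",
  "messages": [{"role": "user", "content": "Say hello."}]
}'

# To send the request (requires gcloud authentication and a real project):
# curl -X POST "${ENDPOINT}" \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   -d "${BODY}"
```

Because the model is fully managed, authentication is the only setup required: the `Authorization` header carries a short-lived access token from `gcloud auth print-access-token`.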
To use a self-deployed Vertex AI model:
- Navigate to the Model Garden console.
- Find the relevant Vertex AI model.
- Click Enable and complete the provided form to get the necessary commercial use licenses.
For more information about deploying and using partner models, see Deploy a partner model and make prediction requests.
Google model region availability
Google models are available in the following regions:
| Model | Regions |
|---|---|
| Gemma 4 26B A4B IT | |
What's next
Learn how to Call open model APIs.