Google models

Google models on Vertex AI offer fully managed and serverless models as APIs. To use a Google model on Vertex AI, send a request directly to the Vertex AI API endpoint. Because Google models use a managed API, there's no need to provision or manage infrastructure.

You can stream responses to reduce perceived latency for end users. A streamed response uses server-sent events (SSE) to return the response incrementally.
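As a hedged sketch of what consuming such a stream looks like: the snippet below parses SSE "data:" lines into JSON events and reassembles the streamed text. The event shape (OpenAI-style "choices"/"delta" fields and a "[DONE]" sentinel) is an assumption based on common streaming APIs, not confirmed by this page; verify against the actual Vertex AI response format.

```python
import json

def parse_sse(lines):
    """Yield the JSON payload of each 'data:' event in an SSE stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and event-name lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # sentinel used by many streaming APIs (assumed)
            break
        yield json.loads(payload)

# Example over a canned stream (hypothetical event shape):
stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    '',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    '',
    'data: [DONE]',
]
text = "".join(
    event["choices"][0]["delta"]["content"] for event in parse_sse(stream)
)
print(text)  # → Hello
```

In practice the lines would come from an HTTP response body read incrementally rather than a canned list.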

Gemma 4 26B A4B IT

Gemma 4 26B A4B IT is a multimodal model from Google that accepts text and image input (audio input is supported on the smaller models) and generates text output.

Go to the Gemma 4 26B A4B IT model card

Use Google models

For managed models, you can use curl to send requests to the Vertex AI endpoint with the following model names:

  • For Gemma 4 26B A4B IT, use gemma-4-26b-a4b-it-maas
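A minimal sketch of such a curl request follows. The endpoint path (the OpenAPI-compatible chat completions surface), the payload shape, and the project values are assumptions for illustration, not confirmed by this page; substitute your own project ID and check the current API reference.

```shell
PROJECT_ID="my-project"            # hypothetical project ID
REGION="global"                    # region from the availability table below
MODEL="gemma-4-26b-a4b-it-maas"    # model name from the list above

# Build the request body (non-streaming; set "stream": true for SSE).
cat > request.json <<EOF
{
  "model": "${MODEL}",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}
EOF

# Send the request (requires gcloud authentication for the access token);
# the endpoint path below is an assumed OpenAPI-compatible surface:
# curl -X POST \
#   -H "Authorization: Bearer $(gcloud auth print-access-token)" \
#   -H "Content-Type: application/json" \
#   "https://aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/endpoints/openapi/chat/completions" \
#   -d @request.json
```

The same request body works for streaming; only the "stream" flag changes.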

To learn how to make streaming and non-streaming calls to Google models, see Call open model APIs.

To use a self-deployed Vertex AI model:

  1. Navigate to the Model Garden console.
  2. Find the relevant Vertex AI model.
  3. Click Enable and complete the provided form to get the necessary commercial use licenses.

For more information about deploying and using partner models, see Deploy a partner model and make prediction requests.

Google model region availability

Google models are available in the following regions:

Model: Gemma 4 26B A4B IT
Regions:
  • global
    • Max output tokens: 128,000
    • Context length (tokens): 256,000

What's next

Learn how to Call open model APIs.