This document describes how to view content security insights from Model Armor for supported AI agents.
Model Armor screens requests and responses for security risks, such as indirect prompt injection attacks, sensitive data leakage, and the generation or serving of harmful content. For more information, see Model Armor.
You can view the results of Model Armor operations at the following levels:
- Top-level view: insights for all supported AI agents in the project
- Agent-level view: insights for a single AI agent
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- Enable the Model Armor API. To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
- Configure Model Armor on one or more gateways in your project.
- To monitor agents that communicate with a Google Cloud MCP server, configure Model Armor with MCP servers.
- Set up tracing for your agent.
Required role
To get the permissions that you need to monitor content security violations, ask your administrator to grant you the following IAM roles on the project:
- Observability View Accessor (roles/observability.viewAccessor)
- Observability Analytics User (roles/observability.analyticsUser)
- Logs Viewer (roles/logging.viewer)
- Logs View Accessor (roles/logging.viewAccessor)
For more information about granting roles, see Manage access to projects, folders, and organizations.
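As a rough illustration of the role grants listed above, the following Python sketch builds one `gcloud projects add-iam-policy-binding` command per role. It only assembles the command strings and does not run them. The helper name `build_grant_commands`, the project ID, and the member are placeholders introduced for this example; an administrator would review the output before running anything.

```python
import shlex

# Role IDs taken from the list in this section.
REQUIRED_ROLES = [
    "roles/observability.viewAccessor",
    "roles/observability.analyticsUser",
    "roles/logging.viewer",
    "roles/logging.viewAccessor",
]


def build_grant_commands(project_id: str, member: str) -> list[str]:
    """Return one gcloud command string per required role (hypothetical helper)."""
    commands = []
    for role in REQUIRED_ROLES:
        args = [
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={member}",
            f"--role={role}",
        ]
        # shlex.join quotes any argument that needs it for a POSIX shell.
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # Placeholder values; replace with your own project and principal.
    for command in build_grant_commands("my-project", "user:alice@example.com"):
        print(command)
```

Generating the commands as text, instead of executing them, keeps the sketch safe to run and lets the administrator decide how and when to apply each binding.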
These predefined roles contain the permissions required to monitor content security violations. The exact permissions are listed in the following section.
Required permissions
The following permissions are required to monitor content security violations:
- monitoring.monitoredResourceDescriptors.list
- monitoring.metricDescriptors.list
You might also be able to get these permissions with custom roles or other predefined roles.
Supported agents
The Security tab is populated with Model Armor insights for the following agents only:
- Agents deployed in Agent Runtime and governed by a gateway where Model Armor is configured.
- Agents deployed in Agent Runtime and communicating with a Google Cloud MCP server.
- Agents deployed in Agent Runtime in a project where Model Armor floor settings are configured.
View content insights for supported AI agents in a project (top-level view)
To view the content security insights for all supported AI agents in a project, follow these steps:
- In the Google Cloud console, go to the Gemini Enterprise Agent Platform Security tab.
- Select your project.
If you don't see content security insights on the Security tab and you have supported AI agents in your project, make sure you have set up tracing for your agents.
View content insights for an AI agent (agent-level view)
To view the content security insights for a single supported agent, follow these steps:
- In the Google Cloud console, go to Agent Registry.
- Select your project.
- Click the name of the agent.
- Click the Security tab.
View the number of flagged or blocked interactions
Go to the top-level or agent-level Security tab.
On the Security tab, view the number of interactions, including flagged and blocked interactions. The Security tab displays the following metrics:
- Total interactions: The total number of prompts and responses that are analyzed by Model Armor.
- Interactions flagged: The number of interactions that violated a configured policy in your Model Armor template or floor settings.
- Interactions blocked: The number of interactions blocked if you configured Model Armor in the INSPECT_AND_BLOCK mode. These blocked interactions violated floor settings or templates.
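If you copy the three metrics above out of the console, a small calculation can turn them into rates that are easier to compare across agents or time periods. The following Python sketch is one way to do that. The `InteractionCounts` type, the `summarize` helper, and the sample numbers are all hypothetical; the Security tab itself does not expose these names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionCounts:
    """Counts read manually from the Security tab (hypothetical container)."""
    total: int
    flagged: int
    blocked: int


def rate(part: int, total: int) -> float:
    """Return part/total, or 0.0 when no interactions were analyzed."""
    return part / total if total else 0.0


def summarize(counts: InteractionCounts) -> dict[str, float]:
    """Express flagged and blocked interactions as fractions of the total."""
    return {
        "flagged_rate": rate(counts.flagged, counts.total),
        "blocked_rate": rate(counts.blocked, counts.total),
    }


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    sample = InteractionCounts(total=2000, flagged=50, blocked=20)
    for name, value in summarize(sample).items():
        print(f"{name}: {value:.2%}")
```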
Monitor content security violations
Go to the top-level or agent-level Security tab.
In the Violations over time chart, monitor the number of detected violations over time.
The violations detected are categorized into the following areas:
- Prompt injections and jailbreaks: Content violations indicating the presence of prompts that contain malicious commands or jailbreak attempts. For more information, see Prompt injection and jailbreak detection.
- Malicious URL: Content violations indicating the presence of malicious URLs. For more information, see Malicious URL detection.
- Responsible AI: Content violations that are detected by safety filters, such as harassment and hate speech. For a complete list of responsible AI categories, see Responsible AI safety filter.
- Sensitive data: Content violations involving the presence of sensitive information types or custom information types that you define. For more information, see Sensitive Data Protection.
For more information about these detectors, see Model Armor filters.
Identify the agents with the most violations
Go to the top-level Security tab.
The Security tab displays the top 10 agents with the most violations. The list shows the agent ID of each agent and the number of violations detected for that agent.
To view the Model Armor insights for a specific agent in the list, go to Agent Registry to search for the agent by its agent ID. Then, go to the agent-level Security tab for that agent.
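If you have your own list of violation records, for example from an export, you can produce a similar ranking yourself. The Python sketch below counts violations per agent ID and keeps the ten highest. It illustrates the general idea only; how the console computes its list is not described here, and the `top_agents` helper and the sample agent IDs are hypothetical.

```python
from collections import Counter
from typing import Iterable


def top_agents(agent_ids: Iterable[str], limit: int = 10) -> list[tuple[str, int]]:
    """Rank agents by violation count, highest first (hypothetical helper).

    agent_ids holds one entry per detected violation, each naming the
    agent that the violation belongs to.
    """
    return Counter(agent_ids).most_common(limit)


if __name__ == "__main__":
    # Made-up records: one agent ID per violation.
    violations = ["agent-a", "agent-b", "agent-a", "agent-c", "agent-a", "agent-b"]
    for agent_id, count in top_agents(violations):
        print(f"{agent_id}: {count}")
```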
Query and analyze telemetry data using SQL
To query and analyze telemetry data from Model Armor, use Observability Analytics, which provides a SQL-based query interface.
- Go to the top-level Security tab.
- For the view that you want to query, click More chart options > Explore in Observability Analytics.
For general instructions on how to use Observability Analytics, see Query and analyze telemetry with Observability Analytics.
Download violations data to a PNG or CSV file
To download violations data to a PNG or CSV file, follow these steps:
- In the Violations over time view on the Security tab, select the period for which you want to download data.
- Click More chart options > Download.
- Click Download PNG or Download CSV to download the data in your preferred format.
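After you download the CSV file, you can total the violations in each series with a short script. The Python sketch below assumes that the first column holds a timestamp and that every other column holds numeric counts for one series. That layout, the column names in the sample, and the `total_per_series` helper are assumptions made for illustration; check the header row of your own file before relying on it.

```python
import csv
import io


def total_per_series(csv_text: str) -> dict[str, float]:
    """Sum every column except the first, skipping empty cells.

    Assumes column 1 is a timestamp and the rest are numeric series.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    totals = {name: 0.0 for name in fields[1:]}
    for row in reader:
        for name in totals:
            value = (row.get(name) or "").strip()
            if value:
                totals[name] += float(value)
    return totals


# Made-up sample that mimics the assumed layout.
SAMPLE = """time,Prompt injections and jailbreaks,Malicious URL
2025-01-01T00:00:00Z,3,1
2025-01-01T01:00:00Z,2,
"""

if __name__ == "__main__":
    for series, total in total_per_series(SAMPLE).items():
        print(f"{series}: {total:g}")
```

To run it against a real download, read the file into a string first, for example with `pathlib.Path("violations.csv").read_text()`, where the file name is a placeholder.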