- Resource: EvaluationResult
- JSON representation
- EvaluationResult.GoldenResult
- EvaluationResult.GoldenResult.TurnReplayResult
- EvaluationResult.GoldenExpectationOutcome
- EvaluationResult.Outcome
- EvaluationResult.SemanticSimilarityResult
- EvaluationResult.GoldenExpectationOutcome.ToolInvocationResult
- EvaluationResult.HallucinationResult
- EvaluationResult.ToolCallLatency
- EvaluationResult.OverallToolInvocationResult
- EvaluationResult.SpanLatency
- EvaluationResult.SpanLatency.Type
- EvaluationResult.EvaluationExpectationResult
- EvaluationResult.ScenarioResult
- EvaluationResult.ScenarioExpectationOutcome
- EvaluationResult.ScenarioExpectationOutcome.ObservedToolCall
- EvaluationResult.ScenarioRubricOutcome
- EvaluationResult.TaskCompletionResult
- EvaluationResult.UserGoalSatisfactionResult
- EvaluationResult.ExecutionState
- Methods
Resource: EvaluationResult
An evaluation result represents the output of running an Evaluation.
| JSON representation |
|---|
{ "name": string, "displayName": string, "createTime": string, "evaluationStatus": enum ( |
| Fields | |
|---|---|
name |
Identifier. The unique identifier of the evaluation result. Format: |
displayName |
Required. Display name of the Evaluation Result. Unique within an Evaluation. By default, it has the following format: " |
createTime |
Output only. Timestamp when the evaluation result was created. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
evaluationStatus |
Output only. The outcome of the evaluation. Only populated if executionState is COMPLETED. |
evaluationRun |
Output only. The evaluation run that produced this result. Format: |
persona |
Output only. The persona used to generate the conversation for the evaluation result. |
errorInfo |
Output only. Error information for the evaluation result. |
error |
Output only. Deprecated: Use |
initiatedBy |
Output only. The user who initiated the evaluation run that resulted in this result. |
appVersion |
Output only. The app version used to generate the conversation that resulted in this result. Format: |
appVersionDisplayName |
Output only. The display name of the |
changelog |
Output only. The changelog of the app version that the evaluation ran against. Populated only if the user runs the evaluation on latest/draft. |
changelogCreateTime |
Output only. The creation time of the changelog of the app version that the evaluation ran against. Populated only if the user runs the evaluation on latest/draft. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
executionState |
Output only. The state of the evaluation result execution. |
evaluationMetricsThresholds |
Output only. The evaluation thresholds for the result. |
config |
Output only. The configuration used in the evaluation run that resulted in this result. |
goldenRunMethod |
Output only. The method used to run the golden evaluation. |
Union field result. The result of the evaluation. Only populated when executionState is COMPLETED. result can be only one of the following: |
|
goldenResult |
Output only. The outcome of a golden evaluation. |
scenarioResult |
Output only. The outcome of a scenario evaluation. |
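
The `result` union described above can be read defensively on the client side. The following is a minimal sketch, assuming the resource has already been decoded from JSON into a Python dict; the helper name `select_result` is hypothetical and not part of the API.

```python
from typing import Any, Optional, Tuple

# Field names of the two members of the `result` union, as listed in the table above.
RESULT_FIELDS = ("goldenResult", "scenarioResult")


def select_result(evaluation_result: dict) -> Optional[Tuple[str, Any]]:
    """Return (field_name, payload) for the populated union member, or None.

    Hypothetical helper: the union is documented as populated only when
    executionState is COMPLETED, so any other state yields None.
    """
    if evaluation_result.get("executionState") != "COMPLETED":
        return None
    present = [name for name in RESULT_FIELDS if name in evaluation_result]
    if len(present) > 1:
        # A union field can hold only one member; more than one means bad input.
        raise ValueError(f"expected at most one result field, got {present}")
    if not present:
        return None
    return present[0], evaluation_result[present[0]]
```

Returning the field name alongside the payload lets the caller branch on the evaluation kind without probing the dict a second time.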
EvaluationResult.GoldenResult
The result of a golden evaluation.
| JSON representation |
|---|
{ "turnReplayResults": [ { object ( |
| Fields | |
|---|---|
turnReplayResults[] |
Output only. The result of running each turn of the golden conversation. |
evaluationExpectationResults[] |
Output only. The results of the evaluation expectations. |
EvaluationResult.GoldenResult.TurnReplayResult
The result of running a single turn of the golden conversation.
| JSON representation |
|---|
{ "conversation": string, "expectationOutcome": [ { object ( |
| Fields | |
|---|---|
conversation |
Output only. The conversation that was generated for this turn. |
expectationOutcome[] |
Output only. The outcome of each expectation. |
hallucinationResult |
Output only. The result of the hallucination check. |
toolInvocationScore |
Output only. Deprecated. Use OverallToolInvocationResult instead. |
turnLatency |
Output only. Duration of the turn. A duration in seconds with up to nine fractional digits, ending with ' |
toolCallLatencies[] |
Output only. The latency of each tool call in the turn. |
semanticSimilarityResult |
Output only. The result of the semantic similarity check. |
overallToolInvocationResult |
Output only. The result of the overall tool invocation check. |
errorInfo |
Output only. Information about the error that occurred during this turn. |
spanLatencies[] |
Output only. The latency of spans in the turn. |
toolOrderedInvocationScore |
Output only. The overall tool ordered invocation score for this turn. This indicates the overall percent of tools from the expected turn that were actually invoked in the expected order. |
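
The reference does not state how `toolOrderedInvocationScore` is computed. One plausible reading, sketched below purely as an assumption, is the length of the longest common subsequence between the expected and actual tool sequences, divided by the number of expected tools. The function name, the choice of a fraction in [0, 1] instead of a 0 to 100 percentage, and the handling of an empty expectation list are all illustrative.

```python
from typing import Sequence


def ordered_invocation_score(expected: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of expected tools that appear in `actual` in the expected order.

    Assumption: "in the expected order" is modelled as a longest common
    subsequence (LCS); the service may use a different definition.
    """
    if not expected:
        return 1.0  # assumption: nothing expected means nothing was missed
    # Rolling-row LCS dynamic programme: prev[j] is the LCS length of the
    # expected tools processed so far against actual[:j].
    prev = [0] * (len(actual) + 1)
    for exp_tool in expected:
        cur = [0]
        for j, act_tool in enumerate(actual, start=1):
            if exp_tool == act_tool:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / len(expected)
```

For example, with expected tools `["a", "b", "c"]` and actual tools `["a", "c", "b"]`, only two of the three can be matched in order, so the sketch yields two thirds.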
EvaluationResult.GoldenExpectationOutcome
Specifies the expectation and the result of that expectation.
| JSON representation |
|---|
{ "expectation": { object ( |
| Fields | |
|---|---|
expectation |
Output only. The expectation that was evaluated. |
outcome |
Output only. The outcome of the expectation. |
semanticSimilarityResult |
Output only. The result of the semantic similarity check. |
toolInvocationResult |
Output only. The result of the tool invocation check. |
Union field result. The result of the expectation. result can be only one of the following: |
|
observedToolCall |
Output only. The result of the tool call expectation. |
observedToolResponse |
Output only. The result of the tool response expectation. |
observedAgentResponse |
Output only. The result of the agent response expectation. |
observedAgentTransfer |
Output only. The result of the agent transfer expectation. |
EvaluationResult.Outcome
The outcome of the evaluation or expectation.
| Enums | |
|---|---|
OUTCOME_UNSPECIFIED |
Evaluation outcome is not specified. |
PASS |
Evaluation/Expectation passed. In the case of an evaluation, this means that all expectations were met. |
FAIL |
Evaluation/Expectation failed. In the case of an evaluation, this means that at least one expectation was not met. |
SKIPPED |
Evaluation/Expectation was skipped. |
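
The PASS and FAIL descriptions above imply a simple roll-up from expectation outcomes to an evaluation outcome. The sketch below illustrates that rule; the `Outcome` enum mirrors the documented values, while `aggregate_outcome` and its handling of empty or partly skipped input are assumptions, since the reference does not specify those cases.

```python
from enum import Enum
from typing import Iterable


class Outcome(Enum):
    OUTCOME_UNSPECIFIED = "OUTCOME_UNSPECIFIED"
    PASS = "PASS"
    FAIL = "FAIL"
    SKIPPED = "SKIPPED"


def aggregate_outcome(expectation_outcomes: Iterable[Outcome]) -> Outcome:
    """Roll expectation outcomes up into one evaluation outcome.

    Documented rule: PASS when all expectations were met, FAIL when at least
    one was not. Assumption: anything else (no expectations, or a mix that
    includes SKIPPED or unspecified values) is reported as unspecified.
    """
    outcomes = list(expectation_outcomes)
    if any(o is Outcome.FAIL for o in outcomes):
        return Outcome.FAIL
    if outcomes and all(o is Outcome.PASS for o in outcomes):
        return Outcome.PASS
    return Outcome.OUTCOME_UNSPECIFIED
```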
EvaluationResult.SemanticSimilarityResult
The result of the semantic similarity check.
| JSON representation |
|---|
{
"label": string,
"explanation": string,
"outcome": enum ( |
| Fields | |
|---|---|
label |
Output only. The label associated with each score. Score 4: Fully Consistent; Score 3: Mostly Consistent; Score 2: Partially Consistent (Minor Omissions); Score 1: Largely Inconsistent (Major Omissions); Score 0: Completely Inconsistent / Contradictory. |
explanation |
Output only. The explanation for the semantic similarity score. |
outcome |
Output only. The outcome of the semantic similarity check. This is determined by comparing the score to the semanticSimilaritySuccessThreshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL. |
score |
Output only. The semantic similarity score. Can be 0, 1, 2, 3, or 4. |
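
The threshold rule for `outcome` can be expressed in a few lines. This is a minimal sketch: the label table is copied from the `label` field above, and `semantic_similarity_outcome` is a hypothetical helper, not a function of any client library.

```python
# Score-to-label mapping as documented for the `label` field.
SEMANTIC_SIMILARITY_LABELS = {
    4: "Fully Consistent",
    3: "Mostly Consistent",
    2: "Partially Consistent (Minor Omissions)",
    1: "Largely Inconsistent (Major Omissions)",
    0: "Completely Inconsistent / Contradictory",
}


def semantic_similarity_outcome(score: int, success_threshold: int) -> str:
    """Return "PASS" if score is at or above the threshold, else "FAIL"."""
    if score not in SEMANTIC_SIMILARITY_LABELS:
        raise ValueError(f"score must be one of 0-4, got {score!r}")
    return "PASS" if score >= success_threshold else "FAIL"
```

With a threshold of 3, a score of 3 passes because the comparison is inclusive, while a score of 2 fails.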
EvaluationResult.GoldenExpectationOutcome.ToolInvocationResult
The result of the tool invocation check.
| JSON representation |
|---|
{
"outcome": enum ( |
| Fields | |
|---|---|
outcome |
Output only. The outcome of the tool invocation check. This is determined by comparing the parameterCorrectnessScore to the threshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL. |
explanation |
Output only. A free text explanation for the tool invocation result. |
parameterCorrectnessScore |
Output only. The tool invocation parameter correctness score. This indicates the percent of parameters from the expected tool call that were also present in the actual tool call. |
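
To make the `parameterCorrectnessScore` description concrete, the sketch below counts how many expected parameter names also occur in the actual call. It assumes that "present" means the parameter name appears, regardless of value, and it returns a fraction in [0, 1]; whether the service compares values or reports a 0 to 100 percentage is not stated in this reference. The function name is hypothetical.

```python
from typing import Any, Mapping


def parameter_correctness_score(
    expected_args: Mapping[str, Any], actual_args: Mapping[str, Any]
) -> float:
    """Fraction of expected parameter names that also appear in the actual call.

    Assumption: presence is checked by name only; values are not compared.
    """
    if not expected_args:
        return 1.0  # assumption: no expected parameters means nothing is missing
    matched = sum(1 for name in expected_args if name in actual_args)
    return matched / len(expected_args)
```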
EvaluationResult.HallucinationResult
The result of the hallucination check for a single turn.
| JSON representation |
|---|
{ "label": string, "explanation": string, "score": integer } |
| Fields | |
|---|---|
label |
Output only. The label associated with each score. Score 1: Justified; Score 0: Not Justified; Score -1: No Claim To Assess. |
explanation |
Output only. The explanation for the hallucination score. |
score |
Output only. The hallucination score. Can be -1, 0, 1. |
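
Because the hallucination score is three-valued, a summary over several turns should exclude turns with no claim to assess. The sketch below shows one way to do that. It assumes each result is a decoded JSON dict in which `score` is always serialized, including when it is 0; `justified_rate` is a hypothetical helper.

```python
from typing import Iterable, Mapping, Optional

# Score-to-label mapping as documented for the `label` field.
HALLUCINATION_LABELS = {1: "Justified", 0: "Not Justified", -1: "No Claim To Assess"}


def justified_rate(results: Iterable[Mapping[str, int]]) -> Optional[float]:
    """Share of assessed turns whose claims were justified.

    Turns scored -1 (no claim to assess) are excluded. Returns None when no
    turn was assessed. Assumption: `score` is present in every result.
    """
    assessed = [r["score"] for r in results if r.get("score") in (0, 1)]
    if not assessed:
        return None
    return sum(assessed) / len(assessed)
```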
EvaluationResult.ToolCallLatency
The latency of a tool call execution.
| JSON representation |
|---|
{ "tool": string, "displayName": string, "startTime": string, "endTime": string, "executionLatency": string } |
| Fields | |
|---|---|
tool |
Output only. The name of the tool that was executed. Format: |
displayName |
Output only. The display name of the tool. |
startTime |
Output only. The start time of the tool call execution. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
endTime |
Output only. The end time of the tool call execution. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
executionLatency |
Output only. The latency of the tool call execution. A duration in seconds with up to nine fractional digits, ending with ' |
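
Latency fields such as `executionLatency` arrive as strings, so they must be parsed before they can be summed or compared. The sketch below assumes the usual protobuf JSON encoding of a duration, a decimal number of seconds followed by `s` (for example `"3.5s"`); the suffix is an assumption here because the format description above is cut off. Both helper names are hypothetical.

```python
from decimal import Decimal
from typing import Iterable, Mapping


def parse_duration_seconds(value: str) -> Decimal:
    """Parse a duration string such as "3.5s" into seconds.

    Assumption: the string ends with "s", as in the protobuf JSON mapping.
    Decimal keeps all nine possible fractional digits exact.
    """
    if not value.endswith("s"):
        raise ValueError(f"expected a duration ending in 's', got {value!r}")
    return Decimal(value[:-1])


def total_tool_latency(tool_call_latencies: Iterable[Mapping[str, str]]) -> Decimal:
    """Sum executionLatency over a list of ToolCallLatency dicts."""
    return sum(
        (parse_duration_seconds(t["executionLatency"]) for t in tool_call_latencies),
        Decimal(0),
    )
```

`Decimal` is used instead of `float` so that nanosecond-precision values are not rounded when many latencies are added together.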
EvaluationResult.OverallToolInvocationResult
The result of the overall tool invocation check.
| JSON representation |
|---|
{
"outcome": enum ( |
| Fields | |
|---|---|
outcome |
Output only. The outcome of the tool invocation check. This is determined by comparing the toolInvocationScore to the overallToolInvocationCorrectnessThreshold. If the score is equal to or above the threshold, the outcome will be PASS. Otherwise, the outcome will be FAIL. |
toolInvocationScore |
The overall tool invocation score for this turn. This indicates the overall percent of tools from the expected turn that were actually invoked. |
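
As an illustration of the description above, the sketch below computes the share of expected tools that were invoked in a turn, ignoring order, and then applies the inclusive threshold rule. It assumes tools are compared by name, that a repeated expected tool needs a matching repeated invocation, and that the score is a fraction in [0, 1]; none of these details are confirmed by this reference, and both function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def tool_invocation_score(expected_tools: Sequence[str], actual_tools: Sequence[str]) -> float:
    """Fraction of expected tools that were actually invoked, ignoring order."""
    if not expected_tools:
        return 1.0  # assumption: no expected tools means nothing was missed
    remaining = Counter(actual_tools)
    invoked = 0
    for tool in expected_tools:
        if remaining[tool] > 0:
            # Consume one actual invocation per expected occurrence.
            remaining[tool] -= 1
            invoked += 1
    return invoked / len(expected_tools)


def overall_tool_invocation_outcome(score: float, threshold: float) -> str:
    """Return "PASS" if the score is at or above the threshold, else "FAIL"."""
    return "PASS" if score >= threshold else "FAIL"
```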
EvaluationResult.SpanLatency
The latency of a span execution.
| JSON representation |
|---|
{ "type": enum ( |
| Fields | |
|---|---|
type |
Output only. The type of span. |
displayName |
Output only. The display name of the span. Applicable to tool and guardrail spans. |
startTime |
Output only. The start time of the span. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
endTime |
Output only. The end time of the span. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: |
executionLatency |
Output only. The latency of the span. A duration in seconds with up to nine fractional digits, ending with ' |
Union field identifier. The identifier of the specific item based on its type. identifier can be only one of the following: |
|
resource |
Output only. The resource name of the guardrail or tool span. |
toolset |
Output only. The toolset tool identifier. |
model |
Output only. The name of the LLM span. |
callback |
Output only. The name of the user callback span. |
EvaluationResult.SpanLatency.Type
The type of span. Additional values may be added in the future.
| Enums | |
|---|---|
TYPE_UNSPECIFIED |
Default value. This value is unused. |
TOOL |
Tool call span. |
USER_CALLBACK |
User callback span. |
GUARDRAIL |
Guardrail span. |
LLM |
LLM span. |
EvaluationResult.EvaluationExpectationResult
The result of a single evaluation expectation.
| JSON representation |
|---|
{
"evaluationExpectation": string,
"prompt": string,
"outcome": enum ( |
| Fields | |
|---|---|
evaluationExpectation |
Output only. The evaluation expectation. Format: |
prompt |
Output only. The prompt that was used for the evaluation. |
outcome |
Output only. The outcome of the evaluation expectation. |
explanation |
Output only. The explanation for the result. |
EvaluationResult.ScenarioResult
The outcome of a scenario evaluation.
| JSON representation |
|---|
{ "conversation": string, "task": string, "userFacts": [ { object ( |
| Fields | |
|---|---|
conversation |
Output only. The conversation that was generated in the scenario. |
task |
Output only. The task that was used when running the scenario for this result. |
userFacts[] |
Output only. The user facts that were used by the scenario for this result. |
expectationOutcomes[] |
Output only. The outcome of each expectation. |
rubricOutcomes[] |
Output only. The outcome of the rubric. |
hallucinationResult[] |
Output only. The result of the hallucination check. There will be one hallucination result for each turn in the conversation. |
taskCompletionResult |
Output only. The result of the task completion check. |
toolCallLatencies[] |
Output only. The latency of each tool call execution in the conversation. |
userGoalSatisfactionResult |
Output only. The result of the user goal satisfaction check. |
spanLatencies[] |
Output only. The latency of spans in the conversation. |
evaluationExpectationResults[] |
Output only. The results of the evaluation expectations. |
allExpectationsSatisfied |
Output only. Whether all expectations were satisfied for this conversation. |
taskCompleted |
Output only. Whether the task was completed for this conversation. This is a composite of all expectations satisfied, no hallucinations, and user goal satisfaction. |
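
The composite described for `taskCompleted` can be approximated on the client from the other ScenarioResult fields. The sketch below is an interpretation, not the service's implementation: it assumes a hallucination is a per-turn result with score 0 (Not Justified), that user goal satisfaction means a score of 1, and that scores are always serialized in the JSON. The function name is hypothetical.

```python
from typing import Any, Mapping


def derive_task_completed(scenario_result: Mapping[str, Any]) -> bool:
    """Recompute the taskCompleted composite from a decoded ScenarioResult."""
    expectations_ok = bool(scenario_result.get("allExpectationsSatisfied", False))
    # Assumption: score 0 (Not Justified) marks a hallucinated turn;
    # score -1 (No Claim To Assess) does not count against the result.
    no_hallucinations = all(
        turn.get("score") != 0
        for turn in scenario_result.get("hallucinationResult", [])
    )
    goal = scenario_result.get("userGoalSatisfactionResult", {})
    goal_satisfied = goal.get("score") == 1
    return expectations_ok and no_hallucinations and goal_satisfied
```

Comparing the derived value with the `taskCompleted` field returned by the service is a quick way to check whether these assumptions hold for a given result.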
EvaluationResult.ScenarioExpectationOutcome
The outcome of a scenario expectation.
| JSON representation |
|---|
{ "expectation": { object ( |
| Fields | |
|---|---|
expectation |
Output only. The expectation that was evaluated. |
outcome |
Output only. The outcome of the ScenarioExpectation. |
Union field result. The result of the expectation. result can be only one of the following: |
|
observedToolCall |
Output only. The observed tool call. |
observedAgentResponse |
Output only. The observed agent response. |
EvaluationResult.ScenarioExpectationOutcome.ObservedToolCall
The observed tool call and response.
| JSON representation |
|---|
{ "toolCall": { object ( |
| Fields | |
|---|---|
toolCall |
Output only. The observed tool call. |
toolResponse |
Output only. The observed tool response. |
EvaluationResult.ScenarioRubricOutcome
The outcome of the evaluation against the rubric.
| JSON representation |
|---|
{ "rubric": string, "scoreExplanation": string, "score": number } |
| Fields | |
|---|---|
rubric |
Output only. The rubric that was used to evaluate the conversation. |
scoreExplanation |
Output only. The rater's response to the rubric. |
score |
Output only. The score of the conversation against the rubric. |
EvaluationResult.TaskCompletionResult
The result of the task completion check for the conversation.
| JSON representation |
|---|
{ "label": string, "explanation": string, "score": integer } |
| Fields | |
|---|---|
label |
Output only. The label associated with each score. Score 1: Task Completed; Score 0: Task Not Completed; Score -1: User Goal Undefined. |
explanation |
Output only. The explanation for the task completion score. |
score |
Output only. The task completion score. Can be -1, 0, 1. |
EvaluationResult.UserGoalSatisfactionResult
The result of a user goal satisfaction check for a conversation.
| JSON representation |
|---|
{ "label": string, "explanation": string, "score": integer } |
| Fields | |
|---|---|
label |
Output only. The label associated with each score. Score 1: User Task Satisfied; Score 0: User Task Not Satisfied; Score -1: User Task Unspecified. |
explanation |
Output only. The explanation for the user task satisfaction score. |
score |
Output only. The user task satisfaction score. Can be -1, 0, 1. |
EvaluationResult.ExecutionState
The state of the evaluation result execution.
| Enums | |
|---|---|
EXECUTION_STATE_UNSPECIFIED |
Evaluation result execution state is not specified. |
RUNNING |
Evaluation result execution is running. |
COMPLETED |
Evaluation result execution has completed. |
ERROR |
Evaluation result execution failed due to an internal error. |
Methods |
|
|---|---|
|
Deletes an evaluation result. |
|
Gets details of the specified evaluation result. |
|
Lists all evaluation results for a given evaluation. |