Cortex Agent evaluation features in private preview
Important
The core functionality of Cortex Agent evaluations is now generally available (GA). The following features remain in private preview:
Tool selection accuracy metric
Tool execution accuracy metric
For information on configuring, starting, and inspecting an agent evaluation run, see Cortex Agent evaluations.
Snowflake offers the following private preview metrics for evaluating your agent:
Tool selection accuracy – Whether the agent selects the correct tools, at the correct stages, in response to your prepared query.
Tool execution accuracy – How closely the agent's tool invocations match the expected invocations, and whether each tool's output matches the expected output.
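To make the difference between the two metrics concrete, the following Python sketch compares an agent's recorded tool calls against expected invocations. It is a simplified illustration under stated assumptions, not Snowflake's scoring method: the `ToolCall` class, the field names `tool_name`, `tool_sequence`, `tool_input`, and `tool_output`, and the pass/fail and fraction-based scoring are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolCall:
    """One tool call recorded from an agent run (hypothetical shape)."""
    name: str
    args: Any = None
    output: Any = None


def selection_matches(expected: list[dict], actual: list[ToolCall]) -> bool:
    """Illustrative tool-selection check (not Snowflake's metric).

    Passes when the agent called exactly the expected tools (as a multiset) and
    every expected entry with a 1-based sequence number appears at that position.
    """
    names = [call.name for call in actual]
    if not expected:
        # An empty expectation means the agent should call no tools at all.
        return not names
    if sorted(e["tool_name"] for e in expected) != sorted(names):
        return False
    for e in expected:
        seq = e.get("tool_sequence")  # hypothetical key; optional
        if seq is not None and not (1 <= seq <= len(names) and names[seq - 1] == e["tool_name"]):
            return False
    return True


def execution_score(expected: list[dict], actual: list[ToolCall]) -> float:
    """Illustrative tool-execution score (not Snowflake's metric).

    Returns the fraction of expected invocations matched by some actual call with
    the same tool name, and with equal input and output whenever those are given.
    In this simplified version one actual call may satisfy several expected entries.
    """
    if not expected:
        return 1.0 if not actual else 0.0
    matched = 0
    for e in expected:
        if any(
            call.name == e["tool_name"]
            and ("tool_input" not in e or call.args == e["tool_input"])
            and ("tool_output" not in e or call.output == e["tool_output"])
            for call in actual
        ):
            matched += 1
    return matched / len(expected)
```

The split mirrors the two definitions: selection only looks at which tools ran and in what order, while execution also compares inputs and outputs.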
Prepare an evaluation dataset for tooling metrics
Tooling evaluations use an additional key in the output column of your dataset: ground_truth_invocations. The value of this key is an array of JSON objects, each describing one expected tool invocation. To verify that the agent calls no tools, use the empty array []. The following keys are available for each object in ground_truth_invocations:
| Parameter | Description | Used by |
|---|---|---|
| | The name of the agent tool expected to run as part of an evaluation. | Tool selection, Tool execution |
| | The numbered order in which this tool, with its associated arguments and outputs, is expected to be called. | Tool selection (optional), Tool execution (optional) |
| | A VARIANT of key-value pairs that map to parameters and values for this tool invocation. For Cortex Analyst and Cortex Search tools, this VARIANT is instead the query string. | Tool execution (optional) |
| | A VARIANT describing the expected output from the tool. For Cortex Analyst tools invoked by your agent, you can inspect only the generated SQL as part of the output; this value is contained in the … key. For Cortex Search tools invoked by your agent, you can inspect only the sources searched; this value is an ARRAY containing the name of each source, contained in the … key. | Tool execution (optional) |
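Because the table describes a small data structure, a pre-flight check can catch malformed entries before you load a dataset. The Python sketch below is a hypothetical helper, not a Snowflake tool. The parameter names are not shown in the table on this page, so the four key strings in `ALLOWED_KEYS` are assumptions; replace them with the names from the Cortex Agent evaluations documentation. It treats the tool name as required (the table does not mark it optional), the sequence number as an optional positive integer, and accepts the empty array [] as "no tools expected".

```python
from typing import Any

# Hypothetical key names: the parameter names are not shown on this page.
ALLOWED_KEYS = {"tool_name", "tool_sequence", "tool_input", "tool_output"}


def check_invocations(invocations: Any) -> list[str]:
    """Return a list of problems found in a ground_truth_invocations value.

    An empty list [] is valid and means the agent should call no tools.
    """
    if not isinstance(invocations, list):
        return ["ground_truth_invocations must be an array"]
    problems = []
    for i, entry in enumerate(invocations):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: must be a JSON object")
            continue
        unknown = set(entry) - ALLOWED_KEYS
        if unknown:
            problems.append(f"entry {i}: unknown keys {sorted(unknown)}")
        # The tool name is the only key not marked optional in the table.
        if not isinstance(entry.get("tool_name"), str):
            problems.append(f"entry {i}: tool name is required and must be a string")
        seq = entry.get("tool_sequence")
        # bool is a subclass of int in Python, so reject it explicitly.
        if seq is not None and (not isinstance(seq, int) or isinstance(seq, bool) or seq < 1):
            problems.append(f"entry {i}: sequence must be a positive integer")
    return problems
```

An empty return value means no problems were found; each string otherwise names the offending entry by its array index.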
For example, suppose you expect the agent to call get_weather with the inputs city = "San Francisco" and date = 08/02/2019, and to return the VARIANT {"temp": "14", "units": "C"}. That expectation becomes one entry in the ground_truth_invocations array.
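The following Python sketch builds that single expected invocation and serializes it to JSON. The key names `tool_name`, `tool_input`, and `tool_output` are hypothetical placeholders, since the parameter names are not shown on this page; the optional sequence number is omitted because the example does not specify one.

```python
import json

# One expected tool invocation for get_weather.
# Key names are hypothetical placeholders; substitute the documented parameter names.
weather_invocation = {
    "tool_name": "get_weather",
    "tool_input": {"city": "San Francisco", "date": "08/02/2019"},
    "tool_output": {"temp": "14", "units": "C"},
}

# Pretty-print the JSON object as it would appear in the array.
print(json.dumps(weather_invocation, indent=2))
```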
In a full ground truth entry, this expected tool invocation appears alongside the expected answer from your agent, such as "The temperature was 14 degrees Celsius in San Francisco on August 2nd, 2019."
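A minimal Python sketch of a full ground truth entry follows. Only ground_truth_invocations is named on this page; the key `ground_truth_output` for the expected answer, and the per-invocation key names, are hypothetical placeholders.

```python
import json

# Full ground truth entry: expected tool calls plus the expected final answer.
# All key names except ground_truth_invocations are hypothetical placeholders.
ground_truth = {
    "ground_truth_invocations": [
        {
            "tool_name": "get_weather",
            "tool_input": {"city": "San Francisco", "date": "08/02/2019"},
            "tool_output": {"temp": "14", "units": "C"},
        }
    ],
    "ground_truth_output": (
        "The temperature was 14 degrees Celsius in San Francisco "
        "on August 2nd, 2019."
    ),
}

# Compact JSON text, ready to be passed to PARSE_JSON when loading into a table.
ground_truth_json = json.dumps(ground_truth)
print(ground_truth_json)
```

For a query where the agent should answer without calling any tool, the same shape applies with "ground_truth_invocations": [].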
To bring your JSON evaluation dataset into a Snowflake table, use the PARSE_JSON SQL function. For example, you can create a new table agent_evaluation_data for the evaluation dataset, then insert a row for the input query "What was the temperature in San Francisco on August 2nd 2019?" together with the ground truth JSON described above.
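The following Python sketch generates the SQL text for those two steps. It assumes hypothetical column names input_query and ground_truth (only the ground_truth column is named on this page) and uses the INSERT ... SELECT form, because Snowflake rejects PARSE_JSON inside an INSERT ... VALUES clause. Backslashes act as escape characters in Snowflake single-quoted string literals, so the helper doubles them before doubling single quotes. Executing the statements (for example, through a Snowflake connector) is left out of the sketch.

```python
import json

# Hypothetical schema: only the ground_truth column is named on this page.
CREATE_TABLE_SQL = """CREATE TABLE agent_evaluation_data (
    input_query VARCHAR,
    ground_truth VARIANT
);"""


def sql_string_literal(text: str) -> str:
    """Quote text as a Snowflake single-quoted string literal.

    Backslashes are escape characters in Snowflake string literals, so they are
    doubled first; single quotes are then doubled.
    """
    return "'" + text.replace("\\", "\\\\").replace("'", "''") + "'"


def insert_row_sql(input_query: str, ground_truth: dict) -> str:
    """Build an INSERT ... SELECT statement that stores ground_truth as a VARIANT."""
    payload = json.dumps(ground_truth)
    return (
        "INSERT INTO agent_evaluation_data (input_query, ground_truth)\n"
        f"SELECT {sql_string_literal(input_query)}, "
        f"PARSE_JSON({sql_string_literal(payload)});"
    )


if __name__ == "__main__":
    entry = {
        "ground_truth_invocations": [
            {
                "tool_name": "get_weather",  # hypothetical key names
                "tool_input": {"city": "San Francisco", "date": "08/02/2019"},
                "tool_output": {"temp": "14", "units": "C"},
            }
        ],
        "ground_truth_output": "The temperature was 14 degrees Celsius "
        "in San Francisco on August 2nd, 2019.",
    }
    print(CREATE_TABLE_SQL)
    print(insert_row_sql(
        "What was the temperature in San Francisco on August 2nd 2019?", entry))
```

Generating one INSERT per dataset row keeps the sketch simple; for larger datasets, staging a JSON file and loading it in bulk is usually more practical.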
Note
Data you provide in the ground_truth column that isn’t used by a selected evaluation is ignored.