Categories: System functions (Control)

EXECUTE_AI_EVALUATION

Start, retrieve the status of, or delete a Cortex Agent evaluation run.

For more information on Cortex Agent evaluations, see Cortex Agent evaluations.

See also: SYSTEM$CREATE_EVALUATION_DATASET, GET_AI_RECORD_TRACE (SNOWFLAKE.LOCAL), GET_AI_EVALUATION_DATA (SNOWFLAKE.LOCAL), GET_AI_OBSERVABILITY_LOGS (SNOWFLAKE.LOCAL)

Syntax

EXECUTE_AI_EVALUATION( <evaluation_job> , <run_parameters> , <config_file_path> )

Arguments

evaluation_job

One of the following values:

'START': Starts an evaluation run.

'STATUS': Retrieves the status of an evaluation run.

'DELETE': Deletes an evaluation run.

run_parameters

A SQL OBJECT value that contains the following key:

run_name: The name of the run to perform the evaluation_job operation on.

config_file_path

A stage file path pointing to an agent evaluation configuration. This path can’t be a signed URL. For the full configuration YAML specification, see Agent Evaluation YAML specification.
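As a minimal sketch of how the three arguments fit together, the statements below first check that the configuration file is on the stage and then start a run. The stage, file, and run names are the ones used in the Examples section. Passing run_parameters as an OBJECT constant instead of OBJECT_CONSTRUCT is an assumption: it relies on the function accepting any expression that evaluates to an OBJECT.

```sql
-- Optional check: confirm the configuration file is present on the stage.
LIST @eval_db.eval_schema.metrics PATTERN = '.*agent_evaluation_config[.]yaml';

-- Start the run. The OBJECT constant {'run_name': 'run-1'} is assumed to be
-- interchangeable with OBJECT_CONSTRUCT('run_name', 'run-1').
CALL EXECUTE_AI_EVALUATION(
  'START',
  {'run_name': 'run-1'},
  '@eval_db.eval_schema.metrics/agent_evaluation_config.yaml'
);
```

The path argument is a stage path in the form @database.schema.stage/file, not a signed URL.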

Returns

The return value of this function depends on the value of evaluation_job:

'START' returns a single string message that indicates whether the SQL execution succeeded or failed.

'STATUS' returns a table containing information on the current state of the evaluation run.

'DELETE' returns a single string message that indicates whether the SQL execution succeeded or failed.

The table returned by the 'STATUS' evaluation job has the following columns:

Name	Type	Description
RUN_NAME	VARCHAR	The name of the evaluation run.
AGENT_NAME	VARCHAR	The (unqualified) name of the agent being evaluated.
AGENT_TYPE	VARCHAR	The type of agent being evaluated.
STATUS	VARCHAR	The current status of the evaluation run.
STATUS_DETAILS	ARRAY	An array of messages for the errors that occurred during this run.

The STATUS column contains one of the following values:

Run status

Status	Description
CREATED	The run has been created but not started.
INVOCATION_IN_PROGRESS	The run invocation is in the process of generating the output and the traces.
INVOCATION_COMPLETED	The run invocation completed with all outputs and traces created.
INVOCATION_PARTIALLY_COMPLETED	The run invocation is partially completed due to failures in application invocation and trace generation.
COMPUTATION_IN_PROGRESS	The metric computation is in progress.
COMPLETED	The metric computation is completed with detailed outputs and traces.
PARTIALLY_COMPLETED	The run is partially completed due to failures during the metric computation.
CANCELLED	The run has been cancelled.
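The status table can be post-processed in SQL. The following is a sketch, not a documented pattern: it assumes that the result of the CALL can be read back with RESULT_SCAN(LAST_QUERY_ID()) when the CALL is the previous statement in the session, and that the column names match the table above. The phase labels are hypothetical groupings derived from the status descriptions, not values returned by the function.

```sql
-- Retrieve the status table for the run (same call as in the Examples section).
CALL EXECUTE_AI_EVALUATION(
  'STATUS',
  OBJECT_CONSTRUCT('run_name', 'run-1'),
  '@eval_db.eval_schema.metrics/agent_evaluation_config.yaml'
);

-- Assumption: the CALL above was the previous statement in this session,
-- so its tabular result is available through RESULT_SCAN.
SELECT
  s.run_name,
  s.status,
  -- Hypothetical grouping of the documented statuses into phases.
  CASE
    WHEN s.status = 'CREATED' THEN 'not started'
    WHEN s.status LIKE 'INVOCATION%' THEN 'invocation'
    WHEN s.status IN ('COMPUTATION_IN_PROGRESS', 'COMPLETED', 'PARTIALLY_COMPLETED')
      THEN 'metric computation'
    WHEN s.status = 'CANCELLED' THEN 'cancelled'
    ELSE 'unknown'
  END AS phase,
  -- One output row per entry in STATUS_DETAILS; OUTER => TRUE keeps runs
  -- whose array is empty, with a NULL error_message.
  d.value::VARCHAR AS error_message
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) AS s,
  LATERAL FLATTEN(INPUT => s.status_details, OUTER => TRUE) AS d;
```

Flattening STATUS_DETAILS gives one row per error message, which is easier to read for runs that end in INVOCATION_PARTIALLY_COMPLETED or PARTIALLY_COMPLETED.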

Access control requirements

For the full access control requirements to conduct a Cortex Agent evaluation, see Cortex Agent evaluations – Access control requirements.

Examples

The following example starts a run called run-1 using the agent evaluation configuration from @eval_db.eval_schema.metrics/agent_evaluation_config.yaml:

CALL EXECUTE_AI_EVALUATION(
  'START',
  OBJECT_CONSTRUCT('run_name', 'run-1'),
  '@eval_db.eval_schema.metrics/agent_evaluation_config.yaml'
);

The following example queries the status of the evaluation run run-1 using the agent evaluation configuration from @eval_db.eval_schema.metrics/agent_evaluation_config.yaml:

CALL EXECUTE_AI_EVALUATION(
  'STATUS',
  OBJECT_CONSTRUCT('run_name', 'run-1'),
  '@eval_db.eval_schema.metrics/agent_evaluation_config.yaml'
);

The following example deletes the evaluation run run-1 using the agent evaluation configuration from @eval_db.eval_schema.metrics/agent_evaluation_config.yaml:

CALL EXECUTE_AI_EVALUATION(
  'DELETE',
  OBJECT_CONSTRUCT('run_name', 'run-1'),
  '@eval_db.eval_schema.metrics/agent_evaluation_config.yaml'
);