- Categories: System functions (Control)
EXECUTE_AI_EVALUATION
Start or get the status of a Cortex Agent evaluation run.
For more information on Cortex Agent evaluations, see Cortex Agent evaluations.
Syntax
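A sketch of the call shape, inferred from the argument list in this section. Invoking the function with CALL (rather than SELECT) and passing the three arguments positionally in the listed order are assumptions, not confirmed by the text here:

```sql
-- Assumed call shape: three positional arguments, in the order they are documented.
CALL EXECUTE_AI_EVALUATION(
  <evaluation_job>,     -- 'START' or 'STATUS'
  <run_parameters>,     -- OBJECT containing the run_name key
  <config_file_path>    -- stage path to the agent evaluation configuration YAML
);
```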
Arguments
evaluation_job
One of the following values:
- 'START': Starts an evaluation.
- 'STATUS': Retrieves the status of an evaluation.
run_parameters
A SQL OBJECT value that contains the following key:
- run_name: The name of the run to perform the evaluation_job operation on.
config_file_path
A stage file path pointing to an agent evaluation configuration. This path can't be a signed URL. For the full configuration YAML specification, see Agent Evaluation YAML specification.
Returns
The return value of this function depends on the value of evaluation_job:
- 'START' returns a single string message indicating whether the SQL execution succeeded or failed.
- 'STATUS' returns a table containing information on the current state of the evaluation run.
The table returned by the 'STATUS' evaluation job has the following columns:
| Name | Type | Description |
|---|---|---|
| RUN_NAME | VARCHAR | The name of the evaluation run. |
| AGENT_NAME | VARCHAR | The (unqualified) name of the agent being evaluated. |
| AGENT_TYPE | VARCHAR | The type of agent being evaluated. |
| STATUS | VARCHAR | The current status of the evaluation run. |
| STATUS_DETAILS | ARRAY | An array of error messages that occurred during this run. |
Values in the STATUS column are one of:
Run status
| Status | Description |
|---|---|
| CREATED | The run has been created but not started. |
| INVOCATION_IN_PROGRESS | The run invocation is in the process of generating the output and the traces. |
| INVOCATION_COMPLETED | The run invocation completed with all outputs and traces created. |
| INVOCATION_PARTIALLY_COMPLETED | The run invocation is partially completed due to failures in application invocation and trace generation. |
| COMPUTATION_IN_PROGRESS | The metric computation is in progress. |
| COMPLETED | The metric computation is completed with detailed outputs and traces. |
| PARTIALLY_COMPLETED | The run is partially completed due to failures during the metric computation. |
| CANCELLED | The run has been cancelled. |
Access control requirements
For the full access control requirements to conduct a Cortex Agent evaluation, see Cortex Agent evaluations – Access control requirements.
Examples
The following example starts a run called run-1 using the agent evaluation configuration from @eval_db.eval_schema.metrics/agent_evaluation_config.yaml:
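A minimal sketch of such a call, using the run name and stage path from the description above. The CALL form, the positional argument order, and the use of OBJECT_CONSTRUCT to build the run_parameters object are assumptions:

```sql
-- Start the evaluation run named run-1.
-- Assumption: the function is invoked with CALL and takes its arguments positionally.
CALL EXECUTE_AI_EVALUATION(
  'START',
  OBJECT_CONSTRUCT('run_name', 'run-1'),   -- run_parameters: OBJECT with the run_name key
  '@eval_db.eval_schema.metrics/agent_evaluation_config.yaml'
);
```

As described under Returns, a 'START' call yields a single string message rather than a table.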
The following example queries the status of the evaluation run run-1 using the agent evaluation configuration from @eval_db.eval_schema.metrics/agent_evaluation_config.yaml:
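A minimal sketch of the status query, under the same assumptions (CALL form, positional arguments, OBJECT_CONSTRUCT for run_parameters):

```sql
-- Retrieve the current status of the evaluation run named run-1.
-- Assumption: the function is invoked with CALL and takes its arguments positionally.
CALL EXECUTE_AI_EVALUATION(
  'STATUS',
  OBJECT_CONSTRUCT('run_name', 'run-1'),   -- run_parameters: OBJECT with the run_name key
  '@eval_db.eval_schema.metrics/agent_evaluation_config.yaml'
);
```

The result is the table described under Returns. Its STATUS column holds one of the run status values, such as INVOCATION_IN_PROGRESS or COMPLETED, and STATUS_DETAILS lists any error messages from the run.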