Snowpark Container Services: Working with jobs

Important

The Snowpark Container Services job feature is currently in private preview and is subject to Preview Terms at https://snowflake.com/legal. Contact your Snowflake representative for more information.

Snowpark Container Services enables you to easily deploy, manage, and scale your containerized application as a service or a job. This topic explains working with jobs. A job has a finite lifespan, similar to a stored procedure. When all of the job’s application containers exit, the job is considered complete.

Executing jobs

To help you deploy your application as a job, Snowpark Container Services provides the EXECUTE SERVICE command. The job runs synchronously; it completes after all of its containers exit. You need to provide the following information:

  • A job specification: This specification provides Snowflake with the information needed to run your job. The specification is a YAML file that you upload to your Snowflake stage.

  • A compute pool: Snowflake runs your job in the specified compute pool.

Example

EXECUTE SERVICE
  IN COMPUTE POOL tutorial_compute_pool
  FROM @tutorial_stage
  SPECIFICATION_FILE='my_job_spec.yaml';

The output includes the query ID of the job (a Snowflake-assigned UUID):

+------------------------------------------------------------------------------------+
|                      status                                                        |
+------------------------------------------------------------------------------------+
| Job 01af7ee6-0001-cb52-0020-c5870077223a completed successfully with status: DONE. |
+------------------------------------------------------------------------------------+

You can use this query ID (the job UUID) with SYSTEM$GET_JOB_STATUS to get the job status, and with SYSTEM$GET_JOB_LOGS to get logs from the job containers.

Using the EXECUTE SERVICE command is similar to executing any other SQL statement. You can use the Snowsight web interface or SQL to get a job listing from the query history.

Obtaining the job UUID

To debug your job execution, you can use Snowflake-provided system functions. For example, you can monitor a job using SYSTEM$GET_JOB_STATUS and access the job container log using SYSTEM$GET_JOB_LOGS. Both of these system functions require the job UUID (query ID of the job), which you can obtain as follows:

  • After the job is complete: For short jobs, EXECUTE SERVICE completes quickly and you get the job UUID in the output. You can also call LAST_QUERY_ID immediately after EXECUTE SERVICE to capture the job UUID.

    EXECUTE SERVICE
    IN COMPUTE POOL tutorial_compute_pool
    FROM @tutorial_stage
    SPECIFICATION_FILE='my_job_spec.yaml';
    
    SET job_id = LAST_QUERY_ID();
    

    Note that LAST_QUERY_ID can provide a job UUID only after the job completes, making it suitable primarily for short jobs.
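    You can then pass the captured job UUID to the system functions. For example, a sketch assuming the job has already completed:

    CALL SYSTEM$GET_JOB_STATUS($job_id);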

  • During job execution: For long-running jobs, if you are interested in real-time job status information while the job is still running, you can obtain the job UUID as follows:

    • In the Snowsight web interface: When you call EXECUTE SERVICE, the Snowsight web interface returns the job UUID in the Results window immediately, while the job is still running.

    • With SnowSQL CLI:

      1. Open a new terminal window running a new SnowSQL CLI instance.

      2. Use the QUERY_HISTORY family of table functions to obtain the query ID of the job. In your new session, call QUERY_HISTORY_BY_USER to get the query ID of the running job:

        SET job_id = (SELECT QUERY_ID FROM TABLE(information_schema.query_history_by_user())
          WHERE QUERY_TYPE='EXECUTE_SERVICE'
          ORDER BY start_time DESC
          LIMIT 1);

Canceling a job

You can use the SYSTEM$CANCEL_JOB system function to cancel a running job. If you call SYSTEM$CANCEL_QUERY to cancel the query that triggered a job, both the query and the job are canceled.

When a job is canceled, all the job containers stop running and exit.
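For example, to cancel a running job by its query ID (the UUID below is a placeholder; substitute the UUID of your job):

CALL SYSTEM$CANCEL_JOB('01ab9c76-0000-3c97-0000-0e9900000000');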

Monitoring a job

You can use the QUERY_HISTORY family of table functions to query Snowflake query history. For example, use QUERY_HISTORY_BY_USER to find running jobs. In the query, specify EXECUTE_SERVICE as the query type.

SELECT QUERY_ID FROM TABLE(information_schema.query_history_by_user())
  WHERE QUERY_TYPE='EXECUTE_SERVICE'
    AND EXECUTION_STATUS='RUNNING'
  ORDER BY start_time DESC;

Use SYSTEM$GET_JOB_STATUS to get the detailed runtime status of a job. The status can indicate whether the job is still running, whether it failed to start, or, if it failed, why. Because a job has no name, use the Snowflake-assigned query ID (the job UUID) when calling this system function.

CALL SYSTEM$GET_JOB_STATUS('01ab9c76-0000-3c97-0000-0e9900000000');

The following sample output is for a job with one container.

  • This output shows that the job was successful:

    [
       {
             "status":"DONE",
             "message":"Completed successfully",
             "containerName":"main",
             "instanceId":"0",
             "serviceName":"01af7ee6-0001-cb52-0020-c5870077223a",
             "image":"orgname-acctname.registry.snowflakecomputing.com/tutorial_db/data_schema/tutorial_repository/my_job_image:tutorial",
             "restartCount":0,
             "startTime":""
       }
    ]
    
  • This output fragment shows that the job failed:

    [
       {
          "status":"FAILED",
          "message":"Encountered fatal error while running, check container logs",
          "containerName":"main",
          "instanceId":"0",
          ...
       }
    ]
    

You can also use SYSTEM$GET_JOB_LOGS to access container logs. If your code outputs useful logs to standard output or standard error, the log can help you identify issues.

The instanceId will always be 0. You can run multiple instances of a service, but only one job instance can be running at a time.

SYSTEM$GET_JOB_STATUS takes an optional timeout_secs parameter.

  • If timeout_secs is not specified or is specified with value 0, the function immediately returns the current status.

  • If timeout_secs is specified, Snowflake waits up to the specified time for the job to reach a terminal state (DONE or FAILED) before returning the job status. If the job does not reach a terminal state within that time, Snowflake returns the current state at the end of the interval.

Example

CALL SYSTEM$GET_JOB_STATUS('01ab9c76-0000-3c97-0000-0e9900000000', 10);

Your job can run in multiple containers (as defined in the job specification file). Accordingly, the SYSTEM$GET_JOB_STATUS result is a list of objects, one reporting the status of each container.

Accessing container logs

Snowflake collects whatever your code in a container outputs to standard output or standard error. Make sure your code outputs useful information for debugging a job.

Snowflake provides two ways to access these container logs:

  • Using the SYSTEM$GET_JOB_LOGS system function: This function gives you access to logs from a specific container. After a container exits, you can continue to access its logs through this system function for a short time. System functions are most useful during initial development and testing. For more information, see Using SYSTEM$GET_JOB_LOGS.

  • Using an event table: An event table gives you access to logs from multiple containers across all services. Snowflake persists the logs in the event table for later access. Event tables are best used for the retrospective analysis of services and jobs. For more information, see Using an event table.

Using SYSTEM$GET_JOB_LOGS

You provide the job ID, container name, and optionally the number of most recent log lines to retrieve. For example, the following SYSTEM$GET_JOB_LOGS function, for the given job ID, retrieves the 10 most recent log lines from a container named main:

CALL SYSTEM$GET_JOB_LOGS('01ab9c76-0000-3c97-0000-0e990009102e', 'main', 10);

Sample outputs:

  • Sample container log of a successful job:

    job-tutorial - INFO - Connection succeeded. Current session context: database="TUTORIAL_DB", schema="DATA_SCHEMA", warehouse="TUTORIAL_WAREHOUSE", role="TEST_ROLE"
    job-tutorial - INFO - Executing query [select current_time() as time,'hello'] and writing result to table [results]
    job-tutorial - INFO - Job finished
    
  • Sample container log of a failed job:

    job-tutorial - INFO - Job started
    usage: main.py [-h] --query QUERY --result_table RESULT_TABLE
    main.py: error: the following arguments are required: --query
    

    This indicates that a required argument was not provided.

If you don’t know the container name, you can first call SYSTEM$GET_JOB_STATUS to get information about the job’s containers.

SYSTEM$GET_JOB_LOGS output has the following limitations:

  • It merges standard output and standard error streams, which makes it impossible to distinguish between regular output and error messages.

  • It reports the captured data for a specific job container.

  • It only reports logs for a running container.

  • The function returns up to 100 KB of data.

Using an event table

Snowflake can capture and record standard output and standard error from your containers in the event table configured for your account. For more information, see Logging and Tracing Overview. For example, the following SELECT query retrieves Snowpark Container Services service and job events recorded in the past hour:

SELECT TIMESTAMP, RESOURCE_ATTRIBUTES, RECORD_ATTRIBUTES, VALUE
  FROM <current-event-table-for-your-account>
  WHERE timestamp > dateadd(hour, -1, current_timestamp())
    AND RESOURCE_ATTRIBUTES:"snow.executable.type" = 'SnowparkContainers'
  ORDER BY timestamp DESC
  LIMIT 10;

Snowflake recommends that you include a timestamp in the WHERE clause of event table queries, as shown in this example. This is particularly important because of the potential volume of data generated by various Snowflake components. By applying filters, you can retrieve a smaller subset of data, which improves query performance.

The event table columns offer useful information regarding the logs collected by Snowflake from your container:

  • TIMESTAMP: This column shows when Snowflake collected the log.

  • RESOURCE_ATTRIBUTES: This column specifies which Snowflake job and job container generated the log. It furnishes details such as the job UUID, container name, and compute pool name.

    {
       "snow.containers.compute_pool.id":549816068,
       "snow.containers.compute_pool.name":"TUTORIAL_COMPUTE_POOL",
       "snow.containers.container.name":"main",
       "snow.containers.instance.name":"0",
       "snow.containers.restart.id":"a78230",
       "snow.database.id":549816076,
       "snow.database.name":"TUTORIAL_DB",
       "snow.executable.id":980991975,
       "snow.executable.name":"01af8425-0001-cb01-0020-c58700758ca6",
       "snow.executable.type":"SnowparkContainers",
       "snow.schema.id":549816076,
       "snow.schema.name":"DATA_SCHEMA"
    }
    
  • RECORD_ATTRIBUTES: For a job, this column identifies the source of a log entry (standard output or standard error). For example:

    {
      "log.iostream": "stdout"
    }
    
  • VALUE: In this column, standard output and standard error are broken into lines, and each line generates a record in the event table.

Configuring an event table

For more information, see Logging and Tracing Overview.
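As a sketch, setting up an event table typically involves creating the table and making it the active event table for your account (the database, schema, and table names below are hypothetical):

CREATE EVENT TABLE my_db.my_schema.my_events;

ALTER ACCOUNT SET EVENT_TABLE = my_db.my_schema.my_events;

Changing the account-level EVENT_TABLE parameter requires an appropriately privileged role, such as ACCOUNTADMIN.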

Privileges

The user who created a job can monitor and get runtime status for the job. Jobs do not support granting privileges to other users or roles.