Snowpark Submit reference

With Snowpark Submit, you can use familiar Spark semantics to run non-interactive, batch-oriented Spark workloads on Snowflake.

Note

snowpark-submit supports much of the same functionality as spark-submit. However, some functionality has been omitted because it is not needed when running Spark workloads on Snowflake.

Syntax

snowpark-submit
  --name <application_name>
  --exclude-packages <package_to_exclude> [, <package_to_exclude>, ...]
  --py-files <files_to_place_on_path>
  --conf <spark_config_property=value> [<spark_config_property=value> ...]
  --properties-file <path_to_properties_file>
  --help, -h
  --verbose, -v
  --version
  --account <snowflake_account>
  --user <snowflake_user>
  --authenticator <snowflake_authenticator>
  --token-file-path <snowflake_token_file_path>
  --password <snowflake_password>
  --role <snowflake_role>
  --host <snowflake_host>
  --database <snowflake_database_name>
  --schema <snowflake_schema_name>
  --warehouse <snowflake_warehouse_name>
  --compute-pool <snowflake_compute_pool>
  --comment <comment>
  --snowflake-stage <snowflake_stage>
  --external-access-integrations <snowflake_external_access_integrations> [, ...]
  --snowflake-log-level <snowflake_log_level>
  --snowflake-workload-name <snowflake_workload_name>
  --snowflake-connection-name <snowflake_connection_name>
  --requirements-file <requirements_file_path>
  --wheel-files <wheel_file> [, <wheel_file>, ...]
  --workload-status
  --display-logs
  --wait-for-completion
  <application.jar | application.py> [<application_arguments>]

Arguments

application.jar | application.py

Path to a file containing the application and dependencies.

[application arguments]

Application-specific arguments passed to the application’s main method.

Options

--name NAME

The name of your application.

--exclude-packages [EXCLUDE_PACKAGES ...]

Comma-separated list of groupId:artifactId pairs to exclude while resolving the dependencies provided in --packages, in order to avoid dependency conflicts.

--py-files PY_FILES

Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps.

--conf [PROP=VALUE ...]

Arbitrary Spark configuration properties, given as key=value pairs.

--properties-file FILE

Path to a file from which to load extra properties. If not specified, this will look for conf/spark-defaults.conf.
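For illustration, a spark-defaults.conf-style properties file lists one property and value per line, separated by whitespace. The properties shown below are standard Spark settings; which ones take effect when running on Snowflake depends on your workload:

```properties
spark.app.name               my_app
spark.executor.memory        4g
spark.sql.shuffle.partitions 200
```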

--help, -h

Show help message and exit.

--verbose, -v

Print additional debug output.

--version

Print the current Spark version.

Snowflake specific options

--account SNOWFLAKE_ACCOUNT

Snowflake account to use. Overrides the account in the connections.toml file if specified.

--user SNOWFLAKE_USER

Snowflake user to use. Overrides the user in the connections.toml file if specified.

--authenticator SNOWFLAKE_AUTHENTICATOR

Authenticator for Snowflake login. Overrides the authenticator in the connections.toml file if specified. If not specified, defaults to username and password authentication.

--token-file-path SNOWFLAKE_TOKEN_FILE_PATH

Path to a file containing the OAuth token for Snowflake. Overrides the token file path in the connections.toml file if specified.

--password SNOWFLAKE_PASSWORD

Password for the Snowflake user. Overrides the password in the connections.toml file if specified.

--role SNOWFLAKE_ROLE

Snowflake role to use. Overrides the role in the connections.toml file if specified.

--host SNOWFLAKE_HOST

Host for the Snowflake deployment. Overrides the host in the connections.toml file if specified.

--database SNOWFLAKE_DATABASE_NAME

Snowflake database to be used in the session. Overrides the database in the connections.toml file if specified.

--schema SNOWFLAKE_SCHEMA_NAME

Snowflake schema to use in the session. Overrides the schema in the connections.toml file if specified.

--warehouse SNOWFLAKE_WAREHOUSE_NAME

Snowflake warehouse to use in the session. Overrides the warehouse in the connections.toml file if specified.

--compute-pool SNOWFLAKE_COMPUTE_POOL

Snowflake compute pool for running the provided workload. Overrides the compute pool in the connections.toml file if specified.

--comment COMMENT

A message associated with the workload. Can be used to identify the workload in Snowflake.

--snowflake-stage SNOWFLAKE_STAGE

Snowflake stage where workload files are uploaded.

--external-access-integrations [SNOWFLAKE_EXTERNAL_ACCESS_INTEGRATIONS ...]

Snowflake external access integrations required by the workload.

--snowflake-log-level SNOWFLAKE_LOG_LEVEL

Log level for the Snowflake event table: 'INFO', 'ERROR', or 'NONE'. Default: INFO.

--snowflake-workload-name SNOWFLAKE_WORKLOAD_NAME

Name of the workload to be run in Snowflake.

--snowflake-connection-name SNOWFLAKE_CONNECTION_NAME

Name of the connection in connections.toml file to use as the base configuration. Command-line arguments will override any values from the connections.toml file.
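For illustration, a connections.toml profile for the examples below might look like the following. The profile name and all values are placeholders; the exact set of supported keys depends on your Snowflake connection configuration:

```toml
[my_connection]
account = "myaccount"
user = "myuser"
password = "mypassword"
role = "myrole"
database = "MY_DB"
schema = "MY_SCHEMA"
warehouse = "MY_WH"
compute_pool = "MY_COMPUTE_POOL"
```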

--workload-status

Print the detailed status of the workload.

--display-logs

When specified with --workload-status, print the application logs to the console.

--wait-for-completion

When specified in cluster mode, run the workload in blocking mode and wait for it to complete.

--requirements-file REQUIREMENTS_FILE

Path to a requirements.txt file containing Python package dependencies to install before running the workload. Requires external access integration for PyPI.

--wheel-files WHEEL_FILES

Comma-separated list of .whl files to install before running the Python workload. Used for private dependencies not available on PyPI.
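For example, to install both public and private Python dependencies before the workload runs (the integration name PYPI_ACCESS_INTEGRATION and the file names below are illustrative):

```
snowpark-submit \
  --requirements-file requirements.txt \
  --wheel-files private_lib-1.0-py3-none-any.whl \
  --external-access-integrations PYPI_ACCESS_INTEGRATION \
  --snowflake-workload-name MY_PYTHON_JOB \
  --snowflake-connection-name MY_CONNECTION \
  app.py
```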

Common option examples

Application deployment

Snowflake’s Snowpark Container Services (SPCS) is the primary infrastructure for running your Spark applications. You must create an SPCS compute pool in advance.

Basic Python application

To deploy a basic Python application in cluster mode:

snowpark-submit \
  --snowflake-workload-name MY_PYTHON_JOB \
  --snowflake-connection-name MY_CONNECTION_CONFIG_NAME \
  app.py arg1 arg2

Authentication

Snowpark Submit offers several methods for authenticating with Snowflake; you must use at least one. A connection profile and direct authentication can be used together or separately. When both are present, command-line options override the corresponding fields in the connection profile.
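The override rule can be sketched as follows. This is an illustration of the documented behavior, not snowpark-submit internals; the profile and option values are placeholders:

```python
# Values as loaded from a connections.toml profile.
profile = {
    "account": "myaccount",
    "user": "profile_user",
    "role": "ANALYST",
}

# Values given explicitly on the command line, e.g. --user, --warehouse.
cli_options = {
    "user": "cli_user",
    "warehouse": "MY_WH",
}

# Start from the profile, then let every explicit CLI option win.
effective = {**profile, **cli_options}

print(effective["user"])     # CLI value wins: cli_user
print(effective["account"])  # profile fills in the rest: myaccount
```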

Connection profile

To use a pre-configured Snowflake connection profile:

snowpark-submit \
  --snowflake-connection-name my_connection \
  --snowflake-workload-name MY_JOB \
  app.py

Direct authentication

Username and password

To provide authentication details directly in the command:

snowpark-submit \
  --host myhost \
  --account myaccount \
  --user myuser \
  --password mypassword \
  --role myrole \
  --snowflake-workload-name MY_JOB \
  app.py

OAuth

To authenticate by using an OAuth token:

snowpark-submit \
  --host myhost \
  --account myaccount \
  --authenticator oauth \
  --token-file-path /path/to/token.txt \
  --snowflake-workload-name MY_JOB \
  --compute-pool MY_COMPUTE_POOL \
  app.py

Snowflake resources

To specify the Snowflake database, schema, warehouse, and compute pool for your job:

snowpark-submit \
  --database MY_DB \
  --schema MY_SCHEMA \
  --warehouse MY_WH \
  --snowflake-workload-name MY_JOB \
  --snowflake-connection-name MY_CONNECTION \
  app.py

Snowflake stages

You can use Snowpark Submit to store and access files directly on a Snowflake stage.

To submit a job using a file on a Snowflake stage:

snowpark-submit \
  --snowflake-stage @my_stage \
  --snowflake-workload-name MY_JOB \
  --snowflake-connection-name MY_CONNECTION \
  @my_stage/app.py

Dependencies management

You can manage your application’s dependencies.

Python dependencies

To specify additional Python files or archives that are needed by your application:

snowpark-submit \
  --py-files dependencies.zip,module.py \
  --snowflake-workload-name MY_PYTHON_JOB \
  --snowflake-connection-name MY_CONNECTION \
  app.py

Monitoring and control

You can monitor and control your Snowpark Submit jobs effectively.

Waiting for job completion

By default, Snowpark Submit starts the job and returns immediately. To run in blocking mode and wait for the job to finish:

snowpark-submit \
  --snowflake-connection-name my_connection \
  --snowflake-workload-name MY_JOB \
  --wait-for-completion \
  app.py

The --wait-for-completion flag causes the command to block until the job completes (either successfully or with failure), showing periodic status updates. This is useful for workflows where you need to ensure that a job completes before proceeding with other tasks, such as when you use Apache Airflow.
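In an orchestration script, the blocking behavior pairs naturally with a small wrapper that fails the surrounding pipeline step when the job fails. The sketch below is hypothetical and assumes snowpark-submit returns a nonzero exit code on failure; verify that behavior for your version:

```python
import subprocess

def build_cmd(connection: str, workload: str, app: str) -> list[str]:
    # Assemble the blocking snowpark-submit invocation.
    return [
        "snowpark-submit",
        "--snowflake-connection-name", connection,
        "--snowflake-workload-name", workload,
        "--wait-for-completion",   # block until SUCCEEDED or FAILED
        app,
    ]

def run_job(connection: str, workload: str, app: str) -> None:
    # check=True raises CalledProcessError on a nonzero exit code,
    # stopping the surrounding pipeline step (e.g. an Airflow task).
    subprocess.run(build_cmd(connection, workload, app), check=True)
```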

Checking workload status

To check the status of a running or completed workload:

snowpark-submit --snowflake-connection-name my_connection --snowflake-workload-name MY_JOB --workload-status

This command returns the following information about the workload:

  • Current state (DEPLOYING, RUNNING, SUCCEEDED, or FAILED)

  • Start time and duration

  • Service details

Viewing application logs

To view detailed logs along with the workload status:

snowpark-submit --snowflake-connection-name my_connection --snowflake-workload-name MY_JOB --workload-status --display-logs

The --display-logs flag fetches and prints the application’s output logs to the console. Using these logs, you can perform the following tasks:

  • Debug application errors

  • Monitor execution progress

  • View application output

Note

There is a small latency, from a few seconds to a minute, before logs are ready to fetch. When an event table is not used to store log data, logs are retained for only a short period of time, such as five minutes or less.

Advanced configuration

Fine-tune your Snowpark Submit jobs with advanced configurations.

External access integration

Connect to external services from your Spark application:

snowpark-submit \
  --external-access-integrations "MY_NETWORK_RULE,MY_STORAGE_INTEGRATION" \
  --snowflake-workload-name MY_JOB \
  --snowflake-connection-name my_connection \
  app.py

Logging level configuration

Control the logging level for your application to the Snowflake event table:

snowpark-submit \
  --snowflake-log-level INFO \
  --snowflake-workload-name MY_JOB \
  --snowflake-connection-name MY_CONNECTION \
  app.py

Valid values for --snowflake-log-level: INFO, ERROR, NONE.

Adding job context

Add a descriptive comment for easier workload identification in Snowflake:

snowpark-submit \
  --comment "Daily data processing job" \
  --snowflake-workload-name MY_JOB \
  --snowflake-connection-name my_connection \
  app.py