Snowpark Submit reference¶
With Snowpark Submit, you can use familiar Spark semantics to run non-interactive, batch-oriented Spark workloads on Snowflake.
Note
snowpark-submit supports much of the same functionality as spark-submit. However, some functionality has been omitted because it is not needed when running Spark workloads on Snowflake.
Syntax¶
snowpark-submit
--name <application_name>
--exclude-packages <package_to_exclude> [, <package_to_exclude>, ...]
--py-files <files_to_place_on_path>
--requirements-file <path_to_requirements_file>
--wheel-files <wheel_file> [, <wheel_file>, ...]
--conf <spark_config_property=value> [<spark_config_property=value> ...]
--properties-file <path_to_properties_file>
--help, -h
--verbose, -v
--version
--account <snowflake_account>
--user <snowflake_user>
--authenticator <snowflake_authenticator>
--token-file-path <snowflake_token_file_path>
--password <snowflake_password>
--role <snowflake_role>
--host <snowflake_host>
--database <snowflake_database_name>
--schema <snowflake_schema_name>
--warehouse <snowflake_warehouse_name>
--compute-pool <snowflake_compute_pool>
--comment <comment>
--snowflake-stage <snowflake_stage>
--external-access-integrations <snowflake_external_access_integrations> [, ...]
--snowflake-log-level <snowflake_log_level>
--snowflake-workload-name <snowflake_workload_name>
--snowflake-connection-name <snowflake_connection_name>
--snowflake-grpc-max-message-size <message_size>
--snowflake-grpc-max-metadata-size <metadata_size>
--workload-status
--display-logs
--wait-for-completion
<application.jar | application.py> [<application_arguments>]
Arguments¶
application.jar | application.py
Path to a file containing the application and dependencies.

[application arguments]
Application-specific arguments passed to the application’s main method.
Options¶
--conf PROP=VALUE [PROP=VALUE ...]
Arbitrary Spark configuration property.

--exclude-packages [EXCLUDE_PACKAGES ...]
Comma-separated list of groupId:artifactId pairs to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts.

--help, -h
Show help message and exit.

--name NAME
Name of your application.

--properties-file FILE
Path to a file from which to load extra properties. If not specified, snowpark-submit looks for conf/spark-defaults.conf.

--py-files PY_FILES
Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps.

--verbose, -v
Print additional debug output.

--version
Print the current Spark version.
Snowflake specific options¶
--account SNOWFLAKE_ACCOUNT
Snowflake account to use. Overrides the account in the connections.toml file if specified.

--authenticator SNOWFLAKE_AUTHENTICATOR
Authenticator for Snowflake login. Overrides the authenticator in the connections.toml file if specified. If not specified, defaults to the username/password authenticator.

--comment COMMENT
A message associated with the workload. Can be used to identify the workload in Snowflake.

--compute-pool SNOWFLAKE_COMPUTE_POOL
Snowflake compute pool for running the provided workload. Overrides the compute pool in the connections.toml file if specified.

--database SNOWFLAKE_DATABASE_NAME
Snowflake database to use in the session. Overrides the database in the connections.toml file if specified.

--display-logs
Whether to print application logs to the console when --workload-status is specified.

--external-access-integrations [SNOWFLAKE_EXTERNAL_ACCESS_INTEGRATIONS ...]
Snowflake external access integrations required by the workload.

--host SNOWFLAKE_HOST
Host for the Snowflake deployment. Overrides the host in the connections.toml file if specified.

--password SNOWFLAKE_PASSWORD
Password for the Snowflake user. Overrides the password in the connections.toml file if specified.

--requirements-file REQUIREMENTS_FILE
Path to a requirements.txt file containing Python package dependencies to install before running the workload. Requires an external access integration for PyPI. This parameter does not function if you also specify the --snowflake-stage parameter.

--role SNOWFLAKE_ROLE
Snowflake role to use. Overrides the role in the connections.toml file if specified.

--schema SNOWFLAKE_SCHEMA_NAME
Snowflake schema to use in the session. Overrides the schema in the connections.toml file if specified.

--snowflake-connection-name SNOWFLAKE_CONNECTION_NAME
Name of the connection in the connections.toml file to use as the base configuration. Command-line arguments override any values from the connections.toml file.

--snowflake-grpc-max-message-size MESSAGE_SIZE
Maximum message size, in bytes, for gRPC communication in Snowpark Submit.

--snowflake-grpc-max-metadata-size METADATA_SIZE
Maximum metadata size, in bytes, for gRPC communication in Snowpark Submit.

--snowflake-log-level SNOWFLAKE_LOG_LEVEL
Log level for the Snowflake event table: 'INFO', 'ERROR', or 'NONE'. (Default: INFO.)

--snowflake-stage SNOWFLAKE_STAGE
Snowflake stage where workload files are uploaded.

--snowflake-workload-name SNOWFLAKE_WORKLOAD_NAME
Name of the workload to run in Snowflake.

--token-file-path SNOWFLAKE_TOKEN_FILE_PATH
Path to a file containing the OAuth token for Snowflake. Overrides the token file path in the connections.toml file if specified.

--user SNOWFLAKE_USER
Snowflake user to use. Overrides the user in the connections.toml file if specified.

--wait-for-completion
In cluster mode, when specified, run the workload in blocking mode and wait for completion.

--warehouse SNOWFLAKE_WAREHOUSE_NAME
Snowflake warehouse to use in the session. Overrides the warehouse in the connections.toml file if specified.

--wheel-files WHEEL_FILES
Comma-separated list of .whl files to install before running the Python workload. Used for private dependencies that are not available on PyPI.

--workload-status
Print the detailed status of the workload.
Common option examples¶
Application deployment¶
Snowflake’s Snowpark Container Services (SPCS) is the primary infrastructure for running your Spark applications. You must create an SPCS compute pool in advance.
Basic Python application¶
To deploy a basic Python application in cluster mode:
snowpark-submit \
--snowflake-workload-name MY_PYTHON_JOB \
--snowflake-connection-name MY_CONNECTION_CONFIG_NAME \
app.py arg1 arg2
Authentication¶
Snowpark Submit offers several methods for authenticating with Snowflake; you must use at least one. A connection profile and direct authentication options can be used together or separately. When both are present, a command-line option overrides the corresponding field in the connection profile.
Connection profile¶
To use a pre-configured Snowflake connection profile:
snowpark-submit \
--snowflake-connection-name my_connection \
--snowflake-workload-name MY_JOB \
app.py
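For reference, a connection profile is an entry in your connections.toml file. The following is a minimal sketch assuming password authentication; the field names mirror the command-line options described above, and all values are placeholders:

[my_connection]
account = "myaccount"
user = "myuser"
password = "mypassword"
role = "myrole"
warehouse = "MY_WH"
compute_pool = "MY_COMPUTE_POOL"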
Direct authentication¶
Username and password¶
To provide authentication details directly in the command:
snowpark-submit \
--host myhost \
--account myaccount \
--user myuser \
--password mypassword \
--role myrole \
--snowflake-workload-name MY_JOB \
app.py
OAuth¶
To authenticate by using an OAuth token:
snowpark-submit \
--host myhost \
--account myaccount \
--authenticator oauth \
--token-file-path /path/to/token.txt \
--snowflake-workload-name MY_JOB \
--compute-pool MY_COMPUTE_POOL \
app.py
Snowflake resources¶
To specify the Snowflake database, schema, warehouse, and compute pool for your job:
snowpark-submit \
--database MY_DB \
--schema MY_SCHEMA \
--warehouse MY_WH \
--compute-pool MY_COMPUTE_POOL \
--snowflake-workload-name MY_JOB \
--snowflake-connection-name MY_CONNECTION \
app.py
Snowflake stages¶
You can use Snowpark Submit to store and access files directly on a Snowflake stage.
To submit a job using a file on a Snowflake stage:
snowpark-submit \
--snowflake-stage @my_stage \
--snowflake-workload-name MY_JOB \
--snowflake-connection-name MY_CONNECTION \
@my_stage/app.py
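This example assumes that app.py has already been uploaded to @my_stage. One way to upload it is with a PUT command from a Snowflake SQL client (a sketch; adjust the local file path for your environment):

PUT file:///local/path/app.py @my_stage AUTO_COMPRESS=FALSE;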
Dependency management¶
Snowpark Submit provides several ways to package and install your application’s dependencies.
Python dependencies¶
To specify additional Python files or archives that are needed by your application:
snowpark-submit \
--py-files dependencies.zip,module.py \
--snowflake-workload-name MY_PYTHON_JOB \
--snowflake-connection-name MY_CONNECTION \
app.py
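To install PyPI packages or private wheel files before the workload runs, use the --requirements-file and --wheel-files options described in the reference above. The following is a sketch; the external access integration name is a placeholder, and note that --requirements-file requires an external access integration for PyPI and cannot be combined with --snowflake-stage:

snowpark-submit \
--requirements-file requirements.txt \
--wheel-files private_lib-1.0-py3-none-any.whl \
--external-access-integrations "PYPI_ACCESS_INTEGRATION" \
--snowflake-workload-name MY_PYTHON_JOB \
--snowflake-connection-name MY_CONNECTION \
app.py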
Monitoring and control¶
You can monitor and control your Snowpark Submit jobs effectively.
Waiting for job completion¶
By default, Snowpark Submit starts the job and returns immediately. To run in blocking mode and wait for the job to finish:
snowpark-submit \
--snowflake-connection-name my_connection \
--snowflake-workload-name MY_JOB \
--wait-for-completion \
app.py
The --wait-for-completion flag causes the command to block until the job completes, either successfully or with failure, showing periodic status updates. This is useful for workflows in which you must ensure that a job completes before other tasks proceed, such as orchestration with Apache Airflow.
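In scripted pipelines, blocking mode lets later steps depend on the job outcome. The following is a sketch, assuming the command exits with a nonzero status when the job fails (verify this behavior for your snowpark-submit version):

#!/bin/bash
set -e  # stop the script if any command fails

snowpark-submit \
--snowflake-connection-name my_connection \
--snowflake-workload-name MY_JOB \
--wait-for-completion \
app.py

# Reached only if the job above completed successfully.
echo "MY_JOB finished; starting downstream step"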
Checking workload status¶
To check the status of a workload (running or completed):
snowpark-submit --snowflake-connection-name my_connection --snowflake-workload-name MY_JOB --workload-status
This command returns the following information about the workload:
Current state (DEPLOYING, RUNNING, SUCCEEDED, or FAILED)
Start time and duration
Service details
Viewing application logs¶
To view detailed logs along with the workload status:
snowpark-submit --snowflake-connection-name my_connection --snowflake-workload-name MY_JOB --workload-status --display-logs
The --display-logs flag fetches the application’s output logs and prints them to the console. Using these logs, you can perform the following tasks:
Debug application errors
Monitor execution progress
View application output
Note
Logs can take from a few seconds to a minute to become available for fetching. When an event table is not used to store log data, logs are retained only for a short period of time, such as five minutes or less.
Advanced configuration¶
Fine-tune your Snowpark Submit jobs with advanced configurations.
External access integration¶
Connect to external services from your Spark application:
snowpark-submit \
--external-access-integrations "MY_NETWORK_RULE,MY_STORAGE_INTEGRATION" \
--snowflake-workload-name MY_JOB \
--snowflake-connection-name my_connection \
app.py
Logging level configuration¶
Control the logging level for your application to the Snowflake event table:
snowpark-submit \
--snowflake-log-level INFO \
--snowflake-workload-name MY_JOB \
--snowflake-connection-name MY_CONNECTION \
app.py
Options for --snowflake-log-level: INFO, ERROR, NONE.
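Spark configuration properties¶
You can pass arbitrary Spark configuration properties with the --conf option, or load them from a file with --properties-file, as described in the reference above. The following is a sketch; the property shown is a standard Spark setting chosen only as an example, and support for any given property depends on your environment:

snowpark-submit \
--conf spark.sql.shuffle.partitions=8 \
--snowflake-workload-name MY_JOB \
--snowflake-connection-name my_connection \
app.py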
Adding job context¶
Add a descriptive comment for easier workload identification in Snowflake:
snowpark-submit \
--comment "Daily data processing job" \
--snowflake-workload-name MY_JOB \
--snowflake-connection-name my_connection \
app.py