Snowpark Submit reference¶
With Snowpark Submit, you can use familiar Spark semantics to run non-interactive, batch-oriented Spark workloads on Snowflake.
Note
snowpark-submit supports much of the same functionality as spark-submit. However, some functionality has been omitted because it is not needed when running Spark workloads on Snowflake.
Syntax¶
snowpark-submit
--name <application_name>
--exclude-packages <package_to_exclude> [, <package_to_exclude>, ...]
--py-files <files_to_place_on_path>
--requirements-file <path_to_requirements_file>
--wheel-files <wheel_file> [, <wheel_file>, ...]
--conf <spark_config_property=value> [<spark_config_property=value> ...]
--properties-file <path_to_properties_file>
--help, -h
--verbose, -v
--version
--account <snowflake_account>
--user <snowflake_user>
--authenticator <snowflake_authenticator>
--token-file-path <snowflake_token_file_path>
--password <snowflake_password>
--role <snowflake_role>
--host <snowflake_host>
--database <snowflake_database_name>
--schema <snowflake_schema_name>
--warehouse <snowflake_warehouse_name>
--compute-pool <snowflake_compute_pool>
--comment <comment>
--snowflake-stage <snowflake_stage>
--external-access-integrations <snowflake_external_access_integrations> [, ...]
--snowflake-log-level <snowflake_log_level>
--snowflake-workload-name <snowflake_workload_name>
--snowflake-connection-name <snowflake_connection_name>
--snowflake-grpc-max-message-size <message_size>
--snowflake-grpc-max-metadata-size <metadata_size>
--workload-status
--display-logs
--wait-for-completion
<application.jar | application.py> [<application_arguments>]
Arguments¶
application.jar | application.py
Path to a file containing the application and dependencies.

[application arguments]
Application-specific arguments passed to the application’s main method.
Options¶
--conf PROP=VALUE [PROP=VALUE ...]
Arbitrary Spark configuration property.

--exclude-packages [EXCLUDE_PACKAGES ...]
Comma-separated list of groupId:artifactId pairs to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts.

--help, -h
Show help message and exit.

--name NAME
Name of your application.

--properties-file FILE
Path to a file from which to load extra properties. If not specified, snowpark-submit looks for conf/spark-defaults.conf.

--py-files PY_FILES
Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps.

--verbose, -v
Print additional debug output.

--version
Print the current Spark version.
Snowflake specific options¶
--account SNOWFLAKE_ACCOUNT
Snowflake account to use. Overrides the account in the connections.toml file if specified.

--authenticator SNOWFLAKE_AUTHENTICATOR
Authenticator for Snowflake login. Overrides the authenticator in the connections.toml file if specified. If not specified, defaults to the username/password authenticator.

--comment COMMENT
A message associated with the workload. Can be used to identify the workload in Snowflake.

--compute-pool SNOWFLAKE_COMPUTE_POOL
Snowflake compute pool for running the provided workload. Overrides the compute pool in the connections.toml file if specified.

--database SNOWFLAKE_DATABASE_NAME
Snowflake database to use in the session. Overrides the database in the connections.toml file if specified.

--display-logs
Whether to print application logs to the console when --workload-status is specified.

--external-access-integrations [SNOWFLAKE_EXTERNAL_ACCESS_INTEGRATIONS ...]
Snowflake external access integrations required by the workload.

--host SNOWFLAKE_HOST
Host for the Snowflake deployment. Overrides the host in the connections.toml file if specified.

--password SNOWFLAKE_PASSWORD
Password for the Snowflake user. Overrides the password in the connections.toml file if specified.

--requirements-file REQUIREMENTS_FILE
Path to a requirements.txt file containing Python package dependencies to install before running the workload. Requires an external access integration for PyPI. This parameter does not function if you also specify the --snowflake-stage parameter.

--role SNOWFLAKE_ROLE
Snowflake role to use. Overrides the role in the connections.toml file if specified.

--schema SNOWFLAKE_SCHEMA_NAME
Snowflake schema to use in the session. Overrides the schema in the connections.toml file if specified.

--snowflake-connection-name SNOWFLAKE_CONNECTION_NAME
Name of the connection in the connections.toml file to use as the base configuration. Command-line arguments override any values from the connections.toml file.

--snowflake-grpc-max-message-size MESSAGE_SIZE
Maximum message size, in bytes, for gRPC communication in Snowpark Submit.

--snowflake-grpc-max-metadata-size METADATA_SIZE
Maximum metadata size, in bytes, for gRPC communication in Snowpark Submit.

--snowflake-log-level SNOWFLAKE_LOG_LEVEL
Log level for the Snowflake event table: 'INFO', 'ERROR', or 'NONE'. (Default: INFO.)

--snowflake-stage SNOWFLAKE_STAGE
Snowflake stage where workload files are uploaded.

--snowflake-workload-name SNOWFLAKE_WORKLOAD_NAME
Name of the workload to run in Snowflake.

--token-file-path SNOWFLAKE_TOKEN_FILE_PATH
Path to a file containing the OAuth token for Snowflake. Overrides the token file path in the connections.toml file if specified.

--user SNOWFLAKE_USER
Snowflake user to use. Overrides the user in the connections.toml file if specified.

--wait-for-completion
In cluster mode, when specified, run the workload in blocking mode and wait for completion.

--warehouse SNOWFLAKE_WAREHOUSE_NAME
Snowflake warehouse to use in the session. Overrides the warehouse in the connections.toml file if specified.

--wheel-files WHEEL_FILES
Comma-separated list of .whl files to install before running the Python workload. Used for private dependencies that are not available on PyPI.

--workload-status
Print the detailed status of the workload.
Common option examples¶
Application deployment¶
Snowflake’s Snowpark Container Services (SPCS) is the primary infrastructure for running your Spark applications. You must create an SPCS compute pool in advance.
Basic Python application¶
To deploy a basic Python application in cluster mode:
snowpark-submit \
--snowflake-workload-name MY_PYTHON_JOB \
--snowflake-connection-name MY_CONNECTION_CONFIG_NAME \
app.py arg1 arg2
Authentication¶
Snowpark Submit offers several methods for authenticating with Snowflake; you must use at least one. A connection profile and direct authentication options can be used together or separately. When both are present, a command-line option overrides the corresponding field in the connection profile.
Connection profile¶
To use a pre-configured Snowflake connection profile:
snowpark-submit \
--snowflake-connection-name my_connection \
--snowflake-workload-name MY_JOB \
app.py
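For reference, a connection profile is an entry in your connections.toml file. The following is a minimal sketch assuming password authentication; the field names mirror the command-line options described above, and all values are placeholders:

[my_connection]
account = "myaccount"
user = "myuser"
password = "mypassword"
role = "myrole"
warehouse = "MY_WH"
compute_pool = "MY_COMPUTE_POOL"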
Direct authentication¶
Username and password¶
To provide authentication details directly in the command:
snowpark-submit \
--host myhost \
--account myaccount \
--user myuser \
--password mypassword \
--role myrole \
--snowflake-workload-name MY_JOB \
app.py
OAuth¶
To authenticate by using an OAuth token:
snowpark-submit \
--host myhost \
--account myaccount \
--authenticator oauth \
--token-file-path /path/to/token.txt \
--snowflake-workload-name MY_JOB \
--compute-pool MY_COMPUTE_POOL \
app.py
Snowflake resources¶
To specify the Snowflake database, schema, warehouse, and compute pool for your job:
snowpark-submit \
--database MY_DB \
--schema MY_SCHEMA \
--warehouse MY_WH \
--compute-pool MY_COMPUTE_POOL \
--snowflake-workload-name MY_JOB \
--snowflake-connection-name MY_CONNECTION \
app.py
Snowflake stages¶
You can use Snowpark Submit to store and access files directly on a Snowflake stage.
To submit a job using a file on a Snowflake stage:
snowpark-submit \
--snowflake-stage @my_stage \
--snowflake-workload-name MY_JOB \
--snowflake-connection-name MY_CONNECTION \
@my_stage/app.py
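This example assumes that app.py has already been uploaded to @my_stage. One way to upload it is with a PUT command from a Snowflake SQL client (a sketch; adjust the local file path for your environment):

PUT file:///local/path/app.py @my_stage AUTO_COMPRESS=FALSE;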
Dependency management¶
Snowpark Submit provides several ways to package and install your application’s dependencies.
Python dependencies¶
To specify additional Python files or archives that are needed by your application:
snowpark-submit \
--py-files dependencies.zip,module.py \
--snowflake-workload-name MY_PYTHON_JOB \
--snowflake-connection-name MY_CONNECTION \
app.py
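To install PyPI packages or private wheel files before the workload runs, use the --requirements-file and --wheel-files options described in the reference above. The following is a sketch; the external access integration name is a placeholder, and note that --requirements-file requires an external access integration for PyPI and cannot be combined with --snowflake-stage:

snowpark-submit \
--requirements-file requirements.txt \
--wheel-files private_lib-1.0-py3-none-any.whl \
--external-access-integrations "PYPI_ACCESS_INTEGRATION" \
--snowflake-workload-name MY_PYTHON_JOB \
--snowflake-connection-name MY_CONNECTION \
app.py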
Monitoring and control¶
You can monitor and control your Snowpark Submit jobs effectively.
Waiting for job completion¶
By default, Snowpark Submit starts the job and returns immediately. To run in blocking mode and wait for the job to finish:
snowpark-submit \
--snowflake-connection-name my_connection \
--snowflake-workload-name MY_JOB \
--wait-for-completion \
app.py
The --wait-for-completion flag causes the command to block until the job completes, either successfully or with failure, showing periodic status updates. This is useful for workflows in which you must ensure that a job completes before other tasks proceed, such as orchestration with Apache Airflow.
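In scripted pipelines, blocking mode lets later steps depend on the job outcome. The following is a sketch, assuming the command exits with a nonzero status when the job fails (verify this behavior for your snowpark-submit version):

#!/bin/bash
set -e  # stop the script if any command fails

snowpark-submit \
--snowflake-connection-name my_connection \
--snowflake-workload-name MY_JOB \
--wait-for-completion \
app.py

# Reached only if the job above completed successfully.
echo "MY_JOB finished; starting downstream step"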
Checking workload status¶
To check the status of a workload (running or completed):
snowpark-submit --snowflake-connection-name my_connection --snowflake-workload-name MY_JOB --workload-status
This command returns the following information about the workload:
Current state (DEPLOYING, RUNNING, SUCCEEDED, or FAILED)
Start time and duration
Service details
Viewing application logs¶
To view detailed logs along with the workload status:
snowpark-submit --snowflake-connection-name my_connection --snowflake-workload-name MY_JOB --workload-status --display-logs
The --display-logs flag fetches the application’s output logs and prints them to the console. Using these logs, you can perform the following tasks:
Debug application errors
Monitor execution progress
View application output
Note
Logs can take from a few seconds to a minute to become available for fetching. When an event table is not used to store log data, logs are retained only for a short period of time, such as five minutes or less.
Advanced configuration¶
Fine-tune your Snowpark Submit jobs with advanced configurations.
External access integration¶
Connect to external services from your Spark application:
snowpark-submit \
--external-access-integrations "MY_NETWORK_RULE,MY_STORAGE_INTEGRATION" \
--snowflake-workload-name MY_JOB \
--snowflake-connection-name my_connection \
app.py
Logging level configuration¶
Control the logging level for your application to the Snowflake event table:
snowpark-submit \
--snowflake-log-level INFO \
--snowflake-workload-name MY_JOB \
--snowflake-connection-name MY_CONNECTION \
app.py
Options for --snowflake-log-level: INFO, ERROR, NONE.
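Spark configuration properties¶
You can pass arbitrary Spark configuration properties with the --conf option, or load them from a file with --properties-file, as described in the reference above. The following is a sketch; the property shown is a standard Spark setting chosen only as an example, and support for any given property depends on your environment:

snowpark-submit \
--conf spark.sql.shuffle.partitions=8 \
--snowflake-workload-name MY_JOB \
--snowflake-connection-name my_connection \
app.py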
Adding job context¶
Add a descriptive comment for easier workload identification in Snowflake:
snowpark-submit \
--comment "Daily data processing job" \
--snowflake-workload-name MY_JOB \
--snowflake-connection-name my_connection \
app.py