Snowpark Submit examples¶
This topic includes examples that use Snowpark Submit to submit production-ready Spark applications.
Write and submit a simple Spark application¶
The following example shows how to write and submit a simple Spark application with no dependencies.
In your local IDE, create a new Python file called app.py that contains your Spark application code, and then submit it by using the snowpark-submit command.
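For example, a minimal app.py might look like the following sketch; the application name and DataFrame contents are placeholders, not part of the original example:

```python
# Minimal Spark application with no dependencies (illustrative sketch).
from pyspark.sql import SparkSession

# With Snowpark Connect for Spark, the session connects to Snowflake
# rather than to a local Spark cluster.
spark = SparkSession.builder.appName("SimpleApp").getOrCreate()

# Create a small DataFrame and print it.
df = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"])
df.show()

spark.stop()
```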
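The submit command might then look like the following; the compute pool name is a placeholder, and any options beyond those listed in this topic should be checked against the Snowpark Submit reference:

```shell
# Submit the application to run on a compute pool (illustrative sketch).
snowpark-submit \
  --compute-pool MY_COMPUTE_POOL \
  app.py
```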
You can use the --wait-for-completion option to wait for the job to complete, the --workload-status option to check the status of the job, and the --display-logs option to display the job's logs. For a complete list of options, see Snowpark Submit reference.
Deploy an application from a Snowflake stage¶
If the application has dependencies, like files it needs to read, you can deploy them from a Snowflake stage. The following example shows how to deploy an application and its dependencies from a Snowflake stage.
To upload files to a stage from the terminal, you can use the Snowflake CLI. (SnowSQL, the legacy CLI, can also upload files to a stage if you already use it.) If you have not yet installed the Snowflake CLI, follow the instructions in Installing Snowflake CLI.
In your local IDE, create a new CSV file called sample_employees.csv that contains sample employee data. Upload your dependency files to a stage by using the Snowflake CLI, where my_stage is the name of a stage in your account. (If you do not have a stage created, you can use [snow stage create](/developer-guide/snowflake-cli/command-reference/stage-commands/create).) To verify that the file uploaded successfully, list the files in the stage.
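The sample_employees.csv file might contain placeholder rows such as:

```text
id,name,department
1,Alice,Engineering
2,Bob,Marketing
3,Carol,Finance
```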
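Assuming the Snowflake CLI is installed, the upload and verification steps might look like the following; the exact snow stage subcommand names can vary by CLI version, so check the Snowflake CLI reference:

```shell
# Upload the local CSV file to the stage.
snow stage copy sample_employees.csv @my_stage

# List the files in the stage to verify the upload.
snow stage list-files @my_stage
```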
You should see the file sample_employees.csv in the list. Next, in your local IDE, create a new Python file called app.py that reads the staged file. To submit the application that uses the files you uploaded to the stage, use the snowpark-submit command.
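An app.py that reads the staged file might look like the following sketch; the @stage path syntax passed to spark.read is an assumption to verify against the Snowpark Connect for Spark documentation:

```python
# Illustrative sketch: read a CSV dependency from a Snowflake stage.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("StagedFileApp").getOrCreate()

# Read the staged CSV file; the @stage path form is an assumption.
df = (
    spark.read.option("header", True)
    .csv("@my_stage/sample_employees.csv")
)
df.show()

spark.stop()
```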
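The submission might then look like this, with a placeholder compute pool name:

```shell
# Submit the application; the compute pool runs the job.
snowpark-submit \
  --compute-pool MY_COMPUTE_POOL \
  app.py
```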
A compute pool is required to run the application. You must specify it either in the connections.toml file or on the command line by using the --compute-pool option. For more information, see Snowpark Submit reference.
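A connections.toml entry might look like the following sketch; the key names follow the standard Snowflake connections.toml format, all values are placeholders, and the key name for setting a compute pool (if supported in the file) should be taken from the Snowpark Submit reference:

```toml
# Hypothetical connection entry; values are placeholders.
[my_connection]
account = "my_account"
user = "my_user"
role = "my_role"
database = "my_database"
schema = "my_schema"
```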
Monitor with wait and logs¶
The following example shows how to submit a job, wait for its completion, and then retrieve logs.
Submit the job with the --wait-for-completion option so that the command blocks until the job completes.
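The submission step above might look like this, with placeholder names:

```shell
# Submit and block until the job finishes.
snowpark-submit \
  --compute-pool MY_COMPUTE_POOL \
  --wait-for-completion \
  app.py
```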
If the job fails, check the detailed logs by using the --workload-status and --display-logs options.
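A log-retrieval command might look like the following; the flag used here to identify the workload by name is an assumption to check against the Snowpark Submit reference:

```shell
# Check status and print logs for a named workload (name flag is assumed).
snowpark-submit \
  --workload-status \
  --display-logs \
  --snowflake-workload-name MY_JOB
```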
Use Snowpark Submit in an Apache Airflow DAG¶
You can submit a Spark job to Snowflake through Snowpark Connect for Spark, using snowpark-submit in cluster mode so that the job runs on a compute pool.
When you use Apache Airflow in this way, ensure that the Docker service or Snowpark Container Services container that runs Apache Airflow has proper access to Snowflake and the required files in the Snowflake stage.
The code in the following example performs the following tasks:
- Creates a Python virtual environment at /tmp/myenv. In the create_venv task, the code uses pip to install the snowpark-submit package from a .whl file.
- Generates a secure connections.toml file with Snowflake connection credentials and an OAuth token. In the create_connections_toml task, the code creates the /app/.snowflake directory, creates the .toml file, and then changes the file permissions so that only the owner (user) has read and write access.
- Runs a Spark job by using the snowpark-submit command. In the run_snowpark_script task, the code does the following things:
  - Activates the virtual environment.
  - Runs the Spark job by using the snowpark-submit command.
  - Deploys to Snowflake by using cluster mode.
  - Uses the Snowpark Connect for Spark remote URI sc://localhost:15002.
  - Specifies the Spark application class org.example.SnowparkConnectApp.
  - Pulls the script from the @snowflake_stage stage.
  - Blocks deployment until the job finishes by using --wait-for-completion.
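A DAG along the lines described above could be sketched as follows; the operator choices, file paths, wheel name, and snowpark-submit flags other than --wait-for-completion are assumptions rather than the original example, so verify them against the Snowpark Submit reference and your Airflow version:

```python
# Hedged sketch of the DAG described above (Airflow 2.x style).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="snowpark_submit_example",
    start_date=datetime(2025, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Create a virtual environment and install snowpark-submit from a wheel.
    create_venv = BashOperator(
        task_id="create_venv",
        bash_command=(
            "python3 -m venv /tmp/myenv && "
            "/tmp/myenv/bin/pip install /app/snowpark_submit-<version>.whl"
        ),
    )

    # Write connections.toml and restrict it to owner read/write only.
    create_connections_toml = BashOperator(
        task_id="create_connections_toml",
        bash_command=(
            "mkdir -p /app/.snowflake && "
            'echo "$CONNECTIONS_TOML" > /app/.snowflake/connections.toml && '
            "chmod 600 /app/.snowflake/connections.toml"
        ),
    )

    # Activate the environment and run the job in cluster mode, pulling the
    # script from the stage and blocking until completion. Flag names here
    # follow spark-submit conventions and are assumptions.
    run_snowpark_script = BashOperator(
        task_id="run_snowpark_script",
        bash_command=(
            "source /tmp/myenv/bin/activate && "
            "snowpark-submit "
            "--deploy-mode cluster "
            "--remote sc://localhost:15002 "
            "--wait-for-completion "
            "@snowflake_stage/app.py"
        ),
    )

    create_venv >> create_connections_toml >> run_snowpark_script
```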
You can monitor the DAG by using the Apache Airflow user interface’s Graph View or Tree View. Inspect the task logs for the following items:
- Environment setup
- Status of Snowpark Connect for Spark
- snowpark-submit job output
You can also monitor jobs that ran in Snowflake by using the logs stored in a Snowflake stage or by querying event tables.