Writing Snowpark Code in Python Worksheets¶
Write Snowpark code in Python worksheets to process data using Snowpark Python in Snowsight. By writing code in Python worksheets, you can perform your development and testing in Snowflake without needing to install dependent libraries.
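To develop with Python worksheets, complete the prerequisites, set up a worksheet, write your Snowpark code, and run the worksheet. Each step is described in a section below.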
For example, you might write code in a Python worksheet that extracts data from stages or database objects in Snowflake, transforms the data, and stores the transformed data in Snowflake. You could then convert that code to a stored procedure and build a data pipeline, all without leaving Snowflake.
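A minimal sketch of such an extract, transform, and store flow, written as a worksheet handler function (the form described later on this page). The table names RAW_ORDERS and CLEAN_ORDERS and the column names are hypothetical, and the guarded import is an assumption added only so the sketch also loads outside Snowflake.

```python
try:
    import snowflake.snowpark as snowpark  # available inside Python worksheets
except ImportError:
    snowpark = None  # assumption: lets the sketch load outside Snowflake


def main(session: "snowpark.Session"):
    # Extract: build a DataFrame over an existing table (hypothetical name).
    raw = session.table("RAW_ORDERS")

    # Transform: keep completed orders and only the columns of interest
    # (hypothetical column names).
    cleaned = raw.filter("STATUS = 'COMPLETE'").select("ORDER_ID", "AMOUNT")

    # Store: write the transformed rows back to Snowflake.
    cleaned.write.save_as_table("CLEAN_ORDERS", mode="overwrite")

    # Returning a DataFrame matches the default Table() return type.
    return cleaned
```

In a worksheet, Snowsight supplies the session argument when you select Run, so the handler is not called explicitly.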
About Python Worksheets¶
Python worksheets let you use Snowpark Python in Snowsight to perform data manipulations and transformations. You can use packages available in Anaconda or import your own Python files from stages to use in scripts.
After running a Python worksheet, review the results and output returned by your script. The results display as a string, a variant, or a table, depending on your code. Refer to Running Python Worksheets.
Note
Because Python worksheets run inside Snowflake rather than in your local development environment, you cannot use session.add_import to add a file that your Python code depends on, or session.add_packages or session.add_requirements to add packages that you need to use in your Python code. Instead, you add those files to a stage and reference them in your code. Refer to Add a Python File from a Stage to a Worksheet.
Python worksheets have the following limitations:
Log levels lower than WARN do not appear in the Output for a Python worksheet by default. To see lower-level messages, use a logging library such as the logging module to set the level of messages logged. All messages that you log appear in the output.
No support for breakpoints or running only portions of the Python code in a worksheet.
No support for images or webpages. Images or webpages generated by Python code cannot be displayed in Python worksheets.
Snowpark Python uses Python 3.8.
If you require support for any of these options, consider using your local development environment instead. Refer to Setting Up Your Development Environment for Snowpark Python.
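As an illustration of the logging limitation above, here is a minimal sketch that uses the standard library logging module to lower the log level. The logger name python_worksheet_demo and the message text are hypothetical, and how the messages are routed to the Output panel is handled by the worksheet rather than by this code.

```python
import logging

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("python_worksheet_demo")

# Lower the threshold so that messages below WARN are not filtered out
# by this logger.
logger.setLevel(logging.DEBUG)


def main(session):
    # The session argument is unused here; it is kept so the function
    # has the shape of a worksheet handler.
    logger.debug("starting the load step")
    logger.info("running the load step")
    logger.warning("finished the load step with warnings")
    return "done"
```

Because this handler returns a string, the worksheet return type would need to be set to String, as described under Return Results of a Different Data Type.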
Prerequisites for Python Worksheets¶
To use Python worksheets, you must do the following:
Review and accept the Anaconda Terms of Service in Snowsight. Refer to Getting Started.
(Optional) To use Python files and packages that are not included with Anaconda in a Python worksheet, add them to a named stage. Refer to Add a Python File from a Stage to a Worksheet.
Choose a warehouse to use for Python worksheets. Snowflake recommends using an X-Small warehouse for development. If you’re running a very large Snowpark workload, use a Snowpark-optimized warehouse. Refer to Warehouse Size for additional details about warehouse sizes.
Add a Python File from a Stage to a Worksheet¶
Snowflake includes the Anaconda packages from the Snowflake conda channel in Python worksheets. If you want to use Python files or packages other than those included in Anaconda in your Python worksheet, you must upload the files to a named stage in Snowflake and then add them to the list of packages for your Python worksheet.
To use a Python package in your worksheet that is not included by default, do the following:
Sign in to Snowsight.
Open Worksheets.
Select + » Python Worksheet.
Select a database and schema.
Select Packages » Stage Packages.
Enter the path to the package in the stage.
If the selected database and schema for the worksheet contain the stage where the package is located, you can reference the stage using an unqualified name. For example, @YourStage/path/to/package.py.
To reference a stage in a different database and schema, fully qualify the name of the stage. For example, @Database.Schema.Stage/path/to/package.py.
Select Import to add your package to the list of installed packages. You can then use import statements to use the package in your Python worksheet.
Note
Packages that you add to a worksheet are available only to that worksheet. If you want to use the same package in a different Python worksheet, follow these steps to add the package to that worksheet.
For more details, refer to Making Dependencies Available to Your Code.
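The two forms of stage path described in the steps above can be sketched with a small helper. The function stage_file_path is hypothetical and is not part of Snowpark; it only assembles the string that you would enter under Stage Packages.

```python
from typing import Optional


def stage_file_path(
    stage: str,
    path: str,
    database: Optional[str] = None,
    schema: Optional[str] = None,
) -> str:
    """Build a stage path, fully qualified only when a database and schema are given."""
    if (database is None) != (schema is None):
        raise ValueError("Provide both database and schema, or neither.")
    # Prefix the stage with Database.Schema. when it lives outside the
    # database and schema selected for the worksheet.
    qualifier = f"{database}.{schema}." if database else ""
    return f"@{qualifier}{stage}/{path.lstrip('/')}"


# Stage in the worksheet's selected database and schema:
print(stage_file_path("YourStage", "path/to/package.py"))
# prints @YourStage/path/to/package.py

# Stage in a different database and schema:
print(stage_file_path("Stage", "path/to/package.py", "Database", "Schema"))
# prints @Database.Schema.Stage/path/to/package.py
```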
Start Developing with Python Worksheets¶
To open a worksheet and configure your development environment, do the following:
Sign in to Snowsight.
Open Worksheets.
Select + » Python Worksheet.
Select a database and schema.
Select a warehouse to use to run the worksheet. If you have a default warehouse for your user, it is pre-selected.
Python worksheets require a running warehouse to load Python packages and run Python code.
(Optional) Select Packages to add Python libraries.
Packages included with Anaconda, such as numpy, pandas, requests, and urllib3, are already installed.
Search for other packages that you want to install and use in your worksheet, such as scikit-learn. Select the package in the search results and optionally modify the package version selected by default. Packages installed by you appear at the top of the list of packages.
Add your own packages and Python files by selecting Stage Packages and specifying the file path of the stage and package, then selecting Import.
If you add Python libraries to your worksheet, update your code to use import statements to import the libraries into your worksheet.
Run the sample Python code to validate your configuration.
Error messages or the return value from your code appears in the Results section. To view log messages, select Output. Refer to Running Python Worksheets.
Writing Snowpark Code in Python Worksheets¶
After you follow the steps to start developing with Python worksheets, you can replace the sample code with your own.
Write your Snowpark Python code inside the handler function:
import snowflake.snowpark as snowpark

def main(session: snowpark.Session):
    # your code goes here
    pass
The default handler function is main, but you can change it in the Settings for the worksheet.
Use the session object provided in the boilerplate code to access data in Snowflake with the Snowpark API libraries. For example, you can create a DataFrame for a table or execute a SQL statement. Refer to the Snowpark Developer Guide for Python.
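The two ways of using the session object mentioned above might look like the following sketch. The table name SAMPLE_ORDERS and its columns are hypothetical, main_sql is a hypothetical alternative handler name that you would select in the Settings for the worksheet, and the guarded import is an assumption added only so the sketch also loads outside Snowflake.

```python
try:
    import snowflake.snowpark as snowpark  # available inside Python worksheets
except ImportError:
    snowpark = None  # assumption: lets the sketch load outside Snowflake


def main(session: "snowpark.Session"):
    # Create a DataFrame for a table (hypothetical table name).
    orders = session.table("SAMPLE_ORDERS")
    # Keep rows that match a SQL expression (hypothetical column name),
    # and cap the number of rows returned to the worksheet.
    return orders.filter("AMOUNT > 100").limit(10)


def main_sql(session: "snowpark.Session"):
    # Execute a SQL statement; session.sql also returns a DataFrame.
    return session.sql(
        "SELECT CUSTOMER_ID, SUM(AMOUNT) AS TOTAL "
        "FROM SAMPLE_ORDERS GROUP BY CUSTOMER_ID"
    )
```

Both functions return a DataFrame, so either one works with the default Table() return type.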
As you type, you see autocomplete for Python methods, defined variables, database objects, and more. You do not see autocomplete for some third-party packages or files imported from a stage. Python worksheets also include syntax highlighting and guidance for method parameters. You can enable linting and line wrapping in the Settings for the worksheet.
Return Results of a Different Data Type¶
When you write your Python code, consider which type of data is returned by the return statement in your code and adjust how the worksheet returns results. By default, a Python worksheet has a return type of Table() because the placeholder code returns a DataFrame.
Depending on what your Python code returns, you might want to change the worksheet settings to display the output differently:
If your handler function returns a DataFrame, use the default return type of Table().
If your handler function returns a list of Row objects, such as with the collect method, change the return type to Variant.
If your handler function returns a string, such as return "Hello Python", or a value that you want to cast as a string, change the return type to String.
If your handler function returns an integer, such as with the count method, use a return type of Variant or String.
For details about the return type of some DataFrame methods, refer to Performing an Action to Evaluate a DataFrame.
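The four cases listed above can be sketched as separate functions. Only one function acts as the handler for a given worksheet run; the function names and the table name SAMPLE_ORDERS are hypothetical.

```python
def return_table(session):
    # Returns a DataFrame: keep the default return type of Table().
    return session.table("SAMPLE_ORDERS")


def return_rows(session):
    # collect() returns a list of Row objects: set the return type to Variant.
    return session.table("SAMPLE_ORDERS").collect()


def return_text(session):
    # Returns a string: set the return type to String.
    return "Hello Python"


def return_count(session):
    # count() returns an integer: set the return type to Variant or String.
    return session.table("SAMPLE_ORDERS").count()
```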
To update the worksheet settings to return results of a different type, do the following:
Sign in to Snowsight.
Open Worksheets.
Open the Python worksheet whose return type you want to change.
Select a warehouse to use to run the worksheet. If you have a default warehouse for your user, it is pre-selected. Make sure your warehouse is running.
Select Settings and for the Return Type, select the type returned by the handler function.
Run your Python worksheet.
Review the results in the Results panel.
Running Python Worksheets¶
After you write your Python worksheet, select Run to run your Python worksheet. Running your worksheet executes all of the code in your Python worksheet. Partial or incremental execution of code is not supported.
Review Output Generated by Your Code¶
You can review standard output (stdout) or standard error (stderr) messages for your Python code in the Output panel for a Python worksheet.
You can see the output from the following types of functions in the Output panel:
Functions that write to the console, such as print().
Functions that print a DataFrame, such as the show method of the DataFrame class in Snowpark Python.
Note
Output appears after all Python processes finish running, rather than appearing in a stream as the code runs.
Log output is written to a temporary stage and is only captured if the following are true:
You select a database and schema for the worksheet.
The selected database was not created from a share.
You run the worksheet using a role that has USAGE privileges on the selected database and schema.
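Both kinds of output listed earlier in this section, console writes and printed DataFrames, can be produced from a single handler, as in this minimal sketch. The table name SAMPLE_ORDERS is hypothetical.

```python
def main(session):
    # Written to stdout, so it is shown in the Output panel.
    print("Starting worksheet run")

    df = session.table("SAMPLE_ORDERS")

    # show() prints the first rows of the DataFrame to stdout.
    df.show()

    # The returned DataFrame is displayed in the Results panel.
    return df
```

As the note above explains, the printed text appears only after the run finishes.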
Review the Query History for a Python Worksheet¶
When a Python worksheet runs in Snowsight, an anonymous stored procedure runs the code and generates queries that execute the Snowpark commands in the code.
You can use the Query History page in Snowsight to review the queries that ran. Refer to Query History.
For example, after running a worksheet, you can review the queries that ran by doing the following:
Review the Results of the worksheet.
In the Query Details for the worksheet, select » Copy Query ID.
Select Worksheets to return to the list of worksheets.
Select Activity » Query History.
On the Query History page, display only the queries from your Python worksheet:
Select Filters, and enable the Query ID option.
Enter the Query ID of your Python worksheet.
Select Apply Filters.
Review the queries run for the worksheet.