Troubleshoot errors in Snowflake Notebooks¶
The following scenarios can help you troubleshoot issues that can occur when using Snowflake Notebooks.
Total number of notebooks exceeds the limit¶
The following error occurs when the total number of notebooks in your account exceeds 6000 and you refresh the Notebooks list:
Result size for streamlit list exceeded the limit. Streamlit list was truncated.
Users can still create new notebooks; however, Snowflake recommends that you remove notebooks that are no longer being used by the account.
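To identify notebooks to remove, you can list the notebooks in your account. The following is a minimal sketch, assuming your role can see the notebooks, that SHOW NOTEBOOKS is available in your account, and that an active Snowpark session can be obtained:
# List notebooks visible to the current role so you can identify unused ones
from snowflake.snowpark.context import get_active_session

session = get_active_session()
for row in session.sql("SHOW NOTEBOOKS IN ACCOUNT").collect():
    print(row)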
snowflake.core package conflict¶
To use Snowpark Python APIs, select snowflake.core from the package selector. If you add both snowflake.core and snowflake but select different versions for them, a package conflict error is returned.
Notebooks (warehouse runtime) error when updating a package¶
Snowflake has deprecated the older snowflake-ml package, which is no longer supported. It has been removed from the package selector and is not available in the Snowflake Anaconda channel. If you are using snowflake-ml and try to add, remove, or update packages in your notebooks, those notebooks will fail because snowflake-ml is no longer accessible.
To avoid issues, switch to snowflake-ml-python, which is the correct package for Snowflake ML.
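For example, after you add snowflake-ml-python from the package selector, imports come from the snowflake.ml namespace. The following is a minimal sketch, assuming an active Snowpark session; the model registry is only one common entry point:
from snowflake.snowpark.context import get_active_session
from snowflake.ml.registry import Registry

session = get_active_session()
registry = Registry(session=session)  # defaults to the session's current database and schema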
AttributeError: NoneType¶
The following error occurs when a cell is renamed to the same name as an existing variable in the notebook:
AttributeError: 'NoneType' object has no attribute 'sql'
For example, you have the following in a Python cell called cell1:
from snowflake.snowpark.context import get_active_session

session = get_active_session()  # establish a Snowpark session
If you then rename cell2 to "session" and reference "session" in cell3, Notebooks attempts to reference "session" (the cell name) and not the Snowpark session, causing an error.
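The following is a minimal sketch of the failing reference, assuming the renamed cell described above:
# In cell3, "session" now resolves to the renamed cell rather than the Snowpark session,
# so the call below raises AttributeError: 'NoneType' object has no attribute 'sql'
session.sql("SELECT CURRENT_TIMESTAMP()")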
Early disconnection¶
The notebook session runs as a stored procedure. If your notebook unexpectedly disconnects before the 1-hour timeout, your ACCOUNTADMIN or the warehouse owner might have set the STATEMENT_TIMEOUT_IN_SECONDS parameter to a particular value (for example, 5 minutes), which limits how long all statements can run on the warehouse, including notebook sessions. This parameter is set at the warehouse or account level, and when it is set for both a warehouse and a session, the lowest non-zero value is enforced. To allow the notebook to run longer, use the default warehouse SYSTEM$STREAMLIT_NOTEBOOK$WAREHOUSE or set the STATEMENT_TIMEOUT_IN_SECONDS parameter to a longer duration.
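The following is a minimal sketch of checking and raising the warehouse-level timeout from a notebook cell, assuming an active Snowpark session, a placeholder warehouse name MY_NOTEBOOK_WH, and a role with privileges to alter the warehouse:
# Check the current timeout configured on the warehouse (name is a placeholder)
session.sql("SHOW PARAMETERS LIKE 'STATEMENT_TIMEOUT_IN_SECONDS' IN WAREHOUSE MY_NOTEBOOK_WH").collect()

# Allow statements, including notebook sessions, to run for up to one hour
session.sql("ALTER WAREHOUSE MY_NOTEBOOK_WH SET STATEMENT_TIMEOUT_IN_SECONDS = 3600").collect()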
Unable to connect due to firewall¶
The following popup occurs when you try to start your notebook:
Something went wrong. Unable to connect. A firewall or ad blocker might be preventing you from connecting.
Ensure that *.snowflake.app is on the allowlist in your network and can connect to Snowflake. When this domain is on the allowlist, your apps can communicate with Snowflake servers without any restrictions.
In addition, to prevent any issues connecting to the Snowflake backend, ensure that WebSockets are not blocked in your network configuration.
No active warehouse selected¶
To resolve this error, specify a warehouse for the session with the USE WAREHOUSE command or select a warehouse in your notebook. For steps on how to select a warehouse for your notebook, see Warehouse recommendations for running Snowflake Notebooks.
Additionally, you'll see this error if you're using a role that doesn't have privileges to access the warehouse, database, or schema that the notebook is using. Switch to a role that has access to these resources so that you can continue your work.
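If you prefer to set the warehouse from a cell, the following is a minimal sketch, assuming an active Snowpark session and a placeholder warehouse name MY_NOTEBOOK_WH:
# Select a warehouse for the current session (name is a placeholder)
session.sql("USE WAREHOUSE MY_NOTEBOOK_WH").collect()

# Equivalent Snowpark helper
session.use_warehouse("MY_NOTEBOOK_WH")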
Missing packages¶
The following message occurs in a cell output if you’re trying to use a package that is not installed in your notebook environment:
ModuleNotFoundError: Line 2: Module Not Found: snowflake.core. To import packages from Anaconda, install them first using the package selector at the top of the page.
Import the necessary package by following the instructions on the Import Python packages to use in notebooks page.
Missing package from existing notebook¶
New versions of Snowflake Notebooks are released continually, and notebooks are auto-upgraded to the latest version. Sometimes, when an old notebook is upgraded, the packages in the notebook environment aren't compatible with the upgrade, which can cause the notebook to fail to start.
The following is an example of the error message when the libpython package is missing:
SnowflakeInternalException{signature=std::vector<sf::RuntimePathLinkage> sf::{anonymous}::buildRuntimeFileSet(const sf::UdfRuntime&, std::string_view, const std::vector<sf::udf::ThirdPartyLibrariesInfo>&, bool):"libpython_missing", internalMsg=[XP_WORKER_FAILURE: Unexpected error signaled by function 'std::vector<sf::RuntimePathLinkage> sf::{anonymous}::buildRuntimeFileSet(const sf::UdfRuntime&, std::string_view, const std::vector<sf::udf::ThirdPartyLibrariesInfo>&, bool)'
Assert "libpython_missing"[{"function": "std::vector<sf::RuntimePathLinkage> sf::{anonymous}::buildRuntimeFileSet(const sf::UdfRuntime&, std::string_view, const std::vector<sf::udf::ThirdPartyLibrariesInfo>&, bool)", "line": 1307, "stack frame ptr": "0xf2ff65553120", "libPythonOnHost": "/opt/sfc/deployments/prod1/ExecPlatform/cache/directory_cache/server_2921757878/v3/python_udf_libs/.data/4e8f2a35e2a60eb4cce3538d6f794bd7881d238d64b1b3e28c72c0f3d58843f0/lib/libpython3.9.so.1.0"}]], userMsg=Processing aborted due to error 300010:791225565; incident 9770775., reporter=unknown, dumpFile= file://, isAborting=true, isVerbose=false}
To resolve this error, try the following steps:
Refresh the webpage and start the notebook again.
If the issue persists, open the package picker and check whether all of the installed packages are still valid. The drop-down for each package shows the available versions; selecting the latest version of the package usually clears the error.
Read-only file system issue¶
Some Python libraries download or cache data to a local user directory. However, the default user directory /home/udf is read-only. To work around this, set the download or cache path to /tmp, which is a writable location. Note that the environment variable used to set the write directory varies depending on the library you are using.
The following is a list of known libraries that present this issue:
matplotlib
HuggingFace
catboost
matplotlib example¶
The following is the warning you get when you try to use matplotlib:
Matplotlib created a temporary cache directory at /tmp/matplotlib-2fk8582w because the default path (/home/udf/.config/matplotlib) is
not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular
to speed up the import of Matplotlib and to better support multiprocessing.
The following code sets the MPLCONFIGDIR variable to /tmp/ to resolve this warning:
import os
os.environ["MPLCONFIGDIR"] = '/tmp/'
import matplotlib.pyplot as plt
HuggingFace example¶
The following is the warning returned when you try to use HuggingFace:
Readonly file system: `/home/udf/.cache`
The following code sets the HF_HOME and SENTENCE_TRANSFORMERS_HOME variables to /tmp to resolve this error:
import os
os.environ['HF_HOME'] = '/tmp'
os.environ['SENTENCE_TRANSFORMERS_HOME'] = '/tmp'
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("Snowflake/snowflake-arctic-embed-xs")
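catboost example¶
By default, catboost writes training artifacts to a catboost_info directory under the working directory, which fails when that location is read-only. The following is a minimal sketch, assuming the failure comes from this default; it redirects the artifacts to /tmp with the train_dir parameter and uses a toy dataset:
from catboost import CatBoostClassifier

# Redirect CatBoost training artifacts to a writable location
model = CatBoostClassifier(iterations=10, train_dir="/tmp/catboost_info")
model.fit([[0, 1], [1, 0]], [0, 1])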
Output message is too large when using df.collect()¶
The following message is displayed in the cell output when you run df.collect():
MessageSizeError: Data of size 522.0 MB exceeds the message size limit of 200.0 MB.
This is often caused by a large chart or dataframe. Please decrease the amount of data sent to the browser,
or increase the limit by setting the config option server.maxMessageSize.
Click here to learn more about config options.
Note that increasing the limit may lead to long loading times and large memory consumption of the client's browser and the Streamlit server.
Snowflake Notebooks automatically truncates results in the cell output for large datasets in the following cases:
All SQL cell results.
Python cell results, if the result is a snowpark.DataFrame.
The issue with the above cell is that df.collect() returns a List instead of a snowpark.DataFrame. Lists are not automatically truncated. To get around this issue, output the DataFrame itself:
df
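The following is a minimal sketch of the two patterns, assuming a placeholder table name; run each output in its own cell:
df = session.table("BIG_TABLE")

# Outputting the result of collect() sends the full list of Row objects to the browser
df.collect()

# Outputting the DataFrame itself lets the notebook truncate large results automatically
df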
Notebook crashes when using df.to_pandas() on Snowpark DataFrames¶
When you run df.to_pandas(), all the data is loaded into memory, and the notebook session may terminate if the data size exceeds the associated notebook warehouse's memory limit.
data = session.table("BIG_TABLE")
df = data.to_pandas() # This may lead to memory error
In general, for large datasets, Snowflake recommends avoiding the use of df.to_pandas(). Instead, to operate on your data with pandas, use the Snowpark pandas API, which lets you run your pandas code directly on your data in Snowflake with the compute pushed down to SQL, so that you aren't restricted to working only on data that fits in memory.
The example below shows how you can rewrite the code to read in the table with Snowpark pandas.
# Import Snowpark pandas
import modin.pandas as pd
import snowflake.snowpark.modin.plugin
# Create a Snowpark pandas DataFrame from BIG_TABLE
df = pd.read_snowflake("BIG_TABLE")
# Keep working with your data using the pandas API
df.dropna()
For more details, see Snowpark pandas in notebooks.