Work with files in Snowflake Notebooks¶
This topic describes how you can upload and access files from your Snowflake Notebooks.
Files in notebook environments¶
When you create a new notebook, the main notebook file is created. You can view this file in the Files tab on the left side of the notebook. Files are stored in an internal stage that represents your notebook environment, and they persist between sessions.
Main notebook file: By default, this file is named notebook_app.ipynb. If your notebook is created from Git or uploaded from another .ipynb file, the filename may be different.
environment.yml: This file is autogenerated after you install a new package. It describes your notebook environment, including which packages are installed.
To inspect the contents of the file, select the file name to display a preview of the file content.
Temporary filesystem in a notebook environment¶
Your notebook has a temporary filesystem that is available during an active session. Any files created during the session are saved in this temporary stage. Files on the temporary stage will not be available after you end the current notebook session.
The following code creates a file called myfile.txt and writes some text to it:
with open("myfile.txt", "w") as f:
    f.write("abc")
You can access this file for the rest of the session in which it was created.
Use the os.listdir() function to list the files in the temporary stage:
import os
os.listdir()
Now disconnect from your current session and reconnect. Run os.listdir() again, and myfile.txt is no longer listed.
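Because files in the temporary filesystem disappear when the session ends, code that depends on such a file may want to check for it before reading. The following is a minimal sketch that uses only the Python standard library. The helper name read_or_recreate and its default text are hypothetical and are used here for illustration only.

```python
from pathlib import Path


def read_or_recreate(path: str, default_text: str = "abc") -> str:
    """Return the file's text, writing default_text first if the file is missing."""
    file_path = Path(path)
    if not file_path.exists():
        # The file is gone (for example, after reconnecting), so write it again.
        file_path.write_text(default_text)
    return file_path.read_text()


print(read_or_recreate("myfile.txt"))
```

This approach only suits files that can be regenerated cheaply. Files that must survive a reconnect belong in persistent storage instead.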
Persist files across notebook sessions¶
To persist your files across notebook sessions, use one of the following methods:
Store files in a Snowflake stage¶
If you want your files to persist between sessions and to be available from different notebooks, store them in a Snowflake stage. You can upload files from your local machine to the stage and use the file operations in the Snowpark API to access them from your notebook.
Example¶
This example shows how to create a stage, store files in it, and retrieve them from your notebook.
To create a stage called permanent_stage, run the following code in a SQL cell:
CREATE OR REPLACE STAGE permanent_stage;
Next, to create a file called myfile.txt with some text in it, run the following code in a Python cell:
with open("myfile.txt", "w") as f:
    f.write("abc")
Note that at this point, myfile.txt is stored in the notebook’s temporary filesystem.
To copy the file to the stage, use the Snowpark API to upload myfile.txt to permanent_stage:
from snowflake.snowpark.context import get_active_session
session = get_active_session()
put_result = session.file.put("myfile.txt", "@PERMANENT_STAGE", auto_compress=False)
put_result[0].status
If you disconnect your session and reconnect, you can run the following code in a SQL cell to verify whether the file still appears:
LS @permanent_stage;
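To read the staged file back in a later session, one option is to copy it from the stage into the notebook's local filesystem with the Snowpark session.file.get operation. The following is a minimal sketch rather than a definitive implementation. The helper name download_from_stage and the downloads target directory are hypothetical, and the commented usage assumes an active notebook session and the permanent_stage stage created earlier.

```python
import os


def download_from_stage(session, stage_path: str, target_dir: str = "downloads"):
    """Copy staged files into a local directory and return (file, status) pairs."""
    # Make sure the local target directory exists before downloading.
    os.makedirs(target_dir, exist_ok=True)
    # session.file.get downloads the files at stage_path into target_dir.
    results = session.file.get(stage_path, target_dir)
    return [(r.file, r.status) for r in results]


# In a notebook cell (assumes an active session and the stage from this example):
# from snowflake.snowpark.context import get_active_session
# session = get_active_session()
# download_from_stage(session, "@permanent_stage/myfile.txt")
```

The downloaded copy lands in the temporary filesystem, so it has to be fetched again after each reconnect.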
Add files to a notebook from a local computer¶
You can upload files from your local computer to be used in your Snowflake notebook.
Sign in to Snowsight.
Select Projects » Notebooks.
In the Files tab, next to the database object explorer, select the icon to select files to upload.
Browse and select or drag and drop files into the dialog.
Select Upload to upload your file.
Uploaded files are saved to the notebook’s internal stage and persisted between sessions. You can reference uploaded files using their local paths from the notebook file. See Referencing files in Notebooks.
Note
For notebooks on container runtime: if your notebook session is active when you upload a file, you must restart the session before the uploaded file is accessible. Snowflake recommends adding all the files you need before you start your session.
Sync with files from Git¶
If your notebook is connected to Git, then any files in the same Git folder as your notebook will be displayed in the Files tab.
For more information on working with files in Git, see Sync Snowflake Notebooks with a Git repository.
Referencing files in Notebooks¶
Each file in the notebook environment has a stage path and a local path. You can use these paths to reference the file in the notebook.
Referencing local path with Python¶
In general, Python libraries refer to a file by its local path.
For example, the following code accesses the data.csv file that was uploaded to the same directory as the notebook that this code is running in:
import pandas as pd
df = pd.read_csv("data.csv")
Referencing the stage path with SQL¶
With SQL, Snowflake references files based on the stage path. The stage path for a file in your notebook is based on the following format:
snow://notebook/<DATABASE>.<SCHEMA>.<NOTEBOOK_NAME>/versions/live/<file_name>
To find the stage path of a file in your notebook stage by using the Copy path menu:
Sign in to Snowsight.
Select Projects » Notebooks.
In the Files tab, next to the database object explorer, select the icon next to the file you want to get the path for.
Select Copy path. This copies the path of the file to your clipboard.
Then you can use the following SQL statement to list the stage file details:
LIST 'snow://notebook/<DATABASE>.<SCHEMA>.<NOTEBOOK_NAME>/versions/live/data.csv'
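If you prefer to build the stage path in code instead of copying it from the menu, the format shown earlier can be assembled with plain string formatting. The following is a minimal sketch. The helper name notebook_stage_path and the object names MY_DB, MY_SCHEMA, and MY_NOTEBOOK are hypothetical placeholders, not real objects.

```python
def notebook_stage_path(database: str, schema: str, notebook_name: str, file_name: str) -> str:
    """Build the stage path of a file in the live version of a notebook."""
    return f"snow://notebook/{database}.{schema}.{notebook_name}/versions/live/{file_name}"


# Hypothetical object names, for illustration only.
path = notebook_stage_path("MY_DB", "MY_SCHEMA", "MY_NOTEBOOK", "data.csv")

# Print a LIST statement that can be pasted into a SQL cell.
print(f"LIST '{path}';")
```

The sketch does no quoting or validation, so it assumes identifiers that need no special handling.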
Access control requirements¶
You need to use a role with the following privileges to access files from a stage in a notebook.
Privilege | Object
---|---
USAGE | Stage that contains the files.
Limitations and considerations¶
Load files before starting your notebook session. If you load files after a session has started, you have to restart your session to access the files.
There are no restrictions on the types of files that you can upload.
Each file must be 250 MB or smaller.
Files that are written to a local path in the notebook are not displayed in the Files tab. However, you can still use the file in your notebook code.
For example, if you create a file named data.json, you can access it as shown in the following code even though it isn't visible in the Files tab:
import pandas as pd

# Generate a sample JSON Lines file (one JSON object per line)
with open("data.json", "w") as f:
    f.write('{"fruit":"apple", "size":3.4, "weight":1.4}\n{"fruit":"orange", "size":5.4, "weight":3.2}')

# Read from the local JSON file (the file doesn't show in the UI)
df = pd.read_json("data.json", lines=True)
df
Opening an .ipynb file other than the main notebook file is not supported.
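Given the per-file size limit listed above, it can be convenient to check a file's size before you try to upload it. The following is a minimal sketch that uses only the Python standard library. The helper name within_upload_limit is hypothetical, and reading "250 MB" as 250 * 1024 * 1024 bytes is an assumption, because the exact byte count is not stated here.

```python
import os

# Assumption: "250 MB" is interpreted as 250 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 250 * 1024 * 1024


def within_upload_limit(path: str, limit_bytes: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if the file at path is no larger than limit_bytes."""
    return os.path.getsize(path) <= limit_bytes


# Example usage on your local machine, with a hypothetical file name:
# within_upload_limit("data.csv")
```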