Work with files in Snowflake Notebooks¶
This topic describes how you can upload and access files from your Snowflake Notebooks.
Files in notebook environments¶
When you create a new notebook, two files are created. You can view these in the Files pane on the left side of the notebook. The files are stored in an internal stage which represents your notebook environment. Files stored in this internal stage persist between sessions.
Main notebook file: By default, this is named notebook_app.ipynb. If your notebook is created from Git or uploaded from another .ipynb file, the filename may be different.
environment.yml: This is an autogenerated file that describes your notebook environment, such as which packages are installed.
To inspect the contents of a file, select the file name; a pop-up appears with a preview of the file content. Note that files are read-only. To modify a file, download it, edit it locally, and then upload the updated copy.
Temporary filesystem in a notebook environment¶
Your notebook has a temporary filesystem that is available during an active session. Any files created during the session are saved in this temporary filesystem. These files are not available after you exit the current notebook session.
The following code creates a file called myfile.txt and writes some text in it:
# The with statement closes the file automatically when the block exits
with open("myfile.txt", "w") as f:
    f.write("abc")
You can access this file during the same session in which it was created.
Use the os.listdir() function to list the files in the temporary filesystem:
import os
os.listdir()
Now disconnect from your current session and reconnect. Call listdir() again, and myfile.txt is no longer listed.
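To see at a glance what currently exists in the temporary filesystem, a small helper like the one below may be convenient. It is a minimal sketch using only the Python standard library. The function name list_session_files is hypothetical, and it assumes the files of interest are in the current working directory unless another directory is passed.

```python
import os

def list_session_files(directory="."):
    """Return a sorted list of (name, size_in_bytes) for regular files in a directory."""
    entries = []
    for entry in os.scandir(directory):
        # Skip subdirectories and other non-file entries
        if entry.is_file():
            entries.append((entry.name, entry.stat().st_size))
    return sorted(entries)
```

Running list_session_files() before and after reconnecting should make it easy to confirm which files did not survive the session.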
Files persisted across Notebook sessions¶
To persist your files across Notebook sessions:
Store Files in a Snowflake stage¶
If you want your files to persist between sessions and to be available from different notebooks, use a Snowflake stage to store them. You can upload files from your local machine to the stage and use the file operations of the Snowpark API to access them from your notebook.
Example¶
This example shows how to create a stage from your notebook, store a file in it, and confirm that the file is still there in a later session.
To create a stage called permanent_stage, run the following code in a SQL cell:
CREATE OR REPLACE STAGE permanent_stage;
Next, to create a file called myfile.txt with some text in it, run the following code in a Python cell:
# The with statement closes the file automatically when the block exits
with open("myfile.txt", "w") as f:
    f.write("abc")
Note that at this point, myfile.txt is stored in the Notebook’s temporary filesystem.
To copy this file to the stage, you can use the Snowpark API to upload myfile.txt to your permanent_stage:
from snowflake.snowpark.context import get_active_session
session = get_active_session()
put_result = session.file.put("myfile.txt", "@PERMANENT_STAGE", auto_compress=False)
put_result[0].status
If you disconnect your session and reconnect, you can run the following code in a SQL cell to see that the file is still there:
LS @permanent_stage;
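To read the file again in a later session, the Snowpark file API also provides session.file.get, which downloads a staged file into a local directory. The following is a sketch under stated assumptions, not a definitive implementation: the helper name restore_from_stage and the restored target directory are hypothetical, and it assumes that the stage and file from the example above exist and that the file was uploaded with auto_compress=False so that its name is unchanged.

```python
import os

def restore_from_stage(session, stage_file, target_dir="restored"):
    """Download a staged file into a local directory and return the local path.

    `session` is expected to be a Snowpark session, for example the one
    returned by get_active_session() inside a notebook.
    """
    os.makedirs(target_dir, exist_ok=True)
    # session.file.get copies the staged file into target_dir
    session.file.get(stage_file, target_dir)
    return os.path.join(target_dir, os.path.basename(stage_file))

# Hypothetical usage inside a notebook Python cell:
# from snowflake.snowpark.context import get_active_session
# path = restore_from_stage(get_active_session(), "@permanent_stage/myfile.txt")
# print(open(path).read())
```

The downloaded copy lives in the temporary filesystem, so it needs to be fetched again in each new session.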
Add Files to Notebook from local computer¶
You can upload files from your local computer to be used in your Snowflake notebook.
Sign in to Snowsight.
Select Projects » Notebooks.
In the Files tab, next to the database object explorer, select the icon to select files to upload.
Browse and select or drag and drop files into the dialog.
Select Upload to upload your file.
Uploaded files are saved to the notebook’s internal stage and persisted between sessions. You can reference uploaded files using their local paths from the notebook file. See Referencing Files in Notebooks.
Warning
If your notebook session is active when you upload the file, you need to restart your notebook session for the uploaded file to be accessible. This is a known bug. Snowflake recommends adding all the files you need for your notebook before starting your session.
Sync with Files from Git¶
If your Notebook is connected to Git, then all the files in the same Git folder as your notebook are displayed in the Files tab.
For more information on working with files in Git, see Sync Snowflake Notebooks with a Git repository.
Referencing Files in Notebooks¶
Each file in the notebook environment has a stage path and a local path. You can use these paths to reference the file in the notebook.
Referencing local path with Python¶
In general, Python libraries use the local path to reference a file.
For example, the following code accesses the data.csv file that was uploaded to the same directory as the notebook that this code is running in:
import pandas as pd
df = pd.read_csv("data.csv")
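The same local path should work with other Python file APIs, not only pandas. As an illustration, the sketch below reads a CSV file with the standard library and raises a more descriptive error when the file is missing, which can happen if it was uploaded after the session started. The helper name read_rows is hypothetical.

```python
import csv
import os

def read_rows(path="data.csv"):
    """Read a CSV file from a local path and return its rows as dictionaries."""
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"{path} not found; if it was uploaded during an active session, "
            "restart the session and try again"
        )
    # newline="" lets the csv module handle line endings itself
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```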
Referencing Stage path with SQL¶
With SQL, Snowflake references files based on the stage path. The stage path for a file in your notebook is based on the following format:
snow://notebook/<DATABASE>.<SCHEMA>.<NOTEBOOK_NAME>/versions/live/<file_name>
To find the stage path of a file in your notebook stage, use the Copy path menu:
Sign in to Snowsight.
Select Projects » Notebooks.
In the Files tab, next to the database object explorer, select the icon next to the file you want to get the path for.
Select Copy path. This copies the path of the file to your clipboard.
Then you can use the following SQL statement to list the stage file details:
LIST 'snow://notebook/<DATABASE>.<SCHEMA>.<NOTEBOOK_NAME>/versions/live/data.csv'
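If you prefer to build the stage path in code rather than copy it from the UI, the format above can be filled in with a small helper. This is a minimal sketch: the function name notebook_stage_path and the database, schema, and notebook names are hypothetical, and it assumes identifiers that need no quoting or escaping.

```python
def notebook_stage_path(database, schema, notebook_name, file_name):
    """Build the stage path for a file in a notebook's live version."""
    return (
        f"snow://notebook/{database}.{schema}.{notebook_name}"
        f"/versions/live/{file_name}"
    )

# Hypothetical names, for illustration only
path = notebook_stage_path("MY_DB", "MY_SCHEMA", "MY_NOTEBOOK", "data.csv")
list_statement = f"LIST '{path}'"
```

The resulting list_statement string has the same shape as the LIST statement shown above and could be passed to a SQL cell or to session.sql.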
Access control requirements¶
You need to use a role with the following privileges to access files from a stage in a notebook.
Privilege | Object |
---|---|
USAGE | Stage that contains the files. |
Limitations and considerations¶
Load files before starting your notebook session. If you load files after a session has started, you have to restart your session to access the files.
No restrictions on file types to upload.
Each file must be 250 MB or smaller.
Files that are written to a local path in the notebook do not show up in the Files UI. This is a known bug. However, you should still be able to use the file in your notebook code.
For example, if you create a file, data.json, you can access it as shown in the following code even though it won’t be visible in the Files UI:
import pandas as pd

# Generate a sample JSON Lines file (one JSON object per line)
with open("data.json", "w") as f:
    f.write('{"fruit":"apple", "size":3.4, "weight":1.4}\n{"fruit":"orange", "size":5.4, "weight":3.2}')
# Read from the local JSON file (the file doesn't show in the UI)
df = pd.read_json("data.json", lines=True)
df
Opening another .ipynb file that is not the main notebook file is not supported.
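Because of the per-file size limit, it can help to check file sizes before uploading. The sketch below is one possible way to do so with the standard library. The names MAX_UPLOAD_BYTES and files_over_limit are hypothetical, and the code assumes that 250 MB means 250 * 1024 * 1024 bytes, which the limit above does not specify.

```python
import os

# Assumption: "250 MB" is interpreted here as 250 * 1024 * 1024 bytes
MAX_UPLOAD_BYTES = 250 * 1024 * 1024

def files_over_limit(paths, limit=MAX_UPLOAD_BYTES):
    """Return the paths whose size on disk exceeds the limit."""
    return [p for p in paths if os.path.getsize(p) > limit]
```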
Additional resources¶
For additional examples, see the following example notebooks on GitHub: