snowflake.snowpark.Session.directory
- Session.directory(stage_name: str) → DataFrame
Returns a DataFrame representing the results of a directory table query on the specified stage.
A directory table query retrieves file-level metadata about the data files in a Snowflake stage. This includes information like relative path, file size, last modified timestamp, file URL, and checksums.
Note
The stage must have a directory table enabled for this method to work. The query is evaluated lazily: no SQL is executed until a method such as DataFrame.collect() or DataFrame.to_pandas() evaluates the DataFrame.
- Parameters:
stage_name – The name of the stage to query. The stage name should not include the ‘@’ prefix as it will be added automatically.
- Returns:
A DataFrame containing metadata about files in the stage with the following columns:
  - RELATIVE_PATH: Path to the files to access using the file URL
  - SIZE: Size of the file in bytes
  - LAST_MODIFIED: Timestamp when the file was last updated in the stage
  - MD5: MD5 checksum for the file
  - ETAG: ETag header for the file
  - FILE_URL: Snowflake file URL to access the file
- Return type:
DataFrame
- Examples:
>>> from snowflake.snowpark.functions import col
>>> # Create a stage with a directory table enabled and upload two files
>>> _ = session.sql("CREATE OR REPLACE TEMP STAGE test_stage DIRECTORY = (ENABLE = TRUE)").collect()
>>> _ = session.file.put("tests/resources/testCSV.csv", "@test_stage", auto_compress=False)
>>> _ = session.file.put("tests/resources/testJson.json", "@test_stage", auto_compress=False)
>>> _ = session.sql("ALTER STAGE test_stage REFRESH").collect()

>>> # List all files in the stage
>>> df = session.directory('test_stage')
>>> df.count()
2

>>> # Get the relative paths of CSV files only
>>> csv_files = session.directory('test_stage').filter(
...     col('RELATIVE_PATH').like('%.csv%')
... ).select('RELATIVE_PATH')
>>> csv_files.show()
-------------------
|"RELATIVE_PATH"  |
-------------------
|testCSV.csv      |
-------------------
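Once the rows have been collected, the metadata can be processed locally like any other data. The following is a minimal sketch that runs without a Snowflake connection: it assumes the rows have already been converted to plain dictionaries keyed by the column names listed under Returns. The helper total_size_by_extension is hypothetical, and the sample values are made up for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def total_size_by_extension(rows):
    """Sum the SIZE column per file extension found in RELATIVE_PATH."""
    totals = defaultdict(int)
    for row in rows:
        # Files without an extension are grouped under "<none>".
        ext = PurePosixPath(row["RELATIVE_PATH"]).suffix.lower() or "<none>"
        totals[ext] += row["SIZE"]
    return dict(totals)


# Made-up sample rows standing in for collected directory table results.
sample_rows = [
    {"RELATIVE_PATH": "testCSV.csv", "SIZE": 32},
    {"RELATIVE_PATH": "testJson.json", "SIZE": 128},
    {"RELATIVE_PATH": "archive/old.csv", "SIZE": 10},
]

print(total_size_by_extension(sample_rows))
```

For large stages it is usually preferable to do this kind of aggregation in the DataFrame itself before collecting, so that only the summary leaves Snowflake.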
For details, see the Snowflake documentation on directory tables.
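As a rough illustration of the '@' handling described under Parameters, the sketch below builds the kind of SQL statement a directory table query corresponds to. The helper directory_query_sql is hypothetical and not part of Snowpark. It assumes the stage is queried through Snowflake's DIRECTORY table function with the '@' prefix prepended to the bare stage name; the SQL that Snowpark actually generates may differ.

```python
def directory_query_sql(stage_name: str) -> str:
    """Hypothetical helper: build a directory table query for a stage.

    Assumption: the '@' prefix is added automatically, so callers pass
    the bare stage name (for example 'test_stage', not '@test_stage').
    """
    name = stage_name.strip()
    if not name:
        raise ValueError("stage_name must not be empty")
    if name.startswith("@"):
        raise ValueError("pass the stage name without the '@' prefix")
    # DIRECTORY(@<stage>) is the SQL table function for directory tables.
    return f"SELECT * FROM DIRECTORY(@{name})"


print(directory_query_sql("test_stage"))
```

Rejecting a leading '@' here is a design choice of this sketch, made so that a doubled prefix is caught early; it does not describe how Session.directory itself treats such input.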