Unstructured data types¶
Snowflake offers native support for unstructured data, meaning that Snowflake treats the data as an opaque binary object, without any inherent knowledge of the data’s internal structure. This is in contrast to structured data, which follows a strict tabular schema (for example, a CSV file), or semi-structured data, which has a flexible schema (for example, JSON or XML files).
The FILE data type represents a file stored in an internal or external stage and can be used with any kind of data: including documents, images, and audio and video files. Some Snowflake Cortex AI functions accept FILE values as input. FILE objects are created using the TO_FILE or TRY_TO_FILE function.
FILE data type¶
Snowflake provides the FILE data type for unstructured data. A FILE value represents a file stored in an internal or external stage, but does not store the file’s data, only a reference to it. It includes the following metadata:.
STAGE: The name of the stage on which the file resides.
RELATIVE_PATH: The relative path of the file in its stage.
STAGE_FILE_URL: The stage file URL.
SCOPED_FILE_URL: A scoped file URL.
CONTENT_TYPE: The MIME type of the file.
SIZE: The size, in bytes, of the file.
ETAG: A unique hash of the file contents.
LAST_MODIFIED: The timestamp at which the file was last modified.
Not all of these fields are required. A valid FILE is required only to have CONTENT_TYPE, SIZE, ETAG, LAST_MODIFIED and some way of locating the file (that is, STAGE plus RELATIVE_PATH, STAGE_FILE_URL, or SCOPED_FILE_URL).
FILE data type functions¶
Sub-category |
Function |
---|---|
Constructor |
|
Accessors |
|
Utility Functions |
|
Usage notes¶
FILE values provide no consistency guarantees with regard to the underlying stage files. FILE values are not updated when the underlying stage file is deleted or modified. If a FILE value is deleted from a table, the underlying file is not affected.
Permissions on the underlying files are governed by the type of URL that was specified when creating the FILE. Stage file URLs and stage/path combinations give permanent permission to callers that have access to the associated stage. Scoped URLs give temporary user-based access to the underlying file.
Using unstructured data in Snowflake via SQL¶
Create a table with a FILE column.
CREATE TABLE images_table(img FILE);
Load data from an external stage my_images
that contains image files. mpy_images
can be an internal or external
stage.
Note
This process requires directory table support on the stage. Use the SQL below to enable it if it is not already enabled.
ALTER STAGE my_images DIRECTORY=(ENABLE=true);
Load data into the Snowflake table.
INSERT INTO images_table
SELECT TO_FILE(file_url) FROM DIRECTORY(@my_images);
Run SQL statements against images_table
. For example, the following query returns the relative path of each file in
the table that was last modified between January 1, 2021 and January 1, 2023.
SELECT FL_GET_RELATIVE_PATH(f)
FROM images_table
WHERE FL_GET_LAST_MODIFIED(f) BETWEEN '2021-01-01' and '2023-01-01';
Known limitations¶
The FILE data type currently cannot be used in:
CLUSTER BY, GROUP BY, and ORDER BY clauses
Hybrid tables, Iceberg tables, and external tables
SnowScript
Secured views
Binds
Search optimization
Clients and connectors except Snowpark Python