Unstructured data types

Snowflake offers native support for unstructured data, meaning that Snowflake treats the data as an opaque binary object, without any inherent knowledge of the data’s internal structure. This is in contrast to structured data, which follows a strict tabular schema (for example, a CSV file), or semi-structured data, which has a flexible schema (for example, JSON or XML files).

The FILE data type represents a file stored in an internal or external stage and can be used with any kind of data: including documents, images, and audio and video files. Some Snowflake Cortex AI functions accept FILE values as input. FILE objects are created using the TO_FILE or TRY_TO_FILE function.

FILE data type

Snowflake provides the FILE data type for unstructured data. A FILE value represents a file stored in an internal or external stage, but does not store the file’s data, only a reference to it. It includes the following metadata:.

  • STAGE: The name of the stage on which the file resides.

  • RELATIVE_PATH: The relative path of the file in its stage.

  • STAGE_FILE_URL: The stage file URL.

  • SCOPED_FILE_URL: A scoped file URL.

  • CONTENT_TYPE: The MIME type of the file.

  • SIZE: The size, in bytes, of the file.

  • ETAG: A unique hash of the file contents.

  • LAST_MODIFIED: The timestamp at which the file was last modified.

Not all of these fields are required. A valid FILE is required only to have CONTENT_TYPE, SIZE, ETAG, LAST_MODIFIED and some way of locating the file (that is, STAGE plus RELATIVE_PATH, STAGE_FILE_URL, or SCOPED_FILE_URL).

FILE data type functions

Sub-category

Function

Constructor

TO_FILE

TRY_TO_FILE

Accessors

FL_GET_CONTENT_TYPE

FL_GET_ETAG

FL_GET_FILE_TYPE

FL_GET_LAST_MODIFIED

FL_GET_RELATIVE_PATH

FL_GET_SCOPED_FILE_URL

FL_GET_SIZE

FL_GET_STAGE

FL_GET_STAGE_FILE_URL

Utility Functions

FL_IS_AUDIO

FL_IS_COMPRESSED

FL_IS_DOCUMENT

FL_IS_IMAGE

FL_IS_VIDEO

Usage notes

  • FILE values provide no consistency guarantees with regard to the underlying stage files. FILE values are not updated when the underlying stage file is deleted or modified. If a FILE value is deleted from a table, the underlying file is not affected.

  • Permissions on the underlying files are governed by the type of URL that was specified when creating the FILE. Stage file URLs and stage/path combinations give permanent permission to callers that have access to the associated stage. Scoped URLs give temporary user-based access to the underlying file.

Using unstructured data in Snowflake via SQL

Create a table with a FILE column.

CREATE TABLE images_table(img FILE);
Copy

Load data from an external stage my_images that contains image files. mpy_images can be an internal or external stage.

Note

This process requires directory table support on the stage. Use the SQL below to enable it if it is not already enabled.

ALTER STAGE my_images DIRECTORY=(ENABLE=true);
Copy

Load data into the Snowflake table.

INSERT INTO images_table
    SELECT TO_FILE(file_url) FROM DIRECTORY(@my_images);
Copy

Run SQL statements against images_table. For example, the following query returns the relative path of each file in the table that was last modified between January 1, 2021 and January 1, 2023.

SELECT FL_GET_RELATIVE_PATH(f)
    FROM images_table
    WHERE FL_GET_LAST_MODIFIED(f) BETWEEN '2021-01-01' and '2023-01-01';
Copy

Known limitations

The FILE data type currently cannot be used in:

  • CLUSTER BY, GROUP BY, and ORDER BY clauses

  • Hybrid tables, Iceberg tables, and external tables

  • SnowScript

  • Secured views

  • Binds

  • Search optimization

  • Clients and connectors except Snowpark Python