Introduction to Unstructured Data Support

Unstructured data is information that does not fit into a predefined data model or schema. Typically text-heavy, such as form responses and social media conversations, unstructured data also encompasses images, video, and audio. Industry-specific file types such as VCF (genomics), KDF (semiconductors), or HDF5 (aeronautics) are included in this category.

Snowflake supports the following actions:

  • Securely access data files located in cloud storage.

  • Share file access URLs with collaborators and partners.

  • Load file access URLs and other file metadata into Snowflake tables.

This topic introduces key concepts and provides instructions for accessing and sharing unstructured data files.

Cloud Storage Service Support

Both external (external cloud storage) and internal (i.e. Snowflake) stages support unstructured data.

External stages

Store files in external cloud storage: Amazon S3, Google Cloud Storage, or one of the supported Microsoft Azure cloud storage services:

  • Blob storage

  • Data Lake Storage Gen2

  • General-purpose v1

  • General-purpose v2

Types of URLs Available to Access Files

The following types of URLs are available to access files in cloud storage:

Scoped URL

Encoded URL that permits temporary access to a staged file without granting privileges to the stage.

The URL expires when the persisted query result period ends (i.e. the results cache expires), which is currently 24 hours.

File URL

URL that identifies the database, schema, stage, and file path to a set of files. A role that has sufficient privileges on the stage can access the files.

Pre-signed URL

Simple HTTPS URL used to access a file via a web browser. A file is temporarily accessible to users via this URL using a pre-signed access token. The expiration time for the access token is configurable.

The following table describes key characteristics of these URL types:

Scoped URL

File URL

Pre-signed URL

Use cases

Recommended for file administrators to give scoped access to data files to specific roles in the same account. Provide access to the files with a view that retrieves scoped URLs. Only roles that have privileges on the view can access the files. Snowflake records information in the query history about who uses a scoped URL to access a file, and when. Ideal for use in custom applications, for providing unstructured data to other accounts through a share, or for downloading and analysis of unstructured data in Snowsight.

Permanent URL to a file on a stage. To download or access a file, users send the file URL in a GET request to the REST API endpoint along with the authorization token. Ideal for custom applications that require access to unstructured data files.

Used to download or access files without authenticating into Snowflake or passing an authorization token. Pre-signed URLs are open; any user or application can directly access or download the files. Ideal for business intelligence applications or reporting tools that need to display the unstructured file contents.

How to generate

Query the BUILD_SCOPED_FILE_URL function.

Either Query the directory table for the stage that references the staged files or call the BUILD_STAGE_FILE_URL function.

Query the GET_PRESIGNED_URL function.

Usage

The following options are available:

  • In Snowsight, click on a scoped URL in the query results table. Snowsight retrieves the file only for the user who generated the scoped URL.

  • Send a scoped URL in a GET request to the file support REST API endpoint. For information, see REST API for Unstructured Data Support.

The following options are available:

  • In Snowsight, click on a file URL in the query results table. Snowsight retrieves the file only if the active role has sufficient privileges.

  • Send a file URL in a GET request to the file support REST API endpoint. For information, see REST API for Unstructured Data Support.

The following options are available:

  • In Snowsight, click on a pre-signed URL in the query results table.

  • Navigate to the pre-signed URL directly in a web browser.

Snowflake Secure Data Sharing

Unstructured data files can be accessed by data consumers via column values of this type in secure views shared by data providers.

Unstructured data files cannot be accessed by data consumers via column values of this type in secure views shared by data providers.

Unstructured data files can be accessed by data consumers via column values of this type in secure views shared by data providers.

Authorization

Only the user who generates a scoped URL can use the URL to access the referenced file.

Role specified in the GET REST API call must have sufficient privileges on the stage: USAGE (external stage) or READ (internal stage).

Any person who has the pre-signed URL can access the referenced file for the life of the token.

Expiration

Expiration period for the query results cache (currently 24 hours).

Permanent.

Length of time specified in the expiration_time argument.

Server-side Encryption Only for Stages

To support file access using pre-signed URLs, create named internal or external stages with server-side encryption rather than client-side encryption. When files in a stage are client-side encrypted, users cannot read the staged files without having access to the encryption key.

Note

Currently, creating internal stages with server-side encryption is limited to the following Snowflake client versions:

  • JDBC Driver v3.12.11 (or higher)

Directory Tables

Directory tables store a catalog of staged files in cloud storage. Roles with sufficient privileges can query a directory table to retrieve file URLs to access the staged files.

For details, see Directory Tables.

SQL Functions

The following File Functions are provided to access data files:

SQL Function

Description

GET_STAGE_LOCATION

Returns the URL for an external or internal named stage using the stage name as the input.

GET_RELATIVE_PATH

Extracts the path of a staged file relative to its location in the stage using the stage name and absolute file path in cloud storage as inputs.

GET_ABSOLUTE_PATH

Returns the absolute path of a staged file using the stage name and path of the file relative to its location in the stage as inputs.

GET_PRESIGNED_URL

Generates the pre-signed URL to a staged file using the stage name and relative file path as inputs. Access files in an external stage using the function.

BUILD_SCOPED_FILE_URL

Generates a scoped Snowflake-hosted URL to a staged file using the stage name and relative file path as inputs.

BUILD_STAGE_FILE_URL

Generates a Snowflake-hosted file URL to a staged file using the stage name and relative file path as inputs.

Downloading Staged Files in Snowsight

Users can click on a generated scoped, pre-signed, or file URL in the results table of a Snowsight worksheet and download the referenced file.

  1. Log into Snowsight, the Snowflake web interface.

  2. Click Worksheets » My Worksheets (or open a local worksheet under Recent or in Folders) » <worksheet_name>.

  3. Return a scoped, pre-signed, or file URL in a query using any one of the supported methods.

  4. Click on the URL in the results table. Snowsight downloads the file referenced by the URL.

    Download pre-signed URL in Snowsight results table