Introduction to Unstructured Data Support

This topic introduces key concepts and provides instructions for accessing and sharing unstructured data files.

In this Topic:

Cloud Storage Service Support

Both external (external cloud storage) and internal (i.e. Snowflake) stages support unstructured data.

External stages

Store files in external cloud storage: Amazon S3, Google Cloud Storage, or one of the supported Microsoft Azure cloud storage services:

  • Blob storage

  • Data Lake Storage Gen2

  • General-purpose v1

  • General-purpose v2

Types of URLs Available to Access Files

The following types of URLs are available to access files in cloud storage:

Scoped URL

Encoded URL that permits temporary access to a staged file without granting privileges to the stage.

The URL expires when the persisted query result period ends (i.e. the results cache expires), which is currently 24 hours.

File URL

URL that identifies the database, schema, stage, and file path to a set of files. A role that has sufficient privileges on the stage can access the files.

Pre-signed URL

Simple HTTPS URL used to access a file via a web browser. A file is temporarily accessible to users via this URL using a pre-signed access token. The expiration time for the access token is configurable.

The following table describes key characteristics of these URL types:

Scoped URL

File URL

Pre-signed URL

Use cases

Recommended for file administrators to give scoped access to data files to specific roles in the same account. Provide access to the files via a view that retrieves scoped URLs. Only roles that have privileges on the view can access the files. Ideal for use in custom applications, providing unstructured data to other accounts via a share, or for downloading and ad hoc analysis of unstructured data via Snowsight.

Permanent URL to a file on a stage. To download or access a file, users send the file URL in a GET request to the REST API endpoint along with the authorization token. Ideal for custom applications that require access to unstructured data files.

Used to download or access files without authenticating into Snowflake or passing an authorization token. Pre-signed URLs are open; any user or application can directly access or download the files. Ideal for business intelligence applications or reporting tools that need to display the unstructured file contents.

How to generate

Query the BUILD_SCOPED_FILE_URL function.

Either Query the directory table for the stage that references the staged files or call the BUILD_STAGE_FILE_URL function.

Query the GET_PRESIGNED_URL function.

Usage

The following options are available:

  • In Snowsight, click on a scoped URL in the query results table. Snowsight retrieves the file only for the user who generated the scoped URL.

  • Send a scoped URL in a GET request to the file support REST API endpoint. For information, see REST API for Unstructured Data Support.

The following options are available:

  • In Snowsight, click on a file URL in the query results table. Snowsight retrieves the file only if the active role has sufficient privileges.

  • Send a file URL in a GET request to the file support REST API endpoint. For information, see REST API for Unstructured Data Support.

The following options are available:

  • In Snowsight, click on a pre-signed URL in the query results table.

  • Navigate to the pre-signed URL directly in a web browser.

Data sharing

Data providers can share scoped URLs with data consumers as column values in secure views.

Not supported.

Data providers can share pre-signed URLs with data consumers as column values in secure views.

Authorization

Only the user who generates a scoped URL can use the URL to access the referenced file.

Role specified in the GET REST API call must have sufficient privileges on the stage: USAGE (external stage) or READ (internal stage).

Any person who has the pre-signed URL can access the referenced file for the life of the token.

Expiration

Expiration period for the query results cache (currently 24 hours).

Permanent.

Length of time specified in the expiration_time argument.

Server-side Encryption Only for Stages

To support file access using pre-signed URLs, create named internal or external stages with server-side encryption rather than client-side encryption. When files in a stage are client-side encrypted, users cannot read the staged files without having access to the encryption key.

Note

Currently, creating internal stages with server-side encryption is limited to the following Snowflake client versions:

  • JDBC Driver v3.12.11 (or higher)

Directory Tables

Directory tables store a catalog of staged files in cloud storage. Roles with sufficient privileges can query a directory table to retrieve file URLs to access the staged files.

For details, see Directory Tables.

SQL Functions

The following File Functions are provided to access data files:

SQL Function

Description

GET_STAGE_LOCATION

Returns the URL for an external or internal named stage using the stage name as the input.

GET_RELATIVE_PATH

Extracts the path of a staged file relative to its location in the stage using the stage name and absolute file path in cloud storage as inputs.

GET_ABSOLUTE_PATH

Returns the absolute path of a staged file using the stage name and path of the file relative to its location in the stage as inputs.

GET_PRESIGNED_URL

Generates the pre-signed URL to a staged file using the stage name and relative file path as inputs. Access files in an external stage using the function.

BUILD_SCOPED_FILE_URL

Generates a scoped Snowflake-hosted URL to a staged file using the stage name and relative file path as inputs.

BUILD_STAGE_FILE_URL

Generates a Snowflake-hosted file URL to a staged file using the stage name and relative file path as inputs.

Metadata Columns

The following set of metadata columns for a stage store similar information as returned by the functions described in SQL Functions:

METADATA$RELATIVE_PATH

Stores values similar to the GET_RELATIVE_PATH function output. Displays the path of a staged file relative to its location in the stage.

METADATA$ABSOLUTE_PATH

Stores values similar to the GET_ABSOLUTE_PATH function output. Displays the absolute path of a staged file using the stage name and path of the file relative to its location in the stage as inputs.

METADATA$PRESIGNED_URL

Stores values similar to the GET_PRESIGNED_URL function output. Displays the pre-signed URL to a staged file with the default parameters set.

Examples

Querying METADATA$ABSOLUTE_PATH Column Values

Use the API for your cloud storage service to generate a list of JSON documents that contain the metadata extracted from the images.

For example, suppose the JSON document for one bitmap image file is as follows:

{
  "file_url": "s3://photos/national_parks/us/yosemite/half_dome.jpg",
  "image_format": "jpeg",
  "dimensions": {"x" : 1024, "y" : 768},
  "tags":[
    "rock",
    "cliff",
    "valley"
  ],
  "dominant_color": "gray"
}

The following SQL statement returns the METADATA$ABSOLUTE_PATH column values for the staged image file and JSON document:

SELECT METADATA$ABSOLUTE_PATH FROM @images_stage;

+---------------------------------------------------------------+
| METADATA$ABSOLUTE_PATH                                        |
|---------------------------------------------------------------|
| s3://photos/national_parks/us/yosemite/half_dome.jpg          |
..
| s3://photosimage_metadata.json                                |
+---------------------------------------------------------------+

Downloading Staged Files in Snowsight

Users can click on a generated scoped, pre-signed, or file URL in the results table of a Snowsight worksheet and download the referenced file.

  1. Log into the new web interface.

  2. Click Worksheets » My Worksheets (or open a local worksheet under Recent or in Folders) » <worksheet_name>.

  3. Return a scoped, pre-signed, or file URL in a query using any one of the supported methods.

  4. Click on the URL in the results table. Snowsight downloads the file referenced by the URL.

    Download pre-signed URL in Snowsight results table