Working With Amazon S3-compatible Storage

The industry-standard Amazon Simple Storage Service (S3) REST API enables programmatic access to storage buckets and objects to perform CRUD operations and various administrative actions. A growing number of storage applications and devices have their own S3-compatible APIs. Snowflake features that access and interact with S3 storage can also work with S3-compatible storage, with some limitations.

To access storage outside of the public cloud (i.e. on-premises or at the edge), you can create external stages in Snowflake that store the S3-compliant API endpoint, the bucket name and path, and credentials. Grant privileges on these stages to roles to allow users to load data from, and unload data to, the storage locations.
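For example, after a stage exists (such as the my_s3compat_stage stage created later in this topic), a grant might look like the following sketch; the role name is illustrative:

GRANT USAGE ON STAGE my_s3compat_stage TO ROLE analyst_role;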

Create external tables on the external stages to extend your data lake to include this storage.

Cloud Platform Support

This feature is available to Snowflake accounts hosted on any supported cloud platform:

  • Amazon Web Services

  • Google Cloud

  • Microsoft Azure

Requirements for S3-compatible Storage

An S3-compatible API endpoint must be:

  • Highly compliant with the S3 API. If the endpoint does not behave like S3, it cannot work with Snowflake.

  • Supported by the third-party storage provider as a service that has been tested and found compliant for use with Snowflake.

  • Accessible from the public cloud where your Snowflake account is hosted.

  • Highly available and performant to serve analytics needs.

  • Configured to use direct credentials.

  • Configured to use HTTPS communication with a valid TLS certificate.

Creating External Stages

Create named external stages using CREATE STAGE. A named stage object can be used to list files, load data, and unload files, among other SQL actions.

You can add a directory table on any external stage you create. Query a directory table to retrieve file URLs to access files in the referenced storage, as well as other metadata.
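For example, the following sketch creates a stage with a directory table enabled and then queries the directory table for file URLs and other metadata. The stage name, URL, endpoint, and credentials are illustrative:

CREATE STAGE my_dir_stage
  URL = 's3compat://my_bucket/files/'
  ENDPOINT = 'mystorage.com'
  CREDENTIALS = (AWS_KEY_ID = '<string>' AWS_SECRET_KEY = '<string>')
  DIRECTORY = (ENABLE = TRUE);

-- Retrieve file URLs and metadata for the staged files.
SELECT RELATIVE_PATH, SIZE, FILE_URL
  FROM DIRECTORY(@my_dir_stage);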

Syntax

CREATE [ OR REPLACE ] STAGE <external_stage_name>
  URL = 's3compat://{bucket}[/{path}/]'
  ENDPOINT = '<s3_api_compatible_endpoint>'
  CREDENTIALS = ( AWS_KEY_ID = '<string>' AWS_SECRET_KEY = '<string>' )
  ...

Parameters

This section describes the parameters that are specific to external stages that point to storage locations via S3-compatible endpoints. For descriptions of common parameters, see CREATE STAGE.

URL = 's3compat://bucket[/path/]'

URL for the external location (existing bucket accessed using an S3-compatible API endpoint) used to store data files, where:

  • bucket is the name of the bucket.

  • path is an optional case-sensitive path (or prefix in S3 terminology) for files in the cloud storage location (i.e. files with names that begin with a common string).

ENDPOINT = 's3_api_compatible_endpoint'

Fully qualified domain name that points to the S3-compatible API endpoint.

Access Control Requirements

A role used to execute this SQL command must have the following privileges at a minimum:

Privilege       Object         Notes
CREATE STAGE    Schema         Required if creating a new stage.
USAGE           File format    Required only if referencing a named file format in the stage definition.

Note that operating on any object in a schema also requires the USAGE privilege on the parent database and schema.

For instructions on creating a custom role with a specified set of privileges, see Creating Custom Roles.

For general information about roles and privilege grants for performing SQL actions on securable objects, see Overview of Access Control.
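As a sketch, the minimum grants for a custom role that creates stages might look like the following; the role, database, schema, and file format names are illustrative:

GRANT USAGE ON DATABASE mydb TO ROLE stage_admin;
GRANT USAGE ON SCHEMA mydb.public TO ROLE stage_admin;
GRANT CREATE STAGE ON SCHEMA mydb.public TO ROLE stage_admin;

-- Only needed if the stage definition references a named file format.
GRANT USAGE ON FILE FORMAT mydb.public.my_parquet_format TO ROLE stage_admin;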

Usage Notes

  • Accessing S3-compatible storage requires direct credentials.

Examples

Create an external stage named my_s3compat_stage that points to the bucket and path my_bucket/files/ at the endpoint mystorage.com:

CREATE STAGE my_s3compat_stage
  URL = 's3compat://my_bucket/files/'
  ENDPOINT = 'mystorage.com'
  CREDENTIALS = (AWS_KEY_ID = '1a2b3c...' AWS_SECRET_KEY = '4x5y6z...')

Note that the AWS_KEY_ID and AWS_SECRET_KEY values used in this example are for illustration purposes only.
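To verify that Snowflake can reach the endpoint, you can list the files in the new stage; this assumes the bucket already contains files under the specified path:

LIST @my_s3compat_stage;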

Loading and Unloading Data

Load and unload data using the external stages you created in Creating External Stages (in this topic). Bulk loading with COPY INTO <table> and unloading with COPY INTO <location> work with these stages; see the examples after the access control table below.

Access Control Requirements

A role used to load or unload data must have the following privileges at a minimum:

Privilege   Object         Notes
USAGE       Stage
USAGE       File format    Required only if referencing a named file format in the COPY INTO <table> or COPY INTO <location> statement.
INSERT      Table          Source or target table for data loading or unloading.

Note that operating on any object in a schema also requires the USAGE privilege on the parent database and schema.
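For example, the following sketch loads Parquet files from the stage created earlier into an existing table, then unloads query results back to the storage location. The table name and unload path are illustrative:

-- Load staged Parquet files into an existing table by matching column names.
COPY INTO my_table
  FROM @my_s3compat_stage
  FILE_FORMAT = (TYPE = PARQUET)
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- Unload query results to a subpath in the same storage location.
COPY INTO @my_s3compat_stage/unload/
  FROM (SELECT * FROM my_table)
  FILE_FORMAT = (TYPE = PARQUET);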

Extending Your Data Lake Using External Tables

External tables enable querying data stored in external cloud storage for analysis without first loading it into Snowflake.

Creating External Tables

Use CREATE EXTERNAL TABLE to create external tables that reference the external stages you created in Creating External Stages (in this topic).

Access Control Requirements

A role used to execute this SQL command must have the following privileges at a minimum:

Privilege               Object
CREATE EXTERNAL TABLE   Schema
USAGE                   Stage

Note that operating on any object in a schema also requires the USAGE privilege on the parent database and schema.
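A sketch of the corresponding grants, with illustrative role, database, and schema names:

GRANT CREATE EXTERNAL TABLE ON SCHEMA mydb.public TO ROLE data_lake_role;
GRANT USAGE ON STAGE mydb.public.my_s3compat_stage TO ROLE data_lake_role;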

Usage Notes

  • The metadata for these external tables cannot be refreshed automatically. The AUTO_REFRESH = TRUE parameter setting is not supported.

    You must refresh the metadata for your external tables manually. For more information, see Manually Refreshing the External Table Metadata (in this topic).

Examples

Create an external table named et that references subpath path1 in the my_s3compat_stage stage you created. The format of the files in this subpath is Apache Parquet:

CREATE EXTERNAL TABLE et
 LOCATION=@my_s3compat_stage/path1/
 AUTO_REFRESH = FALSE
 REFRESH_ON_CREATE = TRUE
 FILE_FORMAT = (TYPE = PARQUET);

Manually Refreshing the External Table Metadata

Periodically execute an ALTER EXTERNAL TABLE … REFRESH statement to register any added or removed files.

Only the external table owner (i.e. the role with the OWNERSHIP privilege on the external table) or higher can execute this command. Note that operating on any object in a schema also requires the USAGE privilege on the parent database and schema.
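For example, to register any added or removed files for the external table created earlier:

ALTER EXTERNAL TABLE et REFRESH;

You can optionally pass a relative path to limit the refresh to a subset of files; see ALTER EXTERNAL TABLE for details.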

Querying External Tables

After the external table metadata has been refreshed, users can query the external table.

A role used to query an external table must have the USAGE privilege on the external table. Note that operating on any object in a schema also requires the USAGE privilege on the parent database and schema.

For example, query columns col1 and col2 in the external table you created:

SELECT col1, col2 FROM et;

Query performance varies depending on network and application or device performance.

Vendor Support for S3-compatible Storage

Devices or applications that have a compliant S3 API can be used with Snowflake. Ensuring compliance is the responsibility of the storage service provider.

The following vendors have indicated to Snowflake that they have tested at least some of their products and found them to work with Snowflake:

  • Cloudflare

  • Cloudian

  • Dell

  • Hitachi Content Platform

  • MinIO

  • NetApp (StorageGRID)

  • PureStorage

  • Scality

This list is provided for convenience only. Snowflake does not test external products to validate compatibility and cannot fix issues in products sold by third-party vendors. If you have questions about whether or how your hardware or software with an S3 API works with Snowflake, we recommend that you contact the vendor directly.

Testing Your S3-compatible API

If you are a developer of hardware or software that has an S3 API, you can use our public test suite (in GitHub) to test your S3 API. This set of tests looks for obvious mismatches between your implementation and what Snowflake expects from S3. Note that there may be cases where the tests do not identify incompatibility.

If you are a customer and you want to test your own devices, we recommend you contact your vendor to run these tests; however, you can also run these public tests on your devices to assess compatibility.