Working With Amazon S3-compatible Storage¶
The industry-standard Amazon Simple Storage Service (S3) REST API enables programmatic access to storage buckets and objects to perform CRUD operations and various administrative actions. A growing number of storage applications and devices have their own S3-compatible APIs. Snowflake features that access and interact with S3 storage can also work with S3-compatible storage, with some limitations.
To access storage outside of the public cloud (that is, on-premises or at the edge), you can create external stages in Snowflake that store the S3-compatible API endpoint, the bucket name and path, and credentials. Grant privileges on these stages to roles so that users can load data from and unload data to the storage locations.
Create external tables on the external stages to extend your data lake to include this storage.
Cloud Platform Support¶
This feature is available to Snowflake accounts hosted on any supported cloud platform:
Amazon Web Services
Google Cloud
Microsoft Azure
Requirements for S3-compatible Storage¶
An S3-compatible API endpoint must be:
Highly compliant with the S3 API. If the endpoint does not behave like S3, it cannot work with Snowflake.
Tested by the third-party storage provider and supported as a compliant, S3-compatible service that works with Snowflake.
Accessible from the public cloud where your Snowflake account is hosted.
Highly available and performant to serve analytics needs.
Configured to use direct credentials.
Configured to use HTTPS communication with a valid TLS certificate.
Creating External Stages¶
Create named external stages using CREATE STAGE. A named stage object can be used to list files, load data, and unload files, among other SQL actions.
You can add a directory table on any external stage you create. Query a directory table to retrieve file URLs to access files in the referenced storage, as well as other metadata.
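For example, assuming a directory table has been enabled on the stage (the statements below reuse the my_s3compat_stage name from the example later in this topic, and the selected columns are a subset of the directory table metadata):
-- Enable the directory table on an existing stage, then refresh its metadata
ALTER STAGE my_s3compat_stage SET DIRECTORY = ( ENABLE = TRUE );
ALTER STAGE my_s3compat_stage REFRESH;
-- Retrieve file URLs and other metadata for files in the referenced storage
SELECT RELATIVE_PATH, SIZE, LAST_MODIFIED, FILE_URL
  FROM DIRECTORY( @my_s3compat_stage );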
Syntax¶
CREATE [ OR REPLACE ] STAGE <external_stage_name>
URL = 's3compat://{bucket}[/{path}/]'
ENDPOINT = '<s3_api_compatible_endpoint>'
CREDENTIALS = ( AWS_KEY_ID = '<string>' AWS_SECRET_KEY = '<string>' )
..
Parameters¶
This section describes the parameters that are specific to external stages that point to storage locations via S3-compatible endpoints. For descriptions of common parameters, see CREATE STAGE.
URL = 's3compat://bucket[/path/]'
URL for the external location (existing bucket accessed using an S3-compatible API endpoint) used to store data files, where:
bucket is the name of the bucket.
path is an optional case-sensitive path (or prefix in S3 terminology) for files in the cloud storage location (i.e. files with names that begin with a common string).
ENDPOINT = 's3_api_compatible_endpoint'
Fully-qualified domain that points to the S3-compatible API endpoint.
Access Control Requirements¶
A role used to execute this SQL command must have the following privileges at a minimum:
| Privilege | Object | Notes |
|---|---|---|
| CREATE STAGE | Schema | Required if creating a new stage. |
| USAGE | File format | Required only if referencing a named file format in the stage definition. |
Note that operating on any object in a schema also requires the USAGE privilege on the parent database and schema.
For instructions on creating a custom role with a specified set of privileges, see Creating Custom Roles.
For general information about roles and privilege grants for performing SQL actions on securable objects, see Overview of Access Control.
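As a sketch, the following grants give a custom role the minimum privileges listed above; the database, schema, role, and file format names are illustrative placeholders:
-- mydb, myschema, my_role, and my_csv_format are placeholder names
GRANT USAGE ON DATABASE mydb TO ROLE my_role;
GRANT USAGE ON SCHEMA mydb.myschema TO ROLE my_role;
GRANT CREATE STAGE ON SCHEMA mydb.myschema TO ROLE my_role;
-- Needed only if the stage definition references a named file format
GRANT USAGE ON FILE FORMAT mydb.myschema.my_csv_format TO ROLE my_role;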
Usage Notes¶
Accessing S3-compatible storage requires direct credentials.
Examples¶
Create an external stage named my_s3compat_stage that points to the bucket and path my_bucket/files/ at the endpoint mystorage.com:
CREATE STAGE my_s3compat_stage
URL = 's3compat://my_bucket/files/'
ENDPOINT = 'mystorage.com'
CREDENTIALS = (AWS_KEY_ID = '1a2b3c...' AWS_SECRET_KEY = '4x5y6z...')
Note that the AWS_KEY_ID and AWS_SECRET_KEY values used in this example are for illustration purposes only.
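To allow other roles to use the stage, grant the USAGE privilege on it; you can then verify access by listing the files the stage can reach. The role name below is an illustrative placeholder:
-- my_role is a placeholder role name
GRANT USAGE ON STAGE my_s3compat_stage TO ROLE my_role;
-- List the files accessible through the stage
LIST @my_s3compat_stage;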
Loading and Unloading Data¶
Load and unload data using the external stages you created in Creating External Stages (in this topic). The following features work with your external stages:
Bulk data loading using the COPY INTO <table> command.
For example, load data into table t1 from all files located in the load subpath in the bucket and path defined in the my_s3compat_stage stage you created:
COPY INTO t1 FROM @my_s3compat_stage/load/;
Calling Snowpipe REST endpoints to continuously load data.
For sample programs, see Option 1: Loading Data Using the Snowpipe REST API. A sketch of a pipe definition that these endpoints operate on appears after this list.
Data unloading using the COPY INTO <location> command.
For example, unload data from table t2 into files in the unload subpath in the bucket and path defined in the my_s3compat_stage stage you created:
COPY INTO @my_s3compat_stage/unload/ FROM t2;
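As referenced in the Snowpipe item above, the REST endpoints operate on a pipe object that wraps a COPY INTO <table> statement over the stage. A minimal sketch, with illustrative pipe and table names:
-- my_pipe and t1 are placeholder names; files under the load/ subpath can be
-- submitted to this pipe through the Snowpipe REST insertFiles endpoint
CREATE PIPE my_pipe AS
  COPY INTO t1 FROM @my_s3compat_stage/load/;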
Access Control Requirements¶
A role used to load or unload data must have the following privileges at a minimum:
| Privilege | Object | Notes |
|---|---|---|
| USAGE | Stage | |
| USAGE | File format | Required only if referencing a named file format in the COPY INTO <table> or COPY INTO <location> statement. |
| INSERT | Table | Source or target table for data loading or unloading. |
Note that operating on any object in a schema also requires the USAGE privilege on the parent database and schema.
Extending Your Data Lake Using External Tables¶
External tables enable querying data stored in external cloud storage for analysis without first loading it into Snowflake.
Creating External Tables¶
Create external tables using CREATE EXTERNAL TABLE that reference the external stages you created in Creating External Stages (in this topic).
Access Control Requirements¶
A role used to execute this SQL command must have the following privileges at a minimum:
| Privilege | Object | Notes |
|---|---|---|
| CREATE EXTERNAL TABLE | Schema | |
| USAGE | Stage | |
Note that operating on any object in a schema also requires the USAGE privilege on the parent database and schema.
Usage Notes¶
The metadata for these external tables cannot be refreshed automatically. The AUTO_REFRESH = TRUE parameter setting is not supported. You must refresh the metadata for your external tables manually. For more information, see Manually Refreshing the External Table Metadata (in this topic).
Examples¶
Create an external table named et that references subpath path1 in the my_s3compat_stage stage you created. The format of the files in this subpath is Apache Parquet:
CREATE EXTERNAL TABLE et
LOCATION=@my_s3compat_stage/path1/
AUTO_REFRESH = FALSE
REFRESH_ON_CREATE = TRUE
FILE_FORMAT = (TYPE = PARQUET);
Manually Refreshing the External Table Metadata¶
Periodically execute an ALTER EXTERNAL TABLE … REFRESH statement to register any added or removed files.
Only the external table owner (i.e. the role with the OWNERSHIP privilege on the external table) or higher can execute this command. Note that operating on any object in a schema also requires the USAGE privilege on the parent database and schema.
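For example, refresh the metadata for the et table created above:
-- Register any files added or removed since the last refresh
ALTER EXTERNAL TABLE et REFRESH;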
Querying External Tables¶
After the external table metadata has been refreshed, users can query the external table.
A role used to query an external table must have the USAGE privilege on the external table. Note that operating on any object in a schema also requires the USAGE privilege on the parent database and schema.
For example, query columns col1 and col2 in the external table you created:
SELECT col1, col2 FROM et;
Query performance varies depending on network and application or device performance.
Vendor Support For S3-compatible Storage¶
Devices or applications that have a compliant S3 API can be used with Snowflake. Ensuring compliance is the responsibility of the storage service provider.
The following vendors have indicated to Snowflake that they have tested at least some of their products and found them to work with Snowflake:
Cloudflare
Cloudian
Dell
Hitachi Content Platform
MinIO
NetApp (StorageGRID)
PureStorage
Scality
This list is provided for convenience only. Snowflake does not test external products to validate compatibility and cannot fix issues in products sold by third-party vendors. If you have questions about whether or how your hardware or software with an S3 API works with Snowflake, we recommend that you contact the vendor directly.
Testing Your S3-compatible API¶
If you are a developer of hardware or software that has an S3 API, you can use our public test suite (on GitHub) to test your S3 API. This set of tests looks for obvious mismatches between your implementation and what Snowflake expects from S3. Note that there may be cases where the tests do not identify an incompatibility.
If you are a customer and you want to test your own devices, we recommend you contact your vendor to run these tests; however, you can also run these public tests on your devices to assess compatibility.