Working With Amazon S3-compatible Storage¶
This topic provides information to help you access Amazon S3-compatible storage from Snowflake.
A storage application or device is Amazon S3-compatible if it provides an application programming interface (API) that is compliant with the industry-standard Amazon Simple Storage Service (S3) REST API. The Amazon S3 REST API enables CRUD operations and administrative actions on storage buckets and objects.
With Snowflake, you can use an external stage to connect to a growing number of S3-compatible storage solutions, including on-premises storage and devices that exist outside of the public cloud. The external stage stores an S3-compliant API endpoint, bucket name and path, and credentials. To allow users to load and unload data from and to your storage locations, you grant privileges on the stage to roles.
You can use Snowflake support for Amazon S3-compatible storage to perform tasks such as:
Querying data from an external stage without loading the data into Snowflake. For more information, see Extending Your Data Lake Using External Tables.
Reading and processing unstructured data. To learn more, see Processing Unstructured Data.
Cloud Platform Support¶
This feature is available to Snowflake accounts hosted on the following supported cloud platforms:
Amazon Web Services
Requirements for S3-compatible Storage¶
An S3-compatible API endpoint for Snowflake must meet the following requirements:
Highly compliant with the S3 API and able to pass our public test suite (in GitHub). If the endpoint does not behave like S3, it cannot work with Snowflake.
Supported by your third-party storage provider as a Snowflake S3-compatible tested and compliant service. For a list of vendors that have tested at least some of their products and found them to work with Snowflake, see Vendor Support For S3-compatible Storage.
Accessible from the public cloud where your Snowflake account is hosted.
Highly available and performant to serve analytics needs.
Configured to use virtual-hosted-style requests. For more information, see Virtual hosting of buckets in the Amazon S3 documentation.
Configured to use HTTPS communication with a valid TLS certificate.
Configured to use direct credentials.
Does not contain a port number. For example, the endpoint mystorage.com:3000 is not supported.
Amazon S3-compatible endpoints are not automatically enabled for all accounts. To request, contact the Snowflake account team or Snowflake Support. Make sure that you have verified the endpoints by using our public test suite (in GitHub) before sending the request.
Provide the following information with your request:
Your Snowflake account name and cloud region deployment.
The endpoint URL (i.e. my-s3-endpoint.company.com).
The software or hardware vendor that provides the endpoint.
Creating an External Stage for S3-compatible Storage¶
To create an external stage for S3-compatible storage, create a named external stage using the CREATE STAGE command. A named stage object can be used to list files, load data, and unload files, among other SQL actions.
Optionally, add a directory table the the external stage. You can query a directory table to retrieve file URLs to access files in the referenced storage, as well as other metadata.
When you add a directory table, you must set the AUTO_REFRESH parameter to FALSE. The metadata for S3-compatible external stages cannot be refreshed automatically.
The following example creates an external stage named
my_s3_compat_stage that points to the bucket and path named
my_bucket/files/ at the endpoint
mystorage.com. The AWS_KEY_ID and AWS_SECRET_KEY values used in this example are for illustration purposes only.
CREATE STAGE my_s3compat_stage URL = 's3compat://my_bucket/files/' ENDPOINT = 'mystorage.com' CREDENTIALS = (AWS_KEY_ID = '1a2b3c...' AWS_SECRET_KEY = '4x5y6z...')
Loading and Unloading Data¶
You can load and unload data using an external stage configured for S3-compatible storage. The following features work with S3-compatible storage:
Bulk data loading using the COPY INTO <table> command.
For example, load data into table
t1from all files located in the
loadsubpath in the bucket and path defined in a stage named
COPY INTO t1 FROM @my_s3compat_stage/load/;
For sample programs, see Option 1: Loading Data Using the Snowpipe REST API.
Data unloading using the COPY INTO <location> command.
For example, unload data from table
t2into files in the
unloadsubpath in the bucket and path defined in a stage named
COPY INTO @my_s3compat_stage/unload/ FROM t2;
Extending Your Data Lake Using External Tables¶
You can use external tables with S3-compatible storage to query data without first loading it into Snowflake. This section briefly covers how to create and query an external table that references a location on an external stage configured for S3-compatible storage.
Start by creating an external table using CREATE EXTERNAL TABLE that reference an S3-compatible external stage.
The metadata for these external tables cannot be refreshed automatically. The
AUTO_REFRESH = TRUE parameter setting is not supported. You must manually refresh the metadata by executing an ALTER EXTERNAL TABLE … REFRESH command to register any added or removed files.
The following example creates an external table named
et that references subpath
path1 in a stage named
my_s3compat_stage. The files in the
path1 subpath are in the Apache Parquet format.
CREATE EXTERNAL TABLE et LOCATION=@my_s3compat_stage/path1/ AUTO_REFRESH = FALSE REFRESH_ON_CREATE = TRUE FILE_FORMAT = (TYPE = PARQUET);
After you create an external table for S3-compatible storage, you can query it. For example, query the
value column in the external table created previously:
SELECT value FROM et;
Query performance varies depending on network and application or device performance. If performance is critical, we recommend creating a materialized view on the external table.
Vendor Support For S3-compatible Storage¶
You can use devices or applications that have an S3-compliant API with Snowflake. However, your storage service provider is responsible for ensuring compliance.
The following vendors have indicated to Snowflake that they have tested at least some of their products and found them to work with Snowflake:
Hitachi Content Platform
This list is provided for convenience only. Snowflake does not test external products to validate compatibility and cannot fix issues in products sold by third-party vendors. If you have questions about whether or how your hardware or software with an S3 API works with Snowflake, we recommend that you contact the vendor directly.
Testing Your S3-compatible API¶
If you are a hardware or software developer who has created an S3-compatible API, you can use our public test suite (in GitHub) to test whether your S3 API works with Snowflake. The test suite looks for obvious mismatches between your implementation and what Snowflake expects from S3. However, there may be cases where the tests do not identify incompatibility.
If you are a customer and you want to test your own devices, we recommend you contact your vendor to run these tests. You can also run these public tests on your devices to assess compatibility.