Option 1: Configuring a Snowflake Storage Integration to Access Amazon S3¶
This topic describes how to use storage integrations to allow Snowflake to read data from and write data to an Amazon S3 bucket referenced in an external (i.e. S3) stage. Integrations are named, first-class Snowflake objects that avoid the need for passing explicit cloud provider credentials such as secret keys or access tokens. Integration objects store an AWS identity and access management (IAM) user ID. An administrator in your organization grants the integration IAM user permissions in the AWS account.
An integration can also list buckets (and optional paths) that limit the locations users can specify when creating external stages that use the integration.
Note
Completing the instructions in this section requires permissions in AWS to create and manage IAM policies and roles. If you are not an AWS administrator, ask your AWS administrator to perform these tasks.
Note that currently, accessing S3 storage in government regions using a storage integration is limited to Snowflake accounts hosted on AWS in the same government region. Accessing your S3 storage from an account hosted outside of the government region using direct credentials is supported.
The following diagram shows the integration flow for an S3 stage:
An external (i.e. S3) stage references a storage integration object in its definition.
Snowflake automatically associates the storage integration with an S3 IAM user created for your account. Snowflake creates a single IAM user that is referenced by all S3 storage integrations in your Snowflake account.
An AWS administrator in your organization grants permissions to the IAM user to access the bucket referenced in the stage definition. Note that many external stage objects can reference different buckets and paths and use the same storage integration for authentication.
When a user loads or unloads data from or to a stage, Snowflake verifies the permissions granted to the IAM user on the bucket before allowing or denying access.
Configuring Secure Access to Cloud Storage¶
This section describes how to configure a Snowflake storage integration object to delegate authentication responsibility for cloud storage to a Snowflake identity and access management (IAM) entity.
Step 1: Configure Access Permissions for the S3 Bucket¶
AWS Access Control Requirements¶
Snowflake requires the following permissions on an S3 bucket and folder to be able to access files in the folder (and sub-folders):
s3:GetBucketLocation
s3:GetObject
s3:GetObjectVersion
s3:ListBucket
Note
The following additional permissions are required to perform additional SQL actions:
| Permission | SQL Action |
|---|---|
| s3:PutObject | Unload files to the bucket. |
| s3:DeleteObject | Either automatically purge files from the stage after a successful load or execute REMOVE statements to manually remove files. |
As a best practice, Snowflake recommends creating an IAM policy for Snowflake access to the S3 bucket. You can then attach the policy to the role and use the security credentials generated by AWS for the role to access files in the bucket.
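As a rough illustration of the requirements above, the following Python sketch collects the actions that a policy document allows and reports which of the required read permissions are missing. The names `allowed_actions` and `missing_actions` are hypothetical helpers, not part of any AWS or Snowflake tool. The check only compares literal action names: it ignores wildcards, Deny statements, resources, and conditions, so treat it as a quick sanity check rather than a full policy evaluation.

```python
import json

# Permissions the text above lists as required for reading files.
REQUIRED_READ_ACTIONS = {
    "s3:GetBucketLocation",
    "s3:GetObject",
    "s3:GetObjectVersion",
    "s3:ListBucket",
}


def allowed_actions(policy: dict) -> set:
    """Collect literal action names from all Allow statements (hypothetical helper)."""
    actions = set()
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        action = statement.get("Action", [])
        # "Action" may be a single string or a list of strings.
        actions.update([action] if isinstance(action, str) else action)
    return actions


def missing_actions(policy: dict, required=REQUIRED_READ_ACTIONS) -> set:
    """Return the required actions that the policy does not name literally."""
    return set(required) - allowed_actions(policy)


if __name__ == "__main__":
    # Example policy that forgets s3:GetBucketLocation.
    example = json.loads("""
    {
      "Version": "2012-10-17",
      "Statement": [
        {"Effect": "Allow",
         "Action": ["s3:GetObject", "s3:GetObjectVersion"],
         "Resource": "arn:aws:s3:::<bucket>/<prefix>/*"},
        {"Effect": "Allow",
         "Action": "s3:ListBucket",
         "Resource": "arn:aws:s3:::<bucket>"}
      ]
    }
    """)
    print(sorted(missing_actions(example)))
```

To also cover unloading or purging, pass a larger `required` set that adds the permissions from the table above.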
Creating an IAM Policy¶
The following step-by-step instructions describe how to configure access permissions for Snowflake in your AWS Management Console so that you can use an S3 bucket to load and unload data:
Log into the AWS Management Console.
From the home dashboard, choose Identity & Access Management (IAM):
Choose Account settings from the left-hand navigation pane.
Expand the Security Token Service Regions list, find the AWS region corresponding to the region where your account is located, and choose Activate if the status is Inactive.
Choose Policies from the left-hand navigation pane.
Click Create Policy:
Click the JSON tab.
Add a policy document that will allow Snowflake to access the S3 bucket and folder.
The following policy (in JSON format) provides Snowflake with the required permissions to load or unload data using a single bucket and folder path. You can also purge data files using the PURGE copy option.
Copy and paste the text into the policy editor:
Note
Make sure to replace bucket and prefix with your actual bucket name and folder path prefix.
The Amazon Resource Names (ARN) for buckets in government regions have an arn:aws-us-gov:s3::: prefix.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion"
      ],
      "Resource": "arn:aws:s3:::<bucket>/<prefix>/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::<bucket>",
      "Condition": {
        "StringLike": {
          "s3:prefix": [
            "<prefix>/*"
          ]
        }
      }
    }
  ]
}
Note
Setting the "s3:prefix" condition to ["*"] grants access to all prefixes in the specified bucket. Setting it to ["<prefix>/*"] grants access only to the specified path in the bucket.
Note that AWS policies support a variety of different security use cases.
The following policy provides Snowflake with the required permissions to load data from a single read-only bucket and folder path. The policy includes the s3:GetBucketLocation, s3:GetObject, s3:GetObjectVersion, and s3:ListBucket permissions:
Alternative policy: Load from a read-only S3 bucket
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion"
      ],
      "Resource": "arn:aws:s3:::<bucket>/<prefix>/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::<bucket>",
      "Condition": {
        "StringLike": {
          "s3:prefix": [
            "<prefix>/*"
          ]
        }
      }
    }
  ]
}
Click Review policy.
Enter the policy name (e.g. snowflake_access) and an optional description. Click Create policy.
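If you maintain several buckets or prefixes, it may be convenient to generate the policy documents shown in this step rather than edit them by hand. The following Python sketch builds either the read/write policy or the read-only alternative. The function name `build_policy` and its parameters are hypothetical; the `partition` argument is an assumption used to switch between the `arn:aws` and `arn:aws-us-gov` ARN prefixes mentioned above.

```python
import json


def build_policy(bucket: str, prefix: str, read_only: bool = False,
                 partition: str = "aws") -> dict:
    """Build an IAM policy document for one bucket and folder path.

    Hypothetical helper. partition is "aws" for commercial regions or
    "aws-us-gov" for government regions.
    """
    prefix = prefix.strip("/")  # tolerate "mypath/" as well as "mypath"
    object_actions = ["s3:GetObject", "s3:GetObjectVersion"]
    if not read_only:
        # Unloading needs PutObject; purging or REMOVE needs the delete actions.
        object_actions = ["s3:PutObject", *object_actions,
                          "s3:DeleteObject", "s3:DeleteObjectVersion"]
    bucket_arn = f"arn:{partition}:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": f"{bucket_arn}/{prefix}/*",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


if __name__ == "__main__":
    # Print the read-only variant for a placeholder bucket and path.
    print(json.dumps(build_policy("mybucket", "mypath", read_only=True), indent=2))
```

The printed JSON can be pasted into the policy editor's JSON tab in place of the hand-edited text.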
Step 2: Create the IAM Role in AWS¶
In the AWS Management Console, create an AWS IAM role to grant privileges on the S3 bucket containing your data files.
Log into the AWS Management Console.
From the home dashboard, choose Identity & Access Management (IAM):
Choose Roles from the left-hand navigation pane.
Click the Create role button.
Select Another AWS account as the trusted entity type.
In the Account ID field, enter your own AWS account ID temporarily. Later, you will modify the trusted relationship and grant access to Snowflake.
Select the Require external ID option. Enter a dummy ID such as 0000. Later, you will modify the trusted relationship and specify the external ID for your Snowflake stage. An external ID is required to grant access to your AWS resources (i.e. S3) to a third party (i.e. Snowflake).
Click the Next button.
Locate the policy you created in Step 1: Configure Access Permissions for the S3 Bucket (in this topic), and select this policy.
Click the Next button.
Enter a name and description for the role, and click the Create role button.
You have now created an IAM policy for a bucket, created an IAM role, and attached the policy to the role.
Record the Role ARN value located on the role summary page. In the next step, you will create a Snowflake integration that references this role.
Note
Snowflake caches the temporary credentials for a period that cannot exceed the 60 minute expiration time. If you revoke access from Snowflake, users might be able to list files and access data from the cloud storage location until the cache expires.
Step 3: Create a Cloud Storage Integration in Snowflake¶
Create a storage integration using the CREATE STORAGE INTEGRATION command. A storage integration is a Snowflake object that stores a generated identity and access management (IAM) user for your S3 cloud storage, along with an optional set of allowed or blocked storage locations (i.e. buckets). Cloud provider administrators in your organization grant permissions on the storage locations to the generated user. This option allows users to avoid supplying credentials when creating stages or loading data.
A single storage integration can support multiple external (i.e. S3) stages. The URL in the stage definition must align with the S3 buckets (and optional paths) specified for the STORAGE_ALLOWED_LOCATIONS parameter.
Note
Only account administrators (users with the ACCOUNTADMIN role) or a role with the global CREATE INTEGRATION privilege can execute this SQL command.
CREATE STORAGE INTEGRATION <integration_name>
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'S3'
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = '<iam_role>'
STORAGE_ALLOWED_LOCATIONS = ('s3://<bucket>/<path>/', 's3://<bucket>/<path>/')
[ STORAGE_BLOCKED_LOCATIONS = ('s3://<bucket>/<path>/', 's3://<bucket>/<path>/') ]
Where:
integration_name is the name of the new integration.
iam_role is the Amazon Resource Name (ARN) of the role you created in Step 2: Create the IAM Role in AWS (in this topic).
bucket is the name of an S3 bucket that stores your data files (e.g. mybucket). The required STORAGE_ALLOWED_LOCATIONS parameter and optional STORAGE_BLOCKED_LOCATIONS parameter restrict or block access to these buckets, respectively, when stages that reference this integration are created or modified.
path is an optional path that can be used to provide granular control over objects in the bucket.
The following example creates an integration that explicitly limits external stages that use the integration to reference either of two buckets and paths. In a later step, we will create an external stage that references one of these buckets and paths.
Additional external stages that also use this integration can reference the allowed buckets and paths:
CREATE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::001234567890:role/myrole'
  STORAGE_ALLOWED_LOCATIONS = ('s3://mybucket1/mypath1/', 's3://mybucket2/mypath2/')
  STORAGE_BLOCKED_LOCATIONS = ('s3://mybucket1/mypath1/sensitivedata/', 's3://mybucket2/mypath2/sensitivedata/');
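To reason about which stage URLs an integration like this one would accept, the following Python sketch models the allowed and blocked lists as plain string prefixes. This is an assumption made for illustration: the function `is_location_permitted` is hypothetical, and Snowflake's actual matching rules may differ in details such as case sensitivity or trailing-slash handling.

```python
def is_location_permitted(url: str, allowed, blocked=()) -> bool:
    """Approximate check: url is under an allowed location and under no blocked one.

    Hypothetical helper that assumes simple string-prefix matching.
    """
    if any(url.startswith(location) for location in blocked):
        return False
    return any(url.startswith(location) for location in allowed)


# Locations taken from the example integration above.
ALLOWED = ("s3://mybucket1/mypath1/", "s3://mybucket2/mypath2/")
BLOCKED = ("s3://mybucket1/mypath1/sensitivedata/",
           "s3://mybucket2/mypath2/sensitivedata/")

if __name__ == "__main__":
    for candidate in ("s3://mybucket1/mypath1/",
                      "s3://mybucket1/mypath1/sensitivedata/2024/",
                      "s3://otherbucket/data/"):
        print(candidate, is_location_permitted(candidate, ALLOWED, BLOCKED))
```

Under this model the blocked list takes precedence, so a path below `sensitivedata/` is rejected even though it also falls under an allowed location.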
Step 4: Retrieve the AWS IAM User for your Snowflake Account¶
Execute the DESCRIBE INTEGRATION command to retrieve the ARN for the AWS IAM user that was created automatically for your Snowflake account:
DESC INTEGRATION <integration_name>;
Where:
integration_name is the name of the integration you created in Step 3: Create a Cloud Storage Integration in Snowflake (in this topic).
For example:
DESC INTEGRATION s3_int;
+---------------------------+---------------+--------------------------------------------------------------------------------+------------------+
| property                  | property_type | property_value                                                                 | property_default |
+---------------------------+---------------+--------------------------------------------------------------------------------+------------------|
| ENABLED                   | Boolean       | true                                                                           | false            |
| STORAGE_ALLOWED_LOCATIONS | List          | s3://mybucket1/mypath1/,s3://mybucket2/mypath2/                                | []               |
| STORAGE_BLOCKED_LOCATIONS | List          | s3://mybucket1/mypath1/sensitivedata/,s3://mybucket2/mypath2/sensitivedata/    | []               |
| STORAGE_AWS_IAM_USER_ARN  | String        | arn:aws:iam::123456789001:user/abc1-b-self1234                                 |                  |
| STORAGE_AWS_ROLE_ARN      | String        | arn:aws:iam::001234567890:role/myrole                                          |                  |
| STORAGE_AWS_EXTERNAL_ID   | String        | MYACCOUNT_SFCRole=2_a123456/s0aBCDEfGHIJklmNoPq=                               |                  |
+---------------------------+---------------+--------------------------------------------------------------------------------+------------------+
Record the following values:
| Value | Description |
|---|---|
| STORAGE_AWS_IAM_USER_ARN | The AWS IAM user created for your Snowflake account, arn:aws:iam::123456789001:user/abc1-b-self1234 in this example. We provision a single IAM user for your entire Snowflake account. All S3 storage integrations use that IAM user. |
| STORAGE_AWS_EXTERNAL_ID | The external ID that is needed to establish a trust relationship. |
You will provide these values in the next section.
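If you run the DESC INTEGRATION command from a script, the two values can be picked out of the result rows programmatically. The Python sketch below assumes each row arrives as a tuple in the column order shown in the example output (property, property_type, property_value, property_default). The function names are hypothetical, and the sample rows are copied from the example rather than fetched from a live connection.

```python
def integration_properties(rows) -> dict:
    """Map each property name to its value (hypothetical helper).

    Assumes rows are (property, property_type, property_value, property_default).
    """
    return {row[0]: row[2] for row in rows}


def trust_inputs(rows) -> tuple:
    """Return (IAM user ARN, external ID) needed for the trust relationship."""
    props = integration_properties(rows)
    return props["STORAGE_AWS_IAM_USER_ARN"], props["STORAGE_AWS_EXTERNAL_ID"]


# Sample rows copied from the example output above.
SAMPLE_ROWS = [
    ("ENABLED", "Boolean", "true", "false"),
    ("STORAGE_ALLOWED_LOCATIONS", "List",
     "s3://mybucket1/mypath1/,s3://mybucket2/mypath2/", "[]"),
    ("STORAGE_AWS_IAM_USER_ARN", "String",
     "arn:aws:iam::123456789001:user/abc1-b-self1234", ""),
    ("STORAGE_AWS_ROLE_ARN", "String",
     "arn:aws:iam::001234567890:role/myrole", ""),
    ("STORAGE_AWS_EXTERNAL_ID", "String",
     "MYACCOUNT_SFCRole=2_a123456/s0aBCDEfGHIJklmNoPq=", ""),
]

if __name__ == "__main__":
    user_arn, external_id = trust_inputs(SAMPLE_ROWS)
    print(user_arn)
    print(external_id)
```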
Step 5: Grant the IAM User Permissions to Access Bucket Objects¶
The following step-by-step instructions describe how to configure IAM access permissions for Snowflake in your AWS Management Console so that you can use an S3 bucket to load and unload data:
Log into the AWS Management Console.
From the home dashboard, choose Identity & Access Management (IAM):
Choose Roles from the left-hand navigation pane.
Click on the role you created in Step 2: Create the IAM Role in AWS (in this topic).
Click on the Trust relationships tab.
Click the Edit trust relationship button.
Modify the policy document with the DESC STORAGE INTEGRATION output values you recorded in Step 4: Retrieve the AWS IAM User for your Snowflake Account (in this topic):
Policy document for IAM role
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "<snowflake_user_arn>"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<snowflake_external_id>"
        }
      }
    }
  ]
}
Where:
snowflake_user_arn is the STORAGE_AWS_IAM_USER_ARN value you recorded.
snowflake_external_id is the STORAGE_AWS_EXTERNAL_ID value you recorded.
In this example, the snowflake_external_id value is MYACCOUNT_SFCRole=2_a123456/s0aBCDEfGHIJklmNoPq=.
Note
For security reasons, if you create a new storage integration (or recreate an existing storage integration using the CREATE OR REPLACE STORAGE INTEGRATION syntax), the resulting integration has a different external ID and so it cannot resolve the trust relationship unless the trust policy is modified.
Click the Update Trust Policy button. The changes are saved.
Note
Snowflake caches the temporary credentials for a period that cannot exceed the 60 minute expiration time. If you revoke access from Snowflake, users might be able to list files and load data from the cloud storage location until the cache expires.
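Because the external ID changes whenever the integration is recreated, it can help to regenerate the trust policy from the recorded values instead of editing it manually each time. The following Python sketch fills in the trust policy document from this step. The function name `build_trust_policy` is hypothetical, and the values in the usage example are the sample values from Step 4, not real credentials.

```python
import json


def build_trust_policy(snowflake_user_arn: str, snowflake_external_id: str) -> dict:
    """Fill in the role trust policy from the DESC INTEGRATION values (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                # STORAGE_AWS_IAM_USER_ARN: the IAM user allowed to assume the role.
                "Principal": {"AWS": snowflake_user_arn},
                "Action": "sts:AssumeRole",
                # STORAGE_AWS_EXTERNAL_ID: must match exactly for the role to be assumed.
                "Condition": {
                    "StringEquals": {"sts:ExternalId": snowflake_external_id}
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = build_trust_policy(
        "arn:aws:iam::123456789001:user/abc1-b-self1234",
        "MYACCOUNT_SFCRole=2_a123456/s0aBCDEfGHIJklmNoPq=",
    )
    print(json.dumps(policy, indent=2))
```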
Step 6: Create an External Stage¶
Create an external (i.e. S3) stage that references the storage integration you created in Step 3: Create a Cloud Storage Integration in Snowflake (in this topic).
Note
Creating a stage that uses a storage integration requires a role that has the CREATE STAGE privilege for the schema as well as the USAGE privilege on the storage integration. For example:
GRANT CREATE STAGE ON SCHEMA public TO ROLE myrole;
GRANT USAGE ON INTEGRATION s3_int TO ROLE myrole;
Create the stage using the CREATE STAGE command.
For example, set mydb.public as the current database and schema for the user session, and then create a stage named my_s3_stage. In this example, the stage references the S3 bucket and path mybucket1/mypath1, which are supported by the integration. The stage also references a named file format object called my_csv_format:
USE SCHEMA mydb.public;
CREATE STAGE my_s3_stage
  STORAGE_INTEGRATION = s3_int
  URL = 's3://mybucket1/mypath1/'
  FILE_FORMAT = my_csv_format;
Note
The stage owner (i.e. the role with the OWNERSHIP privilege on the stage) must have the USAGE privilege on the storage integration.
Append a forward slash (/) to the URL value to filter to the specified folder path. If the forward slash is omitted, all files and folders starting with the prefix for the specified path are included.
Note that the forward slash is required to access and retrieve unstructured data files in the stage.
To load or unload data from or to a stage that uses an integration, a role must have the USAGE privilege on the stage. It is not necessary to also have the USAGE privilege on the storage integration.
The STORAGE_INTEGRATION parameter is handled separately from other stage parameters, such as FILE_FORMAT. Support for these other parameters is the same regardless of the integration used to access your S3 bucket.
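The effect of the trailing forward slash described in the note above can be illustrated with plain prefix matching on object keys. The Python sketch below is only a model of that behavior: the key names and the `keys_under` helper are made up for illustration, and no S3 or Snowflake call is involved.

```python
def keys_under(prefix: str, keys) -> list:
    """Return the keys that start with the given prefix (hypothetical helper)."""
    return [key for key in keys if key.startswith(prefix)]


# Made-up object keys in a single bucket.
KEYS = [
    "mypath1/a.csv",
    "mypath1/sub/b.csv",
    "mypath10/c.csv",
    "mypath1_old/d.csv",
]

if __name__ == "__main__":
    # With the trailing slash, only objects inside the folder match.
    print(keys_under("mypath1/", KEYS))
    # Without it, sibling folders that share the prefix match as well.
    print(keys_under("mypath1", KEYS))
```

In this model, omitting the slash also pulls in `mypath10/` and `mypath1_old/`, which is usually not what a stage definition intends.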