Configuring an integration for Google Cloud Storage¶

This topic describes how to configure secure access to data files stored in a Google Cloud Storage bucket.

Configuring a Snowflake storage integration¶

This section describes how to use storage integrations to allow Snowflake to read data from and write to a Google Cloud Storage bucket referenced in an external (i.e. Cloud Storage) stage. Integrations are named, first-class Snowflake objects that avoid the need for passing explicit cloud provider credentials such as secret keys or access tokens; instead, integration objects reference a Cloud Storage service account. An administrator in your organization grants the service account permissions in the Cloud Storage account.

Administrators can also restrict users to a specific set of Cloud Storage buckets (and optional paths) accessed by external stages that use the integration.

Note

Completing the instructions in this section requires access to your Cloud Storage project as a project editor. If you are not a project editor, ask your Cloud Storage administrator to perform these tasks.

The following diagram shows the integration flow for a Cloud Storage stage:

Google Cloud Storage Stage Integration Flow
  1. An external (i.e. Cloud Storage) stage references a storage integration object in its definition.

  2. Snowflake automatically associates the storage integration with a Cloud Storage service account created for your account. Snowflake creates a single service account that is referenced by all GCS storage integrations in your Snowflake account.

  3. A project editor for your Cloud Storage project grants permissions to the service account to access the bucket referenced in the stage definition. Note that many external stage objects can reference different buckets and paths and use the same integration for authentication.

When a user loads or unloads data from or to a stage, Snowflake verifies the permissions granted to the service account on the bucket before allowing or denying access.

In this Section:

Step 1: Create a Cloud Storage Integration in Snowflake¶

Create an integration using the CREATE STORAGE INTEGRATION command. An integration is a Snowflake object that delegates authentication responsibility for external cloud storage to a Snowflake-generated entity (i.e. a Cloud Storage service account). For accessing Cloud Storage buckets, Snowflake creates a service account that can be granted permissions to access the bucket(s) that store your data files.

A single storage integration can support multiple external (i.e. GCS) stages. The URL in the stage definition must align with the GCS buckets (and optional paths) specified for the STORAGE_ALLOWED_LOCATIONS parameter.

Note

Only account administrators (users with the ACCOUNTADMIN role) or a role with the global CREATE INTEGRATION privilege can execute this SQL command.

CREATE STORAGE INTEGRATION <integration_name>
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'GCS'
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('gcs://<bucket>/<path>/', 'gcs://<bucket>/<path>/')
  [ STORAGE_BLOCKED_LOCATIONS = ('gcs://<bucket>/<path>/', 'gcs://<bucket>/<path>/') ]
Copy

Where:

  • integration_name is the name of the new integration.

  • bucket is the name of a Cloud Storage bucket that stores your data files (e.g. mybucket). The required STORAGE_ALLOWED_LOCATIONS parameter and optional STORAGE_BLOCKED_LOCATIONS parameter restrict or block access to these buckets, respectively, when stages that reference this integration are created or modified.

  • path is an optional path that can be used to provide granular control over objects in the bucket.

The following example creates an integration that explicitly limits external stages that use the integration to reference either of two buckets and paths. In a later step, we will create an external stage that references one of these buckets and paths.

Additional external stages that also use this integration can reference the allowed buckets and paths:

CREATE STORAGE INTEGRATION gcs_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'GCS'
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('gcs://mybucket1/path1/', 'gcs://mybucket2/path2/')
  STORAGE_BLOCKED_LOCATIONS = ('gcs://mybucket1/path1/sensitivedata/', 'gcs://mybucket2/path2/sensitivedata/');
Copy

Step 2: Retrieve the Cloud Storage Service Account for your Snowflake Account¶

Execute the DESCRIBE INTEGRATION command to retrieve the ID for the Cloud Storage service account that was created automatically for your Snowflake account:

DESC STORAGE INTEGRATION <integration_name>;
Copy

Where:

For example:

DESC STORAGE INTEGRATION gcs_int;

+-----------------------------+---------------+-----------------------------------------------------------------------------+------------------+
| property                    | property_type | property_value                                                              | property_default |
+-----------------------------+---------------+-----------------------------------------------------------------------------+------------------|
| ENABLED                     | Boolean       | true                                                                        | false            |
| STORAGE_ALLOWED_LOCATIONS   | List          | gcs://mybucket1/path1/,gcs://mybucket2/path2/                               | []               |
| STORAGE_BLOCKED_LOCATIONS   | List          | gcs://mybucket1/path1/sensitivedata/,gcs://mybucket2/path2/sensitivedata/   | []               |
| STORAGE_GCP_SERVICE_ACCOUNT | String        | service-account-id@project1-123456.iam.gserviceaccount.com                  |                  |
+-----------------------------+---------------+-----------------------------------------------------------------------------+------------------+
Copy

The STORAGE_GCP_SERVICE_ACCOUNT property in the output shows the Cloud Storage service account created for your Snowflake account (e.g. service-account-id@project1-123456.iam.gserviceaccount.com). We provision a single Cloud Storage service account for your entire Snowflake account. All Cloud Storage integrations use that service account.

Step 3: Grant the Service Account Permissions to Access Bucket Objects¶

The following step-by-step instructions describe how to configure IAM access permissions for Snowflake in your Google Cloud Platform Console so that you can use a Cloud Storage bucket to load and unload data:

Creating a Custom IAM Role¶

Create a custom role that has the permissions required to access the bucket and get objects.

  1. Log into the Google Cloud Platform Console as a project editor.

  2. From the home dashboard, choose IAM & admin » Roles.

  3. Click Create Role.

  4. Enter a name, and description for the custom role.

  5. Click Add Permissions.

  6. Filter the list of permissions, and add the following from the list:

    Action(s)

    Required permissions

    Data loading only

    • storage.buckets.get

    • storage.objects.get

    • storage.objects.list

    Data loading with purge option, executing the REMOVE command on the stage

    • storage.buckets.get

    • storage.objects.delete

    • storage.objects.get

    • storage.objects.list

    Data loading and unloading

    • storage.buckets.get (for calculating data transfer costs)

    • storage.objects.create

    • storage.objects.delete

    • storage.objects.get

    • storage.objects.list

    Data unloading only

    • storage.buckets.get

    • storage.objects.create

    • storage.objects.delete

    • storage.objects.list

  7. Click Create.

Assigning the Custom Role to the Cloud Storage Service Account¶

  1. Log into the Google Cloud Platform Console as a project editor.

  2. From the home dashboard, choose Cloud Storage » Browser:

    Bucket List in Google Cloud Platform Console
  3. Select a bucket to configure for access.

  4. Click SHOW INFO PANEL in the upper-right corner. The information panel for the bucket slides out.

  5. Click the ADD PRINCIPAL button.

  6. In the New principals field, search for the service account name from the DESCRIBE INTEGRATION output in Step 2: Retrieve the Cloud Storage Service Account for your Snowflake Account (in this topic).

    Bucket Information Panel in Google Cloud Platform Console
  7. From the Select a role dropdown, select Custom » <role>, where <role> is the custom Cloud Storage role you created in Creating a Custom IAM Role (in this topic).

  8. Click the Save button. The service account name is added to the Storage Object Viewer role dropdown in the information panel.

    Storage Object Viewer role list in Google Cloud Platform Console

Granting the Cloud Storage Service Account Permissions on the Cloud Key Management Service Cryptographic Keys¶

Note

This step is required only if your GCS bucket is encrypted using a key stored in the Google Cloud Key Management Service (Cloud KMS).

  1. Log into the Google Cloud Platform Console as a project editor.

  2. From the home dashboard, choose Security » Cryptographic keys.

  3. Select the key ring that is assigned to your GCS bucket.

  4. Click SHOW INFO PANEL in the upper-right corner. The information panel for the key ring slides out.

  5. Click the ADD PRINCIPAL button.

  6. In the New principals field, search for the service account name from the DESCRIBE INTEGRATION output in Step 2: Retrieve the Cloud Storage Service Account for your Snowflake Account (in this topic).

  7. From the Select a role dropdown, select the Cloud KMS CrytoKey Encryptor/Decryptor role.

  8. Click the Save button. The service account name is added to the Cloud KMS CrytoKey Encryptor/Decryptor role dropdown in the information panel.

Step 4: Create an external stage¶

Create an external stage that references the integration you created.

Note

  • You must use a role that is granted or inherits the USAGE privilege on the database and schema and the CREATE STAGE privilege on the schema. The stage owner (i.e. the role with the OWNERSHIP privilege on the stage) must also have the USAGE privilege on the storage integration.

    Refer to Access control requirements for CREATE STAGE.

  • To load data to or unload data from a stage that uses an integration, a role must have the USAGE privilege on the stage. It is not necessary to also have the USAGE privilege on the storage integration.

Create an external stage using SQL¶

Ensure that the role in use is granted or inherits the necessary privileges to create a stage that uses a storage integration. For example:

GRANT USAGE ON DATABASE mydb TO ROLE myrole;
GRANT USAGE ON SCHEMA mydb.stages TO ROLE myrole;
GRANT CREATE STAGE ON SCHEMA mydb.stages TO ROLE myrole;
GRANT USAGE ON INTEGRATION gcs_int TO ROLE myrole;
Copy

You can create an external stage using the CREATE STAGE command.

For example, set mydb.stages as the current database and schema for the user session, and then create a stage named my_gcs_stage. In this example, the stage references the Cloud Storage bucket and path mybucket1/path1, which are supported by the integration. The stage also references a named file format object called my_csv_format:

USE SCHEMA mydb.stages;

CREATE STAGE my_gcs_stage
  URL = 'gcs://mybucket1/path1'
  STORAGE_INTEGRATION = gcs_int
  FILE_FORMAT = my_csv_format;
Copy

Note

  • Append a forward slash (/) to the URL value to filter to the specified folder path. If the forward slash is omitted, all files and folders starting with the prefix for the specified path are included.

    Note that the forward slash is required to access and retrieve unstructured data files on the stage.

  • The STORAGE_INTEGRATION parameter is handled separately from other stage parameters, such as FILE_FORMAT. Support for these other parameters is the same regardless of the integration used to access your GCS bucket.

Create an external stage using Snowsight¶

To use Snowsight to create a named external stage, do the following:

  1. Sign in to Snowsight.

  2. In the navigation menu, select Data » Databases.

  3. Select the database and schema where you want to create a stage.

  4. Select Create » Stage.

  5. Select Google Cloud Platform.

  6. Enter a Stage Name.

  7. Enter the URL of your Google Cloud Storage bucket.

  8. Note that Enable Directory Table is selected by default. This lets you see the files on the stage, but requires a warehouse and thus incurs a cost. You can choose to deselect this option for now and enable a directory table later.

  9. Enable Authentication.

  10. Select your storage integration from the menu.

  11. Optionally expand the SQL Preview to view a generated SQL statement. To specify additional options for your stage such as AUTO_REFRESH, you can open this SQL preview in a worksheet.

  12. Select Create.

Editing existing stages to use storage integrations¶

You can edit an existing external stage configuration to use a storage integration using SQL or the web interface.

Note

  • You cannot disable authentication or encryption settings for a stage.

  • You can update a stage to use a storage integration for authentication. However, you cannot change the authentication type to credentials if the stage already uses a storage integration. To change the authentication type, you can drop and re-create the stage.

Edit a stage using SQL¶

Use ALTER STAGE to modify the stage. For example:

ALTER STAGE my_gcs_stage
  SET STORAGE_INTEGRATION = gcs_int;
Copy

Edit a stage using Snowsight¶

  1. Sign in to Snowsight.

  2. In the navigation menu, select Data » Databases.

  3. Select the stage that you want to edit.

  4. Select More options » Edit.

  5. Make your desired changes to the stage.

  6. Select Save.