Configuring an Integration for Google Cloud Storage

This topic describes how to configure secure access to data files stored in a Google Cloud Storage bucket.

In this Topic:

Configuring a Snowflake Storage Integration

This section describes how to use storage integrations to allow Snowflake to read data from and write to a Google Cloud Storage bucket referenced in an external (i.e. Cloud Storage) stage. Integrations are named, first-class Snowflake objects that avoid the need for passing explicit cloud provider credentials such as secret keys or access tokens; instead, integration objects reference a Cloud Storage service account. An administrator in your organization grants the service account permissions in the Cloud Storage account.

Administrators can also restrict users to a specific set of Cloud Storage buckets (and optional paths) accessed by external stages that use the integration.

Note

Completing the instructions in this section requires access to your Cloud Storage project as a project editor. If you are not a project editor, ask your Cloud Storage administrator to perform these tasks.

The following diagram shows the integration flow for a Cloud Storage stage:

Google Cloud Storage Stage Integration Flow
  1. An external (i.e. Cloud Storage) stage references a storage integration object in its definition.

  2. Snowflake automatically associates the storage integration with a Cloud Storage service account created for your account. Snowflake creates a single service account that is referenced by all GCS storage integrations in your Snowflake account.

  3. A project editor for your Cloud Storage project grants permissions to the service account to access the bucket referenced in the stage definition. Note that many external stage objects can reference different buckets and paths and use the same integration for authentication.

When a user loads or unloads data from or to a stage, Snowflake verifies the permissions granted to the service account on the bucket before allowing or denying access.

In this Section:

Step 1: Create a Cloud Storage Integration in Snowflake

Create an integration using the CREATE STORAGE INTEGRATION command. An integration is a Snowflake object that delegates authentication responsibility for external cloud storage to a Snowflake-generated entity (i.e. a Cloud Storage service account). For accessing Cloud Storage buckets, Snowflake creates a service account that can be granted permissions to access the bucket(s) that store your data files.

A single storage integration can support multiple external (i.e. GCS) stages. The URL in the stage definition must align with the GCS buckets (and optional paths) specified for the STORAGE_ALLOWED_LOCATIONS parameter.

Note

Only account administrators (users with the ACCOUNTADMIN role) or a role with the global CREATE INTEGRATION privilege can execute this SQL command.

CREATE STORAGE INTEGRATION <integration_name>
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = GCS
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('gcs://<bucket>/<path>/', 'gcs://<bucket>/<path>/')
  [ STORAGE_BLOCKED_LOCATIONS = ('gcs://<bucket>/<path>/', 'gcs://<bucket>/<path>/') ]

Where:

  • integration_name is the name of the new integration.

  • bucket is the name of a Cloud Storage bucket that stores your data files (e.g. mybucket). The required STORAGE_ALLOWED_LOCATIONS parameter and optional STORAGE_BLOCKED_LOCATIONS parameter restrict or block access to these buckets, respectively, when stages that reference this integration are created or modified.

  • path is an optional path that can be used to provide granular control over objects in the bucket.

The following example creates an integration that explicitly limits external stages that use the integration to reference either of two buckets and paths. In a later step, we will create an external stage that references one of these buckets and paths.

Additional external stages that also use this integration can reference the allowed buckets and paths:

CREATE STORAGE INTEGRATION gcs_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = GCS
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('gcs://mybucket1/path1/', 'gcs://mybucket2/path2/')
  STORAGE_BLOCKED_LOCATIONS = ('gcs://mybucket1/path1/sensitivedata/', 'gcs://mybucket2/path2/sensitivedata/');

Step 2: Retrieve the Cloud Storage Service Account for your Snowflake Account

Execute the DESCRIBE INTEGRATION command to retrieve the ID for the Cloud Storage service account that was created automatically for your Snowflake account:

DESC STORAGE INTEGRATION <integration_name>;

Where:

For example:

DESC STORAGE INTEGRATION gcs_int;

+-----------------------------+---------------+-----------------------------------------------------------------------------+------------------+
| property                    | property_type | property_value                                                              | property_default |
+-----------------------------+---------------+-----------------------------------------------------------------------------+------------------|
| ENABLED                     | Boolean       | true                                                                        | false            |
| STORAGE_ALLOWED_LOCATIONS   | List          | gcs://mybucket1/path1/,gcs://mybucket2/path2/                               | []               |
| STORAGE_BLOCKED_LOCATIONS   | List          | gcs://mybucket1/path1/sensitivedata/,gcs://mybucket2/path2/sensitivedata/   | []               |
| STORAGE_GCP_SERVICE_ACCOUNT | String        | service-account-id@project1-123456.iam.gserviceaccount.com                  |                  |
+-----------------------------+---------------+-----------------------------------------------------------------------------+------------------+

The STORAGE_GCP_SERVICE_ACCOUNT property in the output shows the Cloud Storage service account created for your Snowflake account (e.g. service-account-id@project1-123456.iam.gserviceaccount.com). We provision a single Cloud Storage service account for your entire Snowflake account. All Cloud Storage integrations use that service account.

Step 3: Grant the Service Account Permissions to Access Bucket Objects

The following step-by-step instructions describe how to configure IAM access permissions for Snowflake in your Google Cloud Platform Console so that you can use a Cloud Storage bucket to load and unload data:

Creating a Custom IAM Role

Create a custom role that has the permissions required to access the bucket and get objects.

  1. Log into the Google Cloud Platform Console as a project editor.

  2. From the home dashboard, choose IAM & admin » Roles.

  3. Click Create Role.

  4. Enter a name, and description for the custom role.

  5. Click Add Permissions.

  6. Filter the list of permissions, and add the following from the list:

    Data loading only
    • storage.buckets.get

    • storage.objects.get

    • storage.objects.list

    Data loading with purge option
    • storage.buckets.get

    • storage.objects.delete

    • storage.objects.get

    • storage.objects.list

    Data loading and unloading
    • storage.buckets.get

    • storage.objects.create

    • storage.objects.delete

    • storage.objects.get

    • storage.objects.list

  7. Click Create.

Assigning the Custom Role to the Cloud Storage Service Account

  1. Log into the Google Cloud Platform Console as a project editor.

  2. From the home dashboard, choose Storage » Browser:

    Bucket List in Google Cloud Platform Console
  3. Select a bucket to configure for access.

  4. Click SHOW INFO PANEL in the upper-right corner. The information panel for the bucket slides out.

  5. In the Add members field, search for the service account name from the DESCRIBE INTEGRATION output in Step 2: Retrieve the Cloud Storage Service Account for your Snowflake Account (in this topic).

    Bucket Information Panel in Google Cloud Platform Console
  6. From the Select a role dropdown, select Storage » Custom » <role>, where <role> is the custom Cloud Storage role you created in Creating a Custom IAM Role (in this topic).

  7. Click the Add button. The service account name is added to the Storage Object Viewer role dropdown in the information panel.

    Storage Object Viewer role list in Google Cloud Platform Console

Granting the Cloud Storage Service Account Permissions on the Cloud Key Management Service Cryptographic Keys

Note

This step is required only if your GCS bucket is encrypted using a key stored in the Google Cloud Key Management Service (Cloud KMS).

  1. Log into the Google Cloud Platform Console as a project editor.

  2. From the home dashboard, choose Security » Cryptographic keys.

  3. Select the key ring that is assigned to your GCS bucket.

  4. Click SHOW INFO PANEL in the upper-right corner. The information panel for the key ring slides out.

  5. In the Add members field, search for the service account name from the DESCRIBE INTEGRATION output in Step 2: Retrieve the Cloud Storage Service Account for your Snowflake Account (in this topic).

  6. From the Select a role dropdown, select the Cloud KMS CrytoKey Encryptor/Decryptor role.

  7. Click the Add button. The service account name is added to the Cloud KMS CrytoKey Encryptor/Decryptor role dropdown in the information panel.

Step 4: Create an External Stage

Create an external (i.e. Cloud Storage) stage that references the integration you created.

Note

  • Creating a stage that uses an integration requires a role that has the CREATE STAGE privilege on the schema as well as the USAGE privilege on the storage integration. For example:

    GRANT CREATE STAGE ON SCHEMA public TO ROLE myrole;
    
    GRANT USAGE ON INTEGRATION gcs_int TO ROLE myrole;
    
  • To reference a storage integration in the CREATE STAGE statement, the role must have the USAGE privilege on the storage integration object.

Create an external stage using the CREATE STAGE command.

For example, set mydb.public as the current database and schema for the user session, and then create a stage named my_gcs_stage. In this example, the stage references the Cloud Storage bucket and path mybucket1/path1, which are supported by the integration. The stage also references a named file format object called my_csv_format:

USE SCHEMA mydb.public;

CREATE STAGE my_gcs_stage
  URL = 'gcs://mybucket1/path1'
  STORAGE_INTEGRATION = gcs_int
  FILE_FORMAT = my_csv_format;

Note

  • The stage owner (i.e. the role with the OWNERSHIP privilege on the stage) must have the USAGE privilege on the storage integration.

  • To load or unload data from or to a stage that uses an integration, a role must have the USAGE privilege on the stage. It is not necessary to also have the USAGE privilege on the storage integration.

  • The STORAGE_INTEGRATION parameter is handled separately from other stage parameters, such as FILE_FORMAT. Support for these other parameters is the same regardless of the integration used to access your GCS bucket.

Modifying Existing Stages to Use Storage Integrations

To modify an existing external (i.e. Cloud Storage) stage to use a storage integration for Cloud Storage access:

  1. Create a named storage integration. See the instructions in Step 1: Create a Cloud Storage Integration in Snowflake (in this topic).

  2. Use ALTER STAGE to modify the stage. For example:

    ALTER STAGE my_gcs_stage
      SET STORAGE_INTEGRATION = gcs_int;