Configure a catalog integration for AWS Glue

This topic covers how to create a catalog integration for AWS Glue and grant Snowflake restricted access to the AWS Glue Data Catalog.

Note

To complete the instructions in this section, you must have permissions in Amazon Web Services (AWS) to create and manage IAM policies and roles. If you are not an AWS administrator, ask your AWS administrator to perform these tasks.

Step 1: Configure access permissions for the AWS Glue Data Catalog

As a best practice, create a new IAM policy for Snowflake to access the AWS Glue Data Catalog. You can then attach the policy to an IAM role and use the security credentials that AWS generates for that role to access files in the catalog. For instructions, see Creating IAM policies and Modifying a role permissions policy in the AWS Identity and Access Management User Guide.

At a minimum, Snowflake requires the following permissions on the AWS Glue Data Catalog to access information about tables.

  • glue:GetTable

  • glue:GetTables

The following example policy (in JSON format) provides the required permissions to access all of the tables in a specified database.

{
   "Version": "2012-10-17",
   "Statement": [
      {
         "Sid": "AllowGlueCatalogTableAccess",
         "Effect": "Allow",
         "Action": [
            "glue:GetTable",
            "glue:GetTables"
         ],
         "Resource": [
            "arn:aws:glue:*:<accountid>:table/*/*",
            "arn:aws:glue:*:<accountid>:catalog",
            "arn:aws:glue:*:<accountid>:database/<database-name>"
         ]
      }
   ]
}
Copy

Note

  • You can modify the Resource element of this policy to further restrict the allowed resources (for example, catalog, databases, or tables). For more information, see Resource types defined by AWS Glue.

  • If you use encryption for AWS Glue, you must modify the policy to add AWS Key Management Service (AWS KMS) permissions. For more information, see Setting up encryption in AWS Glue.

Step 2: Create a catalog integration in Snowflake

Create a catalog integration for the AWS Glue Data Catalog using the CREATE CATALOG INTEGRATION (AWS Glue) command.

The following example creates a catalog integration that uses an AWS Glue Data Catalog source. The example specifies a value for the optional GLUE_REGION parameter.

CREATE CATALOG INTEGRATION glueCatalogInt
  CATALOG_SOURCE = GLUE
  CATALOG_NAMESPACE = 'my.catalogdb'
  TABLE_FORMAT = ICEBERG
  GLUE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/myGlueRole'
  GLUE_CATALOG_ID = '123456789012'
  GLUE_REGION = 'us-east-2'
  ENABLED = TRUE;
Copy

Step 3: Retrieve the AWS IAM user and external ID for your Snowflake account

To retrieve information about the AWS IAM user and the external ID that were created for your Snowflake account when you created the catalog integration, execute the DESCRIBE CATALOG INTEGRATION command. You provide this information to AWS in the next section to establish a trust relationship.

The following example command describes the catalog integration created in the previous step:

DESCRIBE CATALOG INTEGRATION glueCatalogInt;
Copy

Record the following values:

Value

Description

GLUE_AWS_IAM_USER_ARN

The AWS IAM user created for your Snowflake account, for example, arn:aws:iam::123456789001:user/abc1-b-self1234. Snowflake provisions a single IAM user for your entire Snowflake account. All Glue catalog integrations in your account use that IAM user.

GLUE_AWS_EXTERNAL_ID

The external ID that is needed to establish a trust relationship.

You will provide these values in the next section.

Step 4: Grant the IAM user permissions to access the AWS Glue Data Catalog

Update the trust policy for the same IAM role that you specified with the ARN when you created the catalog integration (GLUE_AWS_ROLE_ARN). Add the values that you recorded in Step 3: Retrieve the AWS IAM user and external ID for your Snowflake account to the trust policy.

For instructions, see Modifying a trust policy.

The following example trust policy demonstrates where to specify the GLUE_AWS_IAM_USER_ARN and GLUE_AWS_EXTERNAL_ID values:

{
   "Version": "2012-10-17",
   "Statement": [
      {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
         "AWS": "<glue_iam_user_arn>"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
         "StringEquals": {
            "sts:ExternalId": "<glue_aws_external_id>"
         }
      }
      }
   ]
}
Copy

Where:

  • glue_iam_user_arn is the GLUE_IAM_USER_ARN value that you recorded.

  • glue_aws_external_id is the GLUE_AWS_EXTERNAL_ID value that you recorded.

Note

  • For security reasons, if you create a new catalog integration (or recreate an existing catalog integration using the CREATE OR REPLACE CATALOG INTEGRATION syntax), the new catalog integration has a different external ID and cannot resolve the trust relationship unless you modify the trust policy with the new external ID.

  • To verify that your permissions are configured correctly, create an Iceberg table using this catalog integration. Snowflake doesn’t verify that your permissions are set correctly until you create an Iceberg table that references this catalog integration.

Next steps

After you configure a catalog integration for AWS Glue, you can create an Iceberg table that uses AWS Glue as the catalog.

To update the table and keep it in sync with changes in AWS Glue, use an ALTER ICEBERG TABLE … REFRESH statement. For more information, see Refresh the metadata for an Iceberg table.