Configure an Apache Iceberg™ REST catalog integration with outbound private connectivity
This topic explains how to configure a catalog integration for Apache Iceberg™ tables managed in a remote catalog that complies with the open source Apache Iceberg™ REST OpenAPI specification.
With this configuration, you can use the catalog integration to connect to a remote Iceberg REST catalog through a private IP address instead of over the public internet.
The following diagram shows how an Iceberg table uses a catalog integration with an external Iceberg catalog.
For general information about outbound private connectivity in Snowflake, including outbound private connectivity costs, see Private connectivity for outbound network traffic.
This topic covers the configuration steps for the following catalog types:
Generic Iceberg REST catalogs
AWS Glue Data Catalog
Databricks Unity Catalog on AWS
Note
Private connectivity is supported only for catalog integrations on AWS (using AWS PrivateLink) and on Azure (using Azure Private Link).
Private connectivity is available only within the same cloud provider: the remote catalog and your Snowflake account must run on the same cloud provider.
Step 1: Gather private connectivity information for your catalog
You must gather private connectivity information so that you can specify it later when you provision a corresponding private connectivity endpoint in the Snowflake VPC or VNet. If your Snowflake account is hosted on AWS, you provision an AWS PrivateLink endpoint; if your account is hosted on Azure, you provision an Azure private endpoint.
To gather private connectivity information for your catalog, see the documentation for your remote Iceberg REST catalog.
For example, an AWS VPC endpoint service ID looks like the following:
com.amazonaws.vpce.us-west-2.vpce-svc-0123456789abcdef
You must find the provider service name and host name for your AWS Glue Data Catalog:
To obtain your provider service name (<provider_service_name>), copy com.amazonaws.<region>.glue into your text editor, where <region> is the AWS region where your Iceberg tables are stored. An example of a provider service name is com.amazonaws.us-west-2.glue. For more information, see Creating an interface VPC endpoint for AWS Glue in the AWS documentation.
To obtain your host name (<host_name>), copy glue.<region>.amazonaws.com into your text editor, where <region> is the AWS region where your Iceberg tables are stored. An example of a host name is glue.us-west-2.amazonaws.com. For more information, see Connecting to the Data Catalog using AWS Glue Iceberg REST endpoint in the AWS documentation.
Note
Alternatively, to retrieve these values, you can use the describe-vpc-endpoint-services subcommand from the AWS command line. For more information, see Provision private connectivity endpoints.
You must find the PrivateLink VPC endpoint service ID for your Databricks Unity Catalog and your Databricks workspace URL:
To find your PrivateLink VPC endpoint service ID (<vpc_endpoint_service_id>), see PrivateLink VPC endpoint services in the Databricks documentation. That topic lists the VPC endpoint service IDs for each AWS region. Copy the endpoint service ID for the region where your tables are hosted (the value for Workspace (including REST API)) into a text editor.
An example of a VPC endpoint service ID is com.amazonaws.vpce.us-west-2.vpce-svc-0129f463fcfbc46c5. For more information about PrivateLink at Databricks, see Enable private connectivity using AWS PrivateLink in the Databricks documentation.
To find your Databricks workspace URL (<databricks_workspace_URL>), see Get identifiers for workspace objects in the Databricks documentation. That topic includes an example Databricks workspace URL. Copy your Databricks workspace URL into a text editor.
Step 2: Provision a private connectivity endpoint
In this step, you provision a private connectivity endpoint in the Snowflake VPC or VNet to enable Snowflake to connect to the remote Iceberg REST catalog by using private connectivity.
To provision a private connectivity endpoint, call the SYSTEM$PROVISION_PRIVATELINK_ENDPOINT system function.
For instructions on specifying the arguments for this system function, see the documentation for the remote Iceberg REST catalog that you want to connect to through private connectivity.
The following code block shows an example of provisioning an AWS PrivateLink endpoint:
SELECT SYSTEM$PROVISION_PRIVATELINK_ENDPOINT(
  'com.amazonaws.vpce.us-west-2.vpce-svc-0123456789abcdef',
  'my.catalog.com');
For the AWS Glue Data Catalog, call the SYSTEM$PROVISION_PRIVATELINK_ENDPOINT system function to provision the endpoint:
USE ROLE ACCOUNTADMIN;

SELECT SYSTEM$PROVISION_PRIVATELINK_ENDPOINT(
  '<provider_service_name>',
  '<host_name>');
Where:
<provider_service_name> is the provider service name that you copied when you gathered private connectivity information for your catalog.
<host_name> is the host name that you copied when you gathered private connectivity information for your catalog.
For example:
SELECT SYSTEM$PROVISION_PRIVATELINK_ENDPOINT(
  'com.amazonaws.<region>.glue',
  'glue.<region>.amazonaws.com');
Note
For AWS Glue, you only need to provision one private connectivity endpoint in the Snowflake VPC; a single Glue endpoint provides access to everything managed by the AWS Glue Data Catalog in the same region. For more information, see Creating an interface VPC endpoint for AWS Glue in the AWS documentation.
For Databricks Unity Catalog, you also only need to provision one private connectivity endpoint; a single endpoint provides access to everything managed by Unity Catalog in the same region.
For Databricks Unity Catalog, call the SYSTEM$PROVISION_PRIVATELINK_ENDPOINT system function to provision the endpoint:
USE ROLE ACCOUNTADMIN;

SELECT SYSTEM$PROVISION_PRIVATELINK_ENDPOINT(
  '<vpc_endpoint_service_id>',
  '<databricks_workspace_URL>');
Where:
<vpc_endpoint_service_id> is the PrivateLink VPC endpoint service ID that you copied when you gathered private connectivity information for your catalog.
<databricks_workspace_URL> is the Databricks workspace URL that you copied when you gathered private connectivity information for your catalog.
Note
If you have multiple Databricks workspaces in the same AWS region, you can use a wildcard with your Databricks workspace URL.
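For example, assuming a wildcard pattern such as *.cloud.databricks.com is accepted as the host name argument (confirm the exact wildcard format that Snowflake supports for your workspace URLs), a single endpoint covering multiple workspaces in the same region might be provisioned like this:

SELECT SYSTEM$PROVISION_PRIVATELINK_ENDPOINT(
  '<vpc_endpoint_service_id>',
  '*.cloud.databricks.com');  -- assumed wildcard form for AWS Databricks workspace URLs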
Step 3: Verify the endpoint status
In this step, you verify the endpoint status of the private connectivity endpoint in the Snowflake VPC or VNet that you provisioned in the previous step.
To verify the endpoint status, call the SYSTEM$GET_PRIVATELINK_ENDPOINTS_INFO system function:
SELECT SYSTEM$GET_PRIVATELINK_ENDPOINTS_INFO();
The endpoint is ready to use when the status changes from pending to available.
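If you have several outbound endpoints and want to check the status field directly in SQL, a small sketch like the following can help. It assumes that the function returns a JSON array of endpoint objects that include a status field, as described above:

-- Flatten the JSON array returned by the system function and read each endpoint's status.
SELECT value:"status"::string AS status
FROM TABLE(FLATTEN(input => PARSE_JSON(SYSTEM$GET_PRIVATELINK_ENDPOINTS_INFO())));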
Step 4: Additional catalog-specific configuration
Complete the additional configuration steps for your catalog type.
Note
For some catalogs or some types of private connectivity endpoints, you also need to approve the connection or allowlist the private connectivity endpoints on the catalog server side.
For a generic Iceberg REST catalog, see the documentation for the remote Iceberg REST catalog to complete any additional configuration steps, and then proceed to the next step.
For the AWS Glue Data Catalog, no additional configuration is required. Proceed to the next step.
For Databricks Unity Catalog, in this step you register the Snowflake endpoint in Databricks so that Databricks accepts the traffic coming from the VPC endpoint.
Complete configuration steps in Databricks
Before you register the Snowflake VPC endpoint, ensure that you complete the following configurations in Databricks:
Your workspace must be located in a customer-managed VPC.
Your Databricks account must be on the Enterprise subscription.
You must set up a private access configuration.
For more information, see Configure Front-end PrivateLink in the Databricks documentation.
Register the Snowflake VPC endpoint
To register the VPC endpoint, complete the following steps:
In Snowflake, call the SYSTEM$GET_PRIVATELINK_ENDPOINTS_INFO system function, and then copy the value for snowflake_endpoint_name in the response:

SELECT SYSTEM$GET_PRIVATELINK_ENDPOINTS_INFO();
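The value to copy is a VPC endpoint ID (an example appears below). If you prefer to extract it with SQL rather than copying it from the raw JSON output, the following sketch can help; it assumes the response is a JSON array of endpoint objects that include a snowflake_endpoint_name field:

-- Flatten the JSON response and read the VPC endpoint ID for each outbound endpoint.
SELECT value:"snowflake_endpoint_name"::string AS snowflake_endpoint_name
FROM TABLE(FLATTEN(input => PARSE_JSON(SYSTEM$GET_PRIVATELINK_ENDPOINTS_INFO())));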
For example, the value to copy looks like vpce-11111aaaa11aaaa11. This value is the VPC endpoint ID in your Snowflake account.

In Databricks, register the Snowflake VPC endpoint ID by specifying the VPC endpoint ID value that you copied in the previous step.
For instructions, see Manage VPC endpoint registrations in the Databricks documentation.
In Databricks, add a private access setting, and then specify the VPC endpoint that you registered in the previous step.
For instructions, see Manage private access settings in the Databricks documentation.
Step 5: Create a catalog integration
In this step, you configure a catalog integration for the catalog REST endpoint with private connectivity enabled.
To configure this catalog integration, run the CREATE CATALOG INTEGRATION command.
For example:
CREATE OR REPLACE CATALOG INTEGRATION iceberg_rest_catalog_cat_int_private
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  REST_CONFIG = (
    CATALOG_URI = '<rest_api_endpoint_url>'
    CATALOG_API_TYPE = PRIVATE
    CATALOG_NAME = '<catalog_name>'
  )
  REST_AUTHENTICATION = (
    TYPE = OAUTH
    OAUTH_TOKEN_URI = '<token_server_uri>'
    OAUTH_CLIENT_ID = '<oauth_client_id>'
    OAUTH_CLIENT_SECRET = '<oauth_client_secret>'
    OAUTH_ALLOWED_SCOPES = ('all-apis', 'sql')
  )
  ENABLED = true;
Important
To use outbound private connectivity, you must specify CATALOG_API_TYPE = PRIVATE when you create the integration.

For more information, including the supported authentication methods, see CREATE CATALOG INTEGRATION (Apache Iceberg™ REST).
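After you create the integration, you can confirm how it is configured by describing it. The following uses the example integration name from above; the exact properties shown in the output depend on the catalog source:

DESCRIBE CATALOG INTEGRATION iceberg_rest_catalog_cat_int_private;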
For the AWS Glue Data Catalog, follow the steps in Configure a catalog integration for AWS Glue Iceberg REST.
Important
To use outbound private connectivity, you must specify CATALOG_API_TYPE = AWS_PRIVATE_GLUE when you create the integration instead of CATALOG_API_TYPE = AWS_GLUE.

For example:
CREATE CATALOG INTEGRATION glue_rest_catalog_int
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  REST_CONFIG = (
    CATALOG_URI = 'https://glue.us-west-2.amazonaws.com/iceberg'
    CATALOG_API_TYPE = AWS_PRIVATE_GLUE
    CATALOG_NAME = '123456789012'
  )
  REST_AUTHENTICATION = (
    TYPE = SIGV4
    SIGV4_IAM_ROLE = 'arn:aws:iam::123456789012:role/my-role'
    SIGV4_SIGNING_REGION = 'us-west-2'
  )
  ENABLED = TRUE;
To create a REST catalog integration to connect to Databricks Unity Catalog, use the CREATE CATALOG INTEGRATION (Apache Iceberg™ REST) command.
Important
To use outbound private connectivity, you must specify CATALOG_API_TYPE = PRIVATE as one of the REST_CONFIG parameters when you create the integration.

For CATALOG_URI and OAUTH_TOKEN_URI, you must use the standard public hostname, which is your Databricks workspace URL, not the name of the private endpoint. Snowflake automatically routes traffic through the provisioned private endpoint when CATALOG_API_TYPE is set to PRIVATE. To find your Databricks workspace URL, see Get identifiers for workspace objects in the Databricks documentation.
Example: Bearer token authentication
CREATE OR REPLACE CATALOG INTEGRATION unity_catalog_int_private_pat
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  REST_CONFIG = (
    CATALOG_URI = 'https://my-workspace.cloud.databricks.com/api/2.1/unity-catalog/iceberg-rest'
    CATALOG_NAME = '<catalog_name>'
    CATALOG_API_TYPE = PRIVATE
  )
  REST_AUTHENTICATION = (
    TYPE = BEARER
    BEARER_TOKEN = 'eyAbCD...eyDeF...'
  )
  ENABLED = TRUE;
Example: OAuth authentication with service principal
The following example uses OAuth authentication with a Databricks service principal. You must have a service principal configured in Databricks with the necessary credentials (client_id and client_secret). For instructions on adding a service principal, see Add service principals to your account in the Databricks documentation.

USE ROLE ACCOUNTADMIN;

-- Set your Databricks workspace URL
SET workspace_url = 'https://my-workspace.cloud.databricks.com';

-- Construct the URIs from the workspace URL
SET catalog_uri = $workspace_url || '/api/2.1/unity-catalog/iceberg-rest';
SET token_uri = $workspace_url || '/oidc/v1/token';

-- Set your service principal credentials
SET client_id = 'YOUR_CLIENT_ID';
SET client_secret = 'YOUR_CLIENT_SECRET';

-- Set your catalog details
SET catalog_name = 'YOUR_CATALOG_NAME';

CREATE OR REPLACE CATALOG INTEGRATION unity_catalog_int_private_oauth
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  REST_CONFIG = (
    CATALOG_API_TYPE = PRIVATE
    CATALOG_URI = $catalog_uri
    CATALOG_NAME = $catalog_name
  )
  REST_AUTHENTICATION = (
    TYPE = OAUTH
    OAUTH_TOKEN_URI = $token_uri
    OAUTH_CLIENT_ID = $client_id
    OAUTH_CLIENT_SECRET = $client_secret
    OAUTH_ALLOWED_SCOPES = ('all-apis', 'sql')
  )
  ENABLED = TRUE;
Step 6: Verify your catalog integration
To verify your catalog integration configuration, call the SYSTEM$VERIFY_CATALOG_INTEGRATION function.
For more information, see Use SYSTEM$VERIFY_CATALOG_INTEGRATION to check your catalog integration configuration.
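For example, using the example integration name from Step 5:

SELECT SYSTEM$VERIFY_CATALOG_INTEGRATION('iceberg_rest_catalog_cat_int_private');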
(Optional) Step 7: Update your catalog configuration
We recommend that you update the configuration for your remote catalog so that it’s only accessible through private connectivity.
For a generic Iceberg REST catalog, see the documentation for the remote catalog that you want to connect to through private connectivity.
AWS Glue Data Catalog doesn’t support restricting access to only allowlisted VPC endpoints.
For Databricks Unity Catalog, see Configure Front-end PrivateLink in the Databricks documentation.
Next steps
This section contains some tasks that you can perform after you configure your catalog integration:
Monitor your private connectivity endpoints
To monitor your private connectivity endpoints, see the OUTBOUND_PRIVATELINK_ENDPOINTS view in the ACCOUNT_USAGE schema.
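For example, assuming your role has been granted access to the SNOWFLAKE database, you can query the view directly:

-- List the outbound private connectivity endpoints recorded in ACCOUNT_USAGE.
SELECT *
FROM SNOWFLAKE.ACCOUNT_USAGE.OUTBOUND_PRIVATELINK_ENDPOINTS;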
To explore the cost of your private connectivity endpoints, see Outbound private connectivity costs.
Configure an external volume with outbound private connectivity
To enable private connectivity between Snowflake and your storage buckets, configure an external volume with outbound private connectivity.
For more information about external volumes, see Configure an external volume.
Note
Catalog-vended credentials aren’t supported when you configure a catalog integration with outbound private connectivity.
To configure an external volume with outbound private connectivity, follow the instructions for your cloud provider.
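As a rough illustration only, an S3 external volume that routes traffic through outbound private connectivity might look like the following sketch. The bucket, role ARN, and names are placeholders, and the USE_PRIVATELINK_ENDPOINT property is an assumption here; confirm the exact parameters in the external volume documentation for your cloud provider:

CREATE OR REPLACE EXTERNAL VOLUME iceberg_ext_vol_private
  STORAGE_LOCATIONS = (
    (
      NAME = 'my-s3-us-west-2'                    -- placeholder location name
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://my-bucket/iceberg/' -- placeholder bucket path
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-storage-role' -- placeholder IAM role
      USE_PRIVATELINK_ENDPOINT = TRUE              -- assumed property for outbound private connectivity
    )
  );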
Create a catalog-linked database
To create a Snowflake database that is connected to your external Iceberg REST catalog, create a catalog-linked database.
For more information, see Create a catalog-linked database.
Note
When you create the catalog-linked database, specify a catalog integration that is configured with outbound private connectivity.
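As a sketch only, a catalog-linked database that uses the example integration from Step 5 might be created like this. The database and external volume names are placeholders; see Create a catalog-linked database for the complete syntax and options:

CREATE DATABASE my_linked_db
  LINKED_CATALOG = (
    CATALOG = 'iceberg_rest_catalog_cat_int_private' -- catalog integration with CATALOG_API_TYPE = PRIVATE
  )
  EXTERNAL_VOLUME = 'iceberg_ext_vol_private';       -- external volume configured for private connectivity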
Write to your remote catalog
After you configure a catalog integration for Apache Iceberg™ REST and create a catalog-linked database, you can write to your remote catalog.
To write to your remote catalog, see Write to your remote catalog.
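For example, once the catalog-linked database exists, creating an Iceberg table in it writes the table to the remote catalog. The database, schema, and table names below are placeholders, and depending on your catalog additional parameters (such as a base location) may be required:

CREATE ICEBERG TABLE my_linked_db.my_schema.my_events (
  id INTEGER,
  created_at TIMESTAMP_NTZ
);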