AWS VPC Interface Endpoints for Internal Stages
This topic provides concepts as well as detailed instructions for connecting to Snowflake internal stages through AWS VPC Interface Endpoints.
AWS VPC interface endpoints and AWS PrivateLink for Amazon S3 can be combined to provide secure connectivity to Snowflake internal stages. This setup ensures that data loading and data unloading operations to Snowflake internal stages use the AWS internal network and do not take place over the public Internet.
Prior to AWS supporting VPC interface endpoints for internal stage access, it was necessary to create a proxy farm within the AWS VPC to facilitate secure access to Snowflake internal stages. With the added support of VPC interface endpoints for Snowflake internal stages, users and client applications can now access Snowflake internal stages over the private AWS network. The following diagram summarizes this new support:
Note the following regarding the numbers in the BEFORE diagram:
Users have two options to connect to a Snowflake internal stage:
Option A allows an on-premises connection directly to the internal stage as shown by the number 1.
Option B allows a connection to the internal stage through a proxy farm as shown by the numbers 2 and 3.
If using the proxy farm, users can also connect to Snowflake directly as denoted by the number 4.
Note the following regarding the numbers in the AFTER diagram:
The updates in this feature remove the need to connect to Snowflake or a Snowflake internal stage through a proxy farm.
An on-premises user can connect to Snowflake directly as shown in number 5.
To connect to a Snowflake internal stage, on-premises users connect to an interface endpoint, number 6, and then use AWS PrivateLink for Amazon S3 to connect to the Snowflake internal stage as shown in number 7.
There is a single Amazon S3 bucket per internal stage deployment. A prefix in the internal stage Amazon S3 bucket is used to organize the data in each Snowflake account. The Amazon S3 bucket endpoint URLs are different depending on whether the connection to the bucket uses private connectivity (i.e. AWS PrivateLink for S3).
- Public Amazon S3 Global Endpoint URL
- Private Amazon S3 Endpoint URL
Implementing VPC interface endpoints to access Snowflake internal stages provides the following advantages:
Internal stage data does not traverse the public Internet.
Client and SaaS applications, such as Microsoft Power BI, that run outside of the AWS VPC can connect to Snowflake securely.
Administrators are not required to modify firewall settings to access internal stage data.
Administrators can implement consistent security and monitoring regarding how users connect to storage accounts.
Configuring a VPC Interface Endpoint to Access Snowflake Internal Stages
To configure VPC interface endpoints to access Snowflake internal stages, it is necessary to have support from the following three roles in your organization:
The Snowflake account administrator (i.e. a user with the Snowflake ACCOUNTADMIN system role).
The AWS administrator.
The network administrator.
Depending on the organization, it may be necessary to coordinate the configuration efforts with more than one person or team to implement the following configuration steps.
AWS PrivateLink for S3 is an AWS service that must be enabled in your cloud environment.
For help with configuring and implementing this service, contact your internal AWS administrator.
Update the firewall allow-listing as follows:
If using an outbound firewall, ensure that it allows all the URLs required by Snowflake. For details, see SnowCD (Connectivity Diagnostic Tool).
us-east-1 customers only: If using one of the following Snowflake clients to connect to Snowflake, upgrade to at least the client version listed:
JDBC driver: 3.13.3 (or higher)
ODBC driver: 2.23.2 (or higher)
Python Connector for Snowflake: 2.5.1 (or higher)
SnowSQL: 1.2.17 (or higher)
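The version requirements above can be checked with a small script. The following is a minimal sketch, not an official tool: the dictionary keys (jdbc, odbc, python-connector, snowsql) and the function names are illustrative, and it assumes plain dotted numeric version strings.

```python
# Minimum client versions listed above for us-east-1 customers.
# The dictionary keys are illustrative labels, not official identifiers.
MINIMUM_VERSIONS = {
    "jdbc": (3, 13, 3),
    "odbc": (2, 23, 2),
    "python-connector": (2, 5, 1),
    "snowsql": (1, 2, 17),
}


def parse_version(text):
    """Turn a dotted version string such as '3.13.3' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(client, installed_version):
    """Return True if installed_version is at or above the listed minimum."""
    return parse_version(installed_version) >= MINIMUM_VERSIONS[client]
```

Comparing tuples of integers rather than raw strings avoids the common mistake of treating 3.9.2 as newer than 3.13.3.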
Complete the following steps to configure and implement secure access to Snowflake internal stages through a VPC endpoint:
As a Snowflake account administrator, execute the following statements in your Snowflake account and record the value defined by the privatelink_internal_stage key. Note that the Amazon S3 bucket name is defined in the first segment of the URL when read from left to right. For more information, see ENABLE_INTERNAL_STAGES_PRIVATELINK and SYSTEM$GET_PRIVATELINK_CONFIG.
use role accountadmin;
alter account set ENABLE_INTERNAL_STAGES_PRIVATELINK = true;
select key, value from table(flatten(input=>parse_json(system$get_privatelink_config())));
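Once the configuration has been retrieved, the bucket name can be pulled out of the recorded value. The sketch below assumes the raw output of SYSTEM$GET_PRIVATELINK_CONFIG is available as a JSON string. The sample URL is hypothetical, and the key name is passed as a parameter because the exact spelling in your account's output should be confirmed rather than taken from this example.

```python
import json


def internal_stage_bucket(config_json, key="privatelink_internal_stage"):
    """Return (url, bucket_name) from a privatelink config JSON string.

    The bucket name is the first dot-separated segment of the URL,
    reading from left to right.
    """
    config = json.loads(config_json)
    url = config[key]
    # Drop a scheme such as "https://" if the value happens to carry one.
    host_and_path = url.split("://", 1)[-1]
    bucket_name = host_and_path.split(".", 1)[0]
    return url, bucket_name


# Hypothetical sample value, for illustration only.
sample = json.dumps({
    "privatelink_internal_stage":
        "sfc-xx-example-bucket.s3.us-west-2.amazonaws.com/abcd1234/",
})
url, bucket = internal_stage_bucket(sample)
print(bucket)
```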
As the AWS administrator, create a VPC endpoint for AWS PrivateLink for S3 using the AWS Console. Record the VPCE DNS Name for use in the next step; do not record any VPCE DNS zonal names.
The VPCE DNS Name can be found by describing an interface endpoint once the endpoint is created.
In this example, a wildcard (i.e. *) is listed as the leading character in the VPCE DNS Name. Replace the leading wildcard with the Amazon S3 bucket name from the previous step. For example:
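The wildcard substitution can be sketched as a small helper. The endpoint ID, region, and bucket name below are hypothetical placeholders, and the function name is illustrative rather than part of any Snowflake or AWS tooling.

```python
def vpce_name_for_bucket(vpce_dns_name, bucket_name):
    """Replace the leading '*' of a VPCE DNS name with the bucket name."""
    if not vpce_dns_name.startswith("*."):
        raise ValueError("expected a VPCE DNS name with a leading wildcard")
    if not bucket_name or "*" in bucket_name:
        raise ValueError("bucket name must be a concrete, non-wildcard value")
    # Keep everything after the '*', including the dot that follows it.
    return bucket_name + vpce_dns_name[1:]


# Hypothetical values, for illustration only.
print(vpce_name_for_bucket(
    "*.vpce-0123456789abcdef0-abcd1234.s3.us-west-2.vpce.amazonaws.com",
    "sfc-xx-example-bucket",
))
```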
As the network administrator, update the DNS settings to resolve the following URL: <bucket_name>.s3.<region>.amazonaws.com to the VPCE DNS name after the leading wildcard is replaced with the Amazon S3 bucket name.
In this example, resolve
Do not use wildcard characters (i.e. *) with DNS mapping because of the possible impact of accessing other Amazon S3 buckets outside of Snowflake.
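Before handing a mapping to whatever DNS system you use, the two rules above (one explicit bucket hostname, no wildcards) can be checked in code. This is a sketch only: the function name and sample values are hypothetical, and how the mapping is actually created (record type, hosted zone layout) depends on your DNS setup and is not covered here.

```python
def dns_mapping(bucket_name, region, vpce_dns_name):
    """Return (source_hostname, target_hostname) for one explicit mapping.

    Raises ValueError if either side contains a wildcard or if the target
    does not begin with the bucket name.
    """
    source = f"{bucket_name}.s3.{region}.amazonaws.com"
    for label, value in (("source", source), ("target", vpce_dns_name)):
        if "*" in value:
            raise ValueError(f"wildcard not allowed in {label}: {value}")
    if not vpce_dns_name.startswith(bucket_name + "."):
        raise ValueError("target should begin with the bucket name")
    return source, vpce_dns_name


# Hypothetical values, for illustration only.
source, target = dns_mapping(
    "sfc-xx-example-bucket",
    "us-west-2",
    "sfc-xx-example-bucket.vpce-0123456789abcdef0-abcd1234.s3.us-west-2.vpce.amazonaws.com",
)
print(source, "->", target)
```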
Use a separate Snowflake account for testing, and configure a private hosted DNS zone in a test VPC to test the feature so that the testing is isolated and does not impact your other workloads.
If using a separate Snowflake account is not possible, use a test user to access Snowflake from a test VPC where the DNS changes are made.
To test from on-premises applications, use DNS forwarding to forward requests to the AWS private hosted zone in the VPC where the DNS settings are made. If there are client applications in both the VPC and on-premises, use AWS Transit Gateway.
Execute the following command from the client machine to verify that the IP address returned is the private IP address for the storage account:
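One way to perform this check programmatically is sketched below using only the Python standard library. The function names are illustrative, and the hostname in the commented usage line is the placeholder pattern from the DNS step, to be replaced with your own values. Note that `is_private` also returns True for loopback and link-local addresses, so inspect the returned addresses as well as the boolean result.

```python
import ipaddress
import socket


def resolve_ips(hostname):
    """Resolve hostname with the machine's configured DNS; return unique IPs."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def all_private(ips):
    """True if the list is non-empty and every address is non-public."""
    return bool(ips) and all(ipaddress.ip_address(ip).is_private for ip in ips)


# Usage on the client machine (replace the placeholders first):
# ips = resolve_ips("<bucket_name>.s3.<region>.amazonaws.com")
# print(ips, all_private(ips))
```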
For Snowflake accounts in us-east-1, verify your Snowflake clients are on their latest versions.