Moving data between Snowflake Postgres and Snowflake¶
pg_lake is a Postgres extension that enables data movement between Postgres and object storage, using formats such as Parquet and Iceberg. With pg_lake, you can create Iceberg tables directly in Postgres, sync data to Snowflake, and exchange files between Snowflake Postgres and Snowflake.
Snowflake Postgres supports three primary patterns for data movement, each using a different integration type.
- Shared Iceberg: Postgres writes Iceberg tables and Snowflake reads them through a catalog integration. Storage is managed automatically. No external S3 bucket is needed.
- Stages: A managed storage area that enables bidirectional file exchange between Snowflake Postgres and Snowflake through Snowflake stages.
- Customer-managed S3 storage: You provide your own S3 bucket and configure access through a storage integration. Postgres can write files and Iceberg tables to this bucket.
| Pattern | Integration type | Writes | Reads | File formats |
|---|---|---|---|---|
| Shared Iceberg | Catalog integration | PG → SF | SF only | Parquet (Iceberg tables) |
| Stages | Postgres internal storage integration | PG → SF, SF → PG | PG and SF | Parquet, CSV, JSON, and other formats supported by pg_lake |
| Customer-managed S3 | Postgres external storage integration | PG → SF, SF → PG | PG and SF | Parquet, CSV, JSON, and other formats supported by pg_lake |
Instance requirements¶
Data movement with pg_lake requires a STANDARD or HIGH MEMORY instance tier. BURSTABLE instances aren’t supported.
Warning
Do not downgrade a Postgres instance that uses pg_lake from a higher tier to BURSTABLE while data movement is still configured. If you must downgrade, remove the tables in object storage first. Downgrading without this cleanup can result in loss of data stored in object storage.
Shared Iceberg¶
With shared Iceberg, Postgres acts as the Iceberg catalog and Snowflake reads the tables through a catalog integration. Storage is handled by Snowflake, so you don’t need to configure an S3 bucket, IAM roles, or storage integrations. You also don’t need to create an external volume: Snowflake Postgres catalog integrations use vended credentials to access table data and metadata automatically.
This is the simplest way to make Postgres data available in Snowflake for analytics.
Considerations:
- Tables created through a Snowflake Postgres catalog integration are read-only in Snowflake.
- Auto-refresh uses metadata polling. Change notifications aren’t supported.
- ACCESS_DELEGATION_MODE must be set to VENDED_CREDENTIALS. The EXTERNAL_VOLUME_CREDENTIALS mode isn’t supported.
- Snowflake handles authentication with the Postgres instance automatically. You don’t need to specify REST_AUTHENTICATION.
Requirements¶
The role that creates the catalog integration must have CREATE INTEGRATION on the account and
USAGE on the Postgres instance. For full access control details, see
CREATE CATALOG INTEGRATION (Snowflake Postgres).
Set up shared Iceberg¶
- Connect to your Postgres instance and enable the pg_lake extension:
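A minimal sketch; the IF NOT EXISTS and CASCADE options (to install any dependent extensions) are assumptions:

```sql
CREATE EXTENSION IF NOT EXISTS pg_lake CASCADE;
```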
- Create an Iceberg table in Postgres and load it with data:
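For example, with a hypothetical events table; pg_lake creates Iceberg tables through its iceberg table access method:

```sql
-- Create an Iceberg table managed by pg_lake.
CREATE TABLE public.events (
    id      bigint,
    payload text,
    created timestamptz
) USING iceberg;

-- Load some data.
INSERT INTO public.events VALUES (1, 'first event', now());
```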
- In Snowflake, create a catalog integration that points to your Postgres instance:
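A sketch only; the parameter names reflect the options discussed in this topic, but the exact syntax, including the CATALOG_SOURCE value, is defined in CREATE CATALOG INTEGRATION (Snowflake Postgres):

```sql
CREATE CATALOG INTEGRATION pg_catalog_int
  CATALOG_SOURCE = POSTGRES                   -- assumed value; see the reference page
  TABLE_FORMAT = ICEBERG
  POSTGRES_INSTANCE = 'my_pg_instance'        -- hypothetical instance name
  CATALOG_NAME = 'my_pg_db'                   -- default Postgres database
  CATALOG_NAMESPACE = 'public'                -- default Postgres schema
  ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS
  ENABLED = TRUE;
```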
- Create an Iceberg table in Snowflake that references the Postgres table:
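For example, using the standard syntax for Iceberg tables in an externally managed catalog (the table and integration names are hypothetical):

```sql
CREATE ICEBERG TABLE events
  CATALOG = 'pg_catalog_int'
  CATALOG_TABLE_NAME = 'events'
  AUTO_REFRESH = TRUE;
```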
You can optionally specify:
- CATALOG_NAMESPACE: The Postgres schema containing the table. Defaults to the value set on the catalog integration.
- CATALOG_NAME: The Postgres database containing the table. Defaults to the value set on the catalog integration.
- AUTO_REFRESH = TRUE: Enables automatic metadata refresh by polling.
You don’t need to specify an EXTERNAL_VOLUME.
- Query the data in Snowflake:
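For example:

```sql
SELECT COUNT(*) FROM events;
```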
Catalog-linked databases¶
Instead of creating individual Iceberg tables, you can create a catalog-linked database that automatically discovers and syncs tables from your Postgres database:
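A sketch using the catalog-linked database syntax; the database name is hypothetical and other LINKED_CATALOG parameters are omitted:

```sql
CREATE DATABASE pg_linked_db
  LINKED_CATALOG = (
    CATALOG = 'pg_catalog_int',
    ALLOWED_WRITE_OPERATIONS = NONE
  );
```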
Snowflake automatically discovers namespaces and tables from the Postgres database and creates
corresponding schemas and Iceberg tables. ALLOWED_WRITE_OPERATIONS = NONE is required because
Postgres-managed Iceberg tables are read-only in Snowflake. For more information, see
Use a catalog-linked database for Apache Iceberg™ tables.
Access control¶
To let other roles create Iceberg tables using the catalog integration, grant USAGE on the
integration along with the standard Snowflake privileges for creating tables:
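For example, with a hypothetical analyst_role and analytics.public schema:

```sql
GRANT USAGE ON INTEGRATION pg_catalog_int TO ROLE analyst_role;
GRANT USAGE ON DATABASE analytics TO ROLE analyst_role;
GRANT USAGE ON SCHEMA analytics.public TO ROLE analyst_role;
GRANT CREATE ICEBERG TABLE ON SCHEMA analytics.public TO ROLE analyst_role;
```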
No additional grants on the Postgres instance are required beyond the initial USAGE grant
to the catalog integration owner.
Refresh and auto-refresh¶
Snowflake reads Iceberg metadata from the Postgres catalog. Because the table is externally managed, you can refresh the data manually or enable automatic refresh.
To pull the latest data immediately, run a manual refresh:
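```sql
-- Pull the latest table metadata from the Postgres catalog.
ALTER ICEBERG TABLE events REFRESH;
```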
To enable continuous automatic refresh:
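```sql
ALTER ICEBERG TABLE events SET AUTO_REFRESH = TRUE;
```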
You can configure how often Snowflake polls for changes on the catalog integration:
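Assuming the standard REFRESH_INTERVAL_SECONDS parameter applies to this catalog integration type:

```sql
ALTER CATALOG INTEGRATION pg_catalog_int
  SET REFRESH_INTERVAL_SECONDS = 30;
```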
For a complete walkthrough, see the guide Sync Data from Snowflake Postgres to Snowflake with Iceberg and pg_lake.
Stages¶
Postgres stages provide a managed storage area for bidirectional file exchange between Snowflake Postgres and Snowflake. You can copy files from Postgres to a Snowflake stage, or put files in a stage from Snowflake and read them from Postgres.
A Snowflake Postgres internal storage integration allows you to read from and write to the managed storage associated with a Snowflake Postgres instance. This enables seamless data movement between Snowflake and your Postgres instances using an external stage. For the full SQL syntax, see CREATE STORAGE INTEGRATION (Postgres Internal Storage).
Considerations¶
- Stage restrictions: When creating a stage (see CREATE STAGE) that uses a Postgres internal storage integration, the URL, ENCRYPTION, and DIRECTORY stage properties are disallowed.
- Path configuration: The integration automatically appends a /files subpath to the base storage location retrieved from the Postgres instance.
Set up a Postgres stage¶
- In Snowflake, create a POSTGRES_INTERNAL_STORAGE storage integration that references your Postgres instance:
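A sketch based on the parameters described below; ENABLED = TRUE is an assumption carried over from other integration types:

```sql
CREATE STORAGE INTEGRATION pg_files_int
  TYPE = POSTGRES_INTERNAL_STORAGE
  POSTGRES_INSTANCE = 'my_pg_instance'
  ENABLED = TRUE;
```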
Where:
- TYPE = POSTGRES_INTERNAL_STORAGE: Specifies the integration type for Postgres managed storage. Required.
- POSTGRES_INSTANCE = '<instance_name>': Specifies the name of the Snowflake Postgres instance. Required.
- Create a Snowflake stage that uses the integration. Use the RELATIVE_URL property to specify a user-chosen path prefix under the Postgres managed storage:
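For example, with a hypothetical exchange/ prefix:

```sql
CREATE STAGE my_pg_stage
  STORAGE_INTEGRATION = pg_files_int
  RELATIVE_URL = 'exchange/';
```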
- Connect to your Postgres instance and enable the pg_lake extension:
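A minimal sketch, as in the shared Iceberg setup; IF NOT EXISTS and CASCADE are assumptions:

```sql
CREATE EXTENSION IF NOT EXISTS pg_lake CASCADE;
```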
You can now exchange files between Postgres and Snowflake through the stage. Use standard Snowflake stage operations (LIST, GET, PUT) on the Snowflake side, and pg_lake file operations on the Postgres side.
Move data between Snowflake and Postgres¶
You can use the COPY FILES command to move data between Snowflake stages and your Postgres
managed storage. These examples assume you have already created a separate Snowflake stage
in addition to the Postgres stage created above.
Snowflake to Postgres (write)
To move data from an existing Snowflake stage into the Postgres managed storage:
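```sql
-- Copy files from an existing Snowflake stage into the Postgres stage.
COPY FILES INTO @<postgres_stage> FROM @<source_stage>;
```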
Where <source_stage> refers to a pre-existing Snowflake stage that contains the files you want
to move, and <postgres_stage> is the Postgres stage created in the previous step.
On the Postgres side, you can then load the data:
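A sketch; the table name and file path are hypothetical, and the assumption here is that pg_lake infers the Parquet format from the file extension:

```sql
COPY my_table FROM '@STAGE/exchange/data.parquet';
```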
Where @STAGE is a preconfigured placeholder for the stage location on the Postgres side.
Postgres to Snowflake (read)
To move data written by Postgres back into an existing Snowflake stage:
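```sql
-- Copy files from the Postgres stage into an existing Snowflake stage.
COPY FILES INTO @<destination_stage> FROM @<postgres_stage>;
```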
Where <destination_stage> refers to a pre-existing Snowflake stage where you want to copy
the files, and <postgres_stage> is the Postgres stage created earlier.
Access control for stages¶
- Creating the integration: The role that creates the storage integration must have OWNERSHIP on the Postgres instance and CREATE INTEGRATION on the account.
- Usage: To use the integration to create a stage, a role must have the USAGE privilege on the integration object.
- Stage privileges: Users need standard privileges (for example, READ, WRITE) on the stage itself to perform data movement operations.
For example, to grant another role the ability to create stages using the storage integration:
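A sketch with hypothetical role and schema names:

```sql
GRANT USAGE ON INTEGRATION pg_files_int TO ROLE data_engineer;
-- The role also needs the standard privilege to create stages in a schema:
GRANT CREATE STAGE ON SCHEMA my_db.public TO ROLE data_engineer;
```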
Customer-managed S3 storage¶
If you need to use your own S3 bucket, you can create a POSTGRES_EXTERNAL_STORAGE storage
integration and attach it to your Postgres instance. This gives pg_lake read and write access
to your bucket for files and Iceberg tables.
Prerequisites¶
Before configuring S3 storage for pg_lake, ensure that you have:
- An active AWS account with permissions to create and manage S3 buckets and IAM roles.
- An S3 bucket in the same AWS region as your Snowflake account. To determine your Snowflake account region, execute the following query in Snowflake (not on your Postgres instance):
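```sql
-- Returns the cloud region of the current Snowflake account.
SELECT CURRENT_REGION();
```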
- Familiarity with AWS IAM roles and policies.
- A Snowflake Postgres instance with pg_lake support.
- Privileges to create storage integrations in Snowflake (requires ACCOUNTADMIN role or a role with the CREATE INTEGRATION privilege on the account).
Create an S3 bucket¶
If you don’t already have one, create an S3 bucket in the same AWS region as your Snowflake account.
For example, if your Snowflake account is in us-west-2, create the S3 bucket in the
us-west-2 region.
See the AWS documentation for instructions on creating an S3 bucket.
Create an IAM policy for S3 access¶
Create an IAM policy that grants the necessary permissions for pg_lake to read from and write to your S3 bucket:
- Sign in to the AWS Management Console and navigate to the IAM service.
- From the left-hand navigation pane, select Account settings.
- Under Security Token Service (STS) in the Endpoints list, find the Snowflake region where your account is located. If the STS status is inactive, move the toggle to Active. For more information, see Activating and deactivating AWS STS in an AWS region.
- From the left-hand navigation pane, select Policies, then choose Create policy.
- For Policy editor, select JSON.
- Add a policy document that allows Snowflake to access the S3 bucket and folder. Replace bucket_name and prefix with your actual bucket name and folder path prefix:
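A representative policy document, modeled on the standard policy Snowflake documents for S3 storage integrations; tailor the actions and resources to your environment:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion"
      ],
      "Resource": "arn:aws:s3:::bucket_name/prefix/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::bucket_name",
      "Condition": {
        "StringLike": {
          "s3:prefix": ["prefix/*"]
        }
      }
    }
  ]
}
```

This policy provides permissions to: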
- Read, write, and delete objects in the specified S3 path
- List bucket contents and retrieve bucket location
- Support pg_lake’s ability to create and manage Iceberg tables
- Choose Next.
- Enter a policy name (for example, snowflake_pg_lake_access) and an optional description.
- Choose Create policy.
Create an IAM role¶
Create an IAM role that Snowflake will assume to access your S3 bucket.
- From the left-hand navigation pane in the Identity and Access Management (IAM) Dashboard, select Roles.
- Select Create role.
- Select AWS account as the trusted entity type.
- Select Another AWS account.
- In the Account ID field, enter your own AWS account ID temporarily. You will modify the trust relationship in a later step to grant access to Snowflake.
- Select the Require external ID option. Enter a placeholder external ID such as 0000. You will update this with the actual external ID generated by Snowflake in a later step. An external ID is used to grant access to your AWS resources (such as S3 buckets) to a third party like Snowflake. For more information, see How to use an external ID when granting access to your AWS resources to a third party.
- Select Next.
- Search for and select the policy you created in Create an IAM policy for S3 access.
- Select Next.
- Enter a name and description for the role (for example, snowflake_pg_lake_role), then select Create role.
- On the role summary page, locate and record the Role ARN value. You will need this when creating the storage integration in Snowflake.
- While on the role summary page, select Edit in the summary section and change the Maximum session duration to 12 hours. Select Save changes. For more information, see Modifying a role maximum session duration (AWS).
Create a storage integration in Snowflake¶
Create a storage integration object in Snowflake that references the IAM role you created. For the full command syntax, see CREATE STORAGE INTEGRATION.
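For example (ENABLED = TRUE is an assumption carried over from other integration types; replace the ARN with the Role ARN you recorded):

```sql
CREATE STORAGE INTEGRATION my_pg_lake_integration
  TYPE = POSTGRES_EXTERNAL_STORAGE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<aws_account_id>:role/snowflake_pg_lake_role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/my-prefix/')
  ENABLED = TRUE;
```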
Where:
- my_pg_lake_integration is the name you choose for the storage integration.
- TYPE = POSTGRES_EXTERNAL_STORAGE specifies that this integration is for use with Snowflake Postgres.
- STORAGE_AWS_ROLE_ARN is the Role ARN you recorded in Create an IAM role.
- STORAGE_ALLOWED_LOCATIONS specifies the S3 bucket and path prefix. Replace my-bucket and my-prefix with the bucket name and folder path you created in Create an S3 bucket. Note that only one location is allowed for Postgres storage integrations.
Retrieve the Snowflake IAM user ARN and external ID¶
After creating the storage integration, use the DESCRIBE INTEGRATION command to retrieve the AWS IAM user and external ID that Snowflake generated for this integration:
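```sql
DESC INTEGRATION my_pg_lake_integration;
```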
In the output, locate and record the following values:
- STORAGE_AWS_IAM_USER_ARN: The IAM user ARN that Snowflake will use to assume the role
- STORAGE_AWS_EXTERNAL_ID: The external ID to use in the trust policy
You will use these values in the next step to configure the IAM role trust policy.
Update the IAM role trust policy¶
Update the trust policy of the IAM role you created in Create an IAM role to allow Snowflake to assume the role:
- Sign in to the AWS Management Console and navigate to the IAM service.
- From the left-hand navigation pane, select Roles.
- Select the role you created in Create an IAM role.
- Select the Trust relationships tab.
- Select Edit trust policy.
- Replace the policy document with the following text:
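A sketch of the standard trust policy shape; the two placeholder strings are replaced in the next step:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "storage_aws_iam_user_arn"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "storage_aws_external_id"
        }
      }
    }
  ]
}
```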
Replace the placeholder values with the values you recorded in Retrieve the Snowflake IAM user ARN and external ID:
- Replace storage_aws_iam_user_arn with the STORAGE_AWS_IAM_USER_ARN value. This is a full ARN in the form arn:aws:iam::<account_id>:user/snowflake-postgres-integration-management, where the username is always the same and only the AWS account ID varies.
- Replace storage_aws_external_id with the STORAGE_AWS_EXTERNAL_ID value.
- Select Update policy to save the changes.
Attach the storage integration to your Postgres instance¶
Attach the storage integration to your Snowflake Postgres instance. When the storage integration is attached, the S3 credentials are automatically synchronized to the Postgres control plane and made available to pg_lake:
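A hypothetical sketch; consult the Snowflake Postgres reference for the exact command and object names:

```sql
-- Hypothetical syntax for attaching an integration to an existing instance.
ALTER POSTGRES INSTANCE my_pg_instance
  SET STORAGE_INTEGRATION = my_pg_lake_integration;
```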
You can also specify the storage integration when creating a new Postgres instance:
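Again a hypothetical sketch; other required instance options are omitted:

```sql
-- Hypothetical syntax for attaching an integration at creation time.
CREATE POSTGRES INSTANCE my_pg_instance
  STORAGE_INTEGRATION = my_pg_lake_integration;
```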
To remove a storage integration from a Postgres instance:
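```sql
-- Hypothetical syntax for detaching the integration.
ALTER POSTGRES INSTANCE my_pg_instance
  UNSET STORAGE_INTEGRATION;
```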
Configure and use pg_lake¶
After attaching the storage integration, connect to your Postgres instance and configure pg_lake. For a list of available extensions, see Snowflake Postgres Extensions.
- Create the pg_lake extension:
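A minimal sketch; IF NOT EXISTS and CASCADE are assumptions:

```sql
CREATE EXTENSION IF NOT EXISTS pg_lake CASCADE;
```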
- Set the default storage location for Iceberg tables. This should match the location specified in your storage integration.
The SET command only applies to the current session:
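Assuming the pg_lake.default_location_prefix setting; check your pg_lake version for the exact parameter name:

```sql
SET pg_lake.default_location_prefix = 's3://my-bucket/my-prefix';
```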
To set the value for all current and future sessions, use the ALTER DATABASE command instead. If you use multiple Postgres databases, make sure to set the storage location for each database:
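For example, for a hypothetical my_pg_db database:

```sql
ALTER DATABASE my_pg_db
  SET pg_lake.default_location_prefix = 's3://my-bucket/my-prefix';
```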
- Verify that the storage integration is configured correctly by listing the contents of your S3 bucket:
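A sketch assuming a hypothetical file-listing helper; the actual function name depends on your pg_lake version:

```sql
-- lake.list_files() is a hypothetical helper; consult the pg_lake documentation.
SELECT * FROM lake.list_files('s3://my-bucket/my-prefix/');
```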
Replace my-bucket and my-prefix with your actual bucket name and path. If the configuration is correct, this query returns a list of files at that location. If the bucket is empty, the query returns an empty result set without an error.
- Verify the end-to-end configuration by creating an Iceberg table, inserting data, and querying it back. If this succeeds, pg_lake can read from and write to your S3 bucket:
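For example (the table name is hypothetical):

```sql
-- Create an Iceberg table backed by S3, insert a row, and read it back.
CREATE TABLE lake_smoke_test (id int, note text) USING iceberg;
INSERT INTO lake_smoke_test VALUES (1, 'pg_lake roundtrip');
SELECT * FROM lake_smoke_test;
```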
For a complete walkthrough using customer-managed S3 with pg_lake, see the guide Bidirectional Data Pipelines with pg_lake and Snowflake.
Security considerations¶
When configuring S3 access for pg_lake, keep these security best practices in mind:
- Use IAM roles: Snowflake Postgres uses IAM role assumption rather than static credentials, providing better security through temporary credentials and automatic credential rotation.
- Limit IAM permissions: Grant only the minimum necessary permissions to the S3 bucket paths that pg_lake needs to access. The IAM policy should restrict access to specific bucket prefixes.
- Monitor external ID: The external ID in the trust policy ensures that only your Snowflake account can assume the IAM role.
- Review storage integration changes: Any updates to the storage integration’s STORAGE_AWS_ROLE_ARN or STORAGE_ALLOWED_LOCATIONS are automatically synchronized to the Postgres instance.
- Use bucket policies: Consider using S3 bucket policies in addition to IAM policies for defense in depth.
- Enable S3 access logging: Enable access logging on your S3 bucket to monitor and audit access patterns.
- Regional alignment: Ensure your S3 bucket is in the same AWS region as your Snowflake account for optimal performance and to meet data residency requirements.
Support for spatial types¶
Geometry and geography columns move from Postgres to Iceberg as “well-known binary” (WKB) values in binary columns. To work with them using Snowflake spatial functionality once the data is in Iceberg, use the ST_GeometryFromWKB or ST_GeographyFromWKB functions to convert the raw bytes into a Geometry or Geography object that Snowflake can use.
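For example, assuming a WKB column named geom_wkb in an Iceberg table read from Postgres:

```sql
SELECT ST_GEOMETRYFROMWKB(geom_wkb) AS geom
FROM my_iceberg_table;
```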
Note that if your Postgres Geometry column contains specialized types like CircularString or CurvePolygon, Snowflake will not be able to read them from the Iceberg table, as Snowflake supports Point, LineString and Polygon only.
Related information¶
- Sync Data from Snowflake Postgres to Snowflake with Iceberg and pg_lake — Step-by-step developer guide with incremental sync using pg_incremental
- Bidirectional Data Movement between SPG and SF with pg_lake - Internal Stage — End-to-end quickstart using internal stages and pg_lake
- Bidirectional Data Movement between SPG and SF with pg_lake - External Stage — End-to-end quickstart using customer-managed S3, pg_lake, and Snowflake Cortex for IoT anomaly detection
- Option 1: Configure a Snowflake storage integration to access Amazon S3 — Similar S3 access workflow to the one described in this topic
- Apache Iceberg™ tables — Overview of Iceberg table support in Snowflake
- Create an Apache Iceberg™ table in Snowflake — Creating Iceberg tables from different catalog sources
- Configure an external volume — Configuring an external volume for Iceberg tables
- Configure a catalog integration for files in object storage — Catalog integration setup for files in object storage
- pg_lake extension documentation