Moving data between Snowflake Postgres and Snowflake

pg_lake is a Postgres extension that enables data movement between Postgres and object storage using open formats like Parquet and Iceberg. With pg_lake, you can create Iceberg tables directly in Postgres, sync data to Snowflake, and exchange files between Snowflake Postgres and Snowflake.

Snowflake Postgres supports three primary patterns for data movement, each using a different integration type.

  • Shared Iceberg: Postgres writes Iceberg tables and Snowflake reads them through a catalog integration. Storage is managed automatically. No external S3 bucket is needed.
  • Stages: A managed storage area, exposed as a Snowflake stage, that enables bidirectional file exchange between Snowflake Postgres and Snowflake.
  • Customer-managed S3 storage: You provide your own S3 bucket and configure access through a storage integration. Postgres can write files and Iceberg tables to this bucket.
| Pattern             | Integration type                      | Writes           | Reads     | File formats                                               |
|---------------------|---------------------------------------|------------------|-----------|------------------------------------------------------------|
| Shared Iceberg      | Catalog integration                   | PG → SF          | SF only   | Parquet (Iceberg tables)                                   |
| Stages              | Postgres internal storage integration | PG → SF, SF → PG | PG and SF | Parquet, CSV, JSON, and other formats supported by pg_lake |
| Customer-managed S3 | Postgres external storage integration | PG → SF, SF → PG | PG and SF | Parquet, CSV, JSON, and other formats supported by pg_lake |

Instance requirements

Data movement with pg_lake requires a STANDARD or HIGH MEMORY instance tier. BURSTABLE instances aren’t supported.

Warning

Do not downgrade a Postgres instance that uses pg_lake from a higher tier to BURSTABLE while data movement is still configured. If you must downgrade, remove the tables in object storage first; downgrading without cleanup may result in loss of data stored in object storage.

Shared Iceberg

With shared Iceberg, Postgres acts as the Iceberg catalog and Snowflake reads the tables through a catalog integration. Storage is handled by Snowflake, so you don’t need to configure an S3 bucket, IAM roles, or storage integrations. You also don’t need to create an external volume: Snowflake Postgres catalog integrations use vended credentials to access table data and metadata automatically.

This is the simplest way to make Postgres data available in Snowflake for analytics.

Considerations:

  • Tables created through a Snowflake Postgres catalog integration are read-only in Snowflake.
  • Auto-refresh uses metadata polling. Change notifications aren’t supported.
  • ACCESS_DELEGATION_MODE must be set to VENDED_CREDENTIALS. The EXTERNAL_VOLUME_CREDENTIALS mode isn’t supported.
  • Snowflake handles authentication with the Postgres instance automatically. You don’t need to specify REST_AUTHENTICATION.

Requirements

The role that creates the catalog integration must have CREATE INTEGRATION on the account and USAGE on the Postgres instance. For full access control details, see CREATE CATALOG INTEGRATION (Snowflake Postgres).

Set up shared Iceberg

  1. Connect to your Postgres instance and enable the pg_lake extension:

    CREATE EXTENSION pg_lake CASCADE;
  2. Create an Iceberg table in Postgres and load data into it:

    CREATE TABLE my_table (
      id INT,
      data TEXT
    ) USING iceberg;
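
    For example, you can load rows with a standard INSERT (the values and source table below are illustrative):

    INSERT INTO my_table VALUES (1, 'hello'), (2, 'world');

    -- Or copy rows from an existing Postgres heap table
    INSERT INTO my_table SELECT id, data FROM my_source_table;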
  3. In Snowflake, create a catalog integration that points to your Postgres instance:

    CREATE OR REPLACE CATALOG INTEGRATION my_postgres_catalog
      CATALOG_SOURCE = SNOWFLAKE_POSTGRES
      TABLE_FORMAT = ICEBERG
      CATALOG_NAMESPACE = 'public'
      REST_CONFIG = (
        POSTGRES_INSTANCE = '{instance_name}'
        CATALOG_NAME = 'postgres'
        ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS
      )
      ENABLED = TRUE;
  4. Create an Iceberg table in Snowflake that references the Postgres table:

    CREATE OR REPLACE ICEBERG TABLE my_iceberg_table
      CATALOG = 'my_postgres_catalog'
      CATALOG_TABLE_NAME = 'my_table';

    You can optionally specify:

    • CATALOG_NAMESPACE: The Postgres schema containing the table. Defaults to the value set on the catalog integration.
    • CATALOG_NAME: The Postgres database containing the table. Defaults to the value set on the catalog integration.
    • AUTO_REFRESH = TRUE: Enables automatic metadata refresh by polling.

    You don’t need to specify an EXTERNAL_VOLUME.
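
    For example, a variant of the previous statement that sets these options explicitly (the values shown are illustrative):

    CREATE OR REPLACE ICEBERG TABLE my_iceberg_table
      CATALOG = 'my_postgres_catalog'
      CATALOG_TABLE_NAME = 'my_table'
      CATALOG_NAMESPACE = 'public'
      AUTO_REFRESH = TRUE;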

  5. Query the data in Snowflake:

    SELECT * FROM my_iceberg_table;

Catalog-linked databases

Instead of creating individual Iceberg tables, you can create a catalog-linked database that automatically discovers and syncs tables from your Postgres database:

CREATE DATABASE my_postgres_db
  LINKED_CATALOG = (
    CATALOG = my_postgres_catalog
    ALLOWED_WRITE_OPERATIONS = NONE
  );

Snowflake automatically discovers namespaces and tables from the Postgres database and creates corresponding schemas and Iceberg tables. ALLOWED_WRITE_OPERATIONS = NONE is required because Postgres-managed Iceberg tables are read-only in Snowflake. For more information, see Use a catalog-linked database for Apache Iceberg™ tables.
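
Once discovery completes, you can query a synced table through the catalog-linked database directly. A minimal sketch, assuming the Postgres schema public was discovered as a namespace and contains my_table:

SELECT * FROM my_postgres_db.public.my_table;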

Access control

To let other roles create Iceberg tables using the catalog integration, grant USAGE on the integration along with the standard Snowflake privileges for creating tables:

GRANT USAGE ON INTEGRATION my_postgres_catalog TO ROLE analyst_role;
GRANT USAGE ON DATABASE my_db TO ROLE analyst_role;
GRANT USAGE ON SCHEMA my_db.public TO ROLE analyst_role;
GRANT CREATE ICEBERG TABLE ON SCHEMA my_db.public TO ROLE analyst_role;

No additional grants on the Postgres instance are required beyond the initial USAGE grant to the catalog integration owner.
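
With those grants in place, analyst_role can create its own Iceberg table over the same Postgres catalog, for example:

USE ROLE analyst_role;

CREATE ICEBERG TABLE my_db.public.my_iceberg_table
  CATALOG = 'my_postgres_catalog'
  CATALOG_TABLE_NAME = 'my_table';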

Refresh and auto-refresh

Snowflake reads Iceberg metadata from the Postgres catalog. Because the table is externally managed, you can refresh the data manually or enable automatic refresh.

To pull the latest data immediately, run a manual refresh:

ALTER ICEBERG TABLE my_iceberg_table REFRESH;

To enable continuous automatic refresh:

ALTER ICEBERG TABLE my_iceberg_table SET AUTO_REFRESH = TRUE;

You can configure how often Snowflake polls for changes on the catalog integration:

ALTER CATALOG INTEGRATION my_postgres_catalog SET REFRESH_INTERVAL_SECONDS = 60;

For a complete walkthrough, see the guide Sync Data from Snowflake Postgres to Snowflake with Iceberg and pg_lake.

Stages

Postgres stages provide a managed storage area for bidirectional file exchange between Snowflake Postgres and Snowflake. You can copy files from Postgres to a Snowflake stage, or put files in a stage from Snowflake and read them from Postgres.

A Snowflake Postgres internal storage integration allows you to read from and write to the managed storage associated with a Snowflake Postgres instance. This enables seamless data movement between Snowflake and your Postgres instances using an external stage. For the full SQL syntax, see CREATE STORAGE INTEGRATION (Postgres Internal Storage).

Considerations

  • Stage restrictions: When creating a stage (see CREATE STAGE) that uses a Postgres internal storage integration, the URL, ENCRYPTION, and DIRECTORY stage properties are disallowed.
  • Path configuration: The integration automatically appends a /files subpath to the base storage location retrieved from the Postgres instance.

Set up a Postgres stage

  1. In Snowflake, create a POSTGRES_INTERNAL_STORAGE storage integration that references your Postgres instance:

    CREATE STORAGE INTEGRATION my_pg_stage_integration
      TYPE = POSTGRES_INTERNAL_STORAGE
      POSTGRES_INSTANCE = 'my_postgres_instance';

    Where:

    • TYPE = POSTGRES_INTERNAL_STORAGE: Specifies the integration type for Postgres managed storage. Required.
    • POSTGRES_INSTANCE = '<instance_name>': Specifies the name of the Snowflake Postgres instance. Required.
  2. Create a Snowflake stage that uses the integration. Use the RELATIVE_URL property to specify a user-chosen path prefix under the Postgres managed storage:

    CREATE STAGE my_pg_stage
      RELATIVE_URL = '/my_data'
      STORAGE_INTEGRATION = my_pg_stage_integration;
  3. Connect to your Postgres instance and enable the pg_lake extension:

    CREATE EXTENSION pg_lake CASCADE;

You can now exchange files between Postgres and Snowflake through the stage. Use standard Snowflake stage operations (LIST, GET, PUT) on the Snowflake side, and pg_lake file operations on the Postgres side.
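
For example, from the Snowflake side (PUT requires a client that supports it, such as SnowSQL; the local file path is illustrative):

-- Upload a local file into the stage path used by Postgres
PUT file:///tmp/data.csv @my_pg_stage/my_data/;

-- List the files currently visible in the stage
LIST @my_pg_stage;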

Move data between Snowflake and Postgres

You can use the COPY FILES command to move data between Snowflake stages and your Postgres managed storage. These examples assume you have already created a separate Snowflake stage in addition to the Postgres stage created above.

Snowflake to Postgres (write)

To move data from an existing Snowflake stage into the Postgres managed storage:

COPY FILES
  INTO @<postgres_stage>
  FROM @<source_stage>;

Where <source_stage> refers to a pre-existing Snowflake stage that contains the files you want to move, and <postgres_stage> is the Postgres stage created in the previous step.

On the Postgres side, you can then load the data:

COPY test FROM '@STAGE/my_data/data*.parquet';

Where @STAGE is a preconfigured placeholder for the stage location on the Postgres side.
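
The COPY target table must already exist in Postgres. A minimal sketch, assuming the staged Parquet files contain id and data columns:

CREATE TABLE test (
  id INT,
  data TEXT
);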

Postgres to Snowflake (read)

To move data written by Postgres back into an existing Snowflake stage:

COPY FILES
  INTO @<destination_stage>
  FROM @<postgres_stage>;

Where <destination_stage> refers to a pre-existing Snowflake stage where you want to copy the files, and <postgres_stage> is the Postgres stage created earlier.
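
On the Postgres side, pg_lake can write the files that this command then copies. A minimal sketch using the same @STAGE placeholder described above (the path, table name, and format inference from the .parquet extension are assumptions):

COPY test TO '@STAGE/my_data/export.parquet';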

Access control for stages

  • Creating the integration: The role that creates the storage integration must have OWNERSHIP on the Postgres instance and CREATE INTEGRATION on the account.
  • Usage: To use the integration to create a stage, a role must have the USAGE privilege on the integration object.
  • Stage privileges: Users need standard privileges (for example, READ, WRITE) on the stage itself to perform data movement operations.

For example, to grant another role the ability to create stages using the storage integration:

GRANT USAGE ON INTEGRATION my_pg_stage_integration TO ROLE data_engineer_role;
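
To let the same role move data through a specific stage it doesn't own, also grant the stage-level privileges (the database and schema names are illustrative):

GRANT USAGE ON DATABASE my_db TO ROLE data_engineer_role;
GRANT USAGE ON SCHEMA my_db.public TO ROLE data_engineer_role;
GRANT READ, WRITE ON STAGE my_db.public.my_pg_stage TO ROLE data_engineer_role;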

Customer-managed S3 storage

If you need to use your own S3 bucket, you can create a POSTGRES_EXTERNAL_STORAGE storage integration and attach it to your Postgres instance. This gives pg_lake read and write access to your bucket for files and Iceberg tables.

Prerequisites

Before configuring S3 storage for pg_lake, ensure that you have:

  • An active AWS account with permissions to create and manage S3 buckets and IAM roles.

  • An S3 bucket in the same AWS region as your Snowflake account. To determine your Snowflake account region, execute the following query in Snowflake (not on your Postgres instance):

    SELECT CURRENT_REGION();
  • Familiarity with AWS IAM roles and policies.

  • A Snowflake Postgres instance with pg_lake support.

  • Privileges to create storage integrations in Snowflake (requires ACCOUNTADMIN role or a role with the CREATE INTEGRATION privilege on the account).

Create an S3 bucket

If you don’t already have one, create an S3 bucket in the same AWS region as your Snowflake account. For example, if your Snowflake account is in us-west-2, create the S3 bucket in the us-west-2 region.

See the AWS documentation for instructions on creating an S3 bucket.

Create an IAM policy for S3 access

Create an IAM policy that grants the necessary permissions for pg_lake to read from and write to your S3 bucket:

  1. Sign in to the AWS Management Console and navigate to the IAM service.

  2. From the left-hand navigation pane, select Account settings.

  3. Under Security Token Service (STS) in the Endpoints list, find the Snowflake region where your account is located. If the STS status is inactive, move the toggle to Active. For more information, see Activating and deactivating AWS STS in an AWS region.

  4. From the left-hand navigation pane, select Policies, then choose Create policy.

  5. For Policy editor, select JSON.

  6. Add a policy document that allows Snowflake to access the S3 bucket and folder. Replace bucket_name and prefix with your actual bucket name and folder path prefix:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "s3:PutObject",
            "s3:GetObject",
            "s3:GetObjectVersion",
            "s3:DeleteObject",
            "s3:DeleteObjectVersion"
          ],
          "Resource": "arn:aws:s3:::bucket_name/prefix/*"
        },
        {
          "Effect": "Allow",
          "Action": [
            "s3:ListBucket",
            "s3:GetBucketLocation"
          ],
          "Resource": "arn:aws:s3:::bucket_name",
          "Condition": {
            "StringLike": {
              "s3:prefix": [
                "prefix/*"
              ]
            }
          }
        }
      ]
    }

    This policy provides permissions to:

    • Read, write, and delete objects in the specified S3 path
    • List bucket contents and retrieve bucket location
    • Support pg_lake’s ability to create and manage Iceberg tables
  7. Choose Next.

  8. Enter a policy name (for example, snowflake_pg_lake_access) and an optional description.

  9. Choose Create policy.

Create an IAM role

Create an IAM role that Snowflake will assume to access your S3 bucket.

  1. From the left-hand navigation pane in the Identity and Access Management (IAM) Dashboard, select Roles.
  2. Select Create role.
  3. Select AWS account as the trusted entity type.
  4. Select Another AWS account.
  5. In the Account ID field, enter your own AWS account ID temporarily. You will modify the trust relationship in a later step to grant access to Snowflake.
  6. Select the Require external ID option. Enter a placeholder external ID such as 0000. You will update this with the actual external ID generated by Snowflake in a later step. An external ID is used to grant access to your AWS resources (such as S3 buckets) to a third party like Snowflake. For more information, see How to use an external ID when granting access to your AWS resources to a third party.
  7. Select Next.
  8. Search for and select the policy you created in Create an IAM policy for S3 access.
  9. Select Next.
  10. Enter a name and description for the role (for example, snowflake_pg_lake_role), then select Create role.
  11. On the role summary page, locate and record the Role ARN value. You will need this when creating the storage integration in Snowflake.
  12. While on the role summary page, select Edit in the summary section and change the Maximum session duration to 12 hours. Select Save changes. For more information, see Modifying a role maximum session duration (AWS).

Create a storage integration in Snowflake

Create a storage integration object in Snowflake that references the IAM role you created. For the full command syntax, see CREATE STORAGE INTEGRATION.

CREATE STORAGE INTEGRATION my_pg_lake_integration
  TYPE = POSTGRES_EXTERNAL_STORAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake_pg_lake_role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/my-prefix/');

Where:

  • my_pg_lake_integration is the name you choose for the storage integration.
  • TYPE = POSTGRES_EXTERNAL_STORAGE specifies that this integration is for use with Snowflake Postgres.
  • STORAGE_AWS_ROLE_ARN is the Role ARN you recorded in Create an IAM role.
  • STORAGE_ALLOWED_LOCATIONS specifies the S3 bucket and path prefix. Replace my-bucket and my-prefix with the bucket name and folder path you created in Create an S3 bucket. Note that only one location is allowed for Postgres storage integrations.

Retrieve the Snowflake IAM user ARN and external ID

After creating the storage integration, use the DESCRIBE INTEGRATION command to retrieve the AWS IAM user and external ID that Snowflake generated for this integration:

DESCRIBE STORAGE INTEGRATION my_pg_lake_integration;

In the output, locate and record the following values:

  • STORAGE_AWS_IAM_USER_ARN: The IAM user ARN that Snowflake will use to assume the role
  • STORAGE_AWS_EXTERNAL_ID: The external ID to use in the trust policy

You will use these values in the next step to configure the IAM role trust policy.

Update the IAM role trust policy

Update the trust policy of the IAM role you created in Create an IAM role to allow Snowflake to assume the role:

  1. Sign in to the AWS Management Console and navigate to the IAM service.

  2. From the left-hand navigation pane, select Roles.

  3. Select the role you created in Create an IAM role.

  4. Select the Trust relationships tab.

  5. Select Edit trust policy.

  6. Replace the policy document with the following text:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "<storage_aws_iam_user_arn>"
          },
          "Action": "sts:AssumeRole",
          "Condition": {
            "StringEquals": {
              "sts:ExternalId": "<storage_aws_external_id>"
            }
          }
        }
      ]
    }

    Replace the placeholder values with the values you recorded in Retrieve the Snowflake IAM user ARN and external ID:

    • Replace storage_aws_iam_user_arn with the STORAGE_AWS_IAM_USER_ARN value. This is a full ARN in the form arn:aws:iam::<account_id>:user/snowflake-postgres-integration-management, where the username is always the same and only the AWS account ID varies.
    • Replace storage_aws_external_id with the STORAGE_AWS_EXTERNAL_ID value.
  7. Select Update policy to save the changes.

Attach the storage integration to your Postgres instance

Attach the storage integration to your Snowflake Postgres instance. When the storage integration is attached, the S3 credentials are automatically synchronized to the Postgres control plane and made available to pg_lake:

ALTER POSTGRES INSTANCE my_postgres_instance
  SET STORAGE_INTEGRATION = my_pg_lake_integration;

You can also specify the storage integration when creating a new Postgres instance:

CREATE POSTGRES INSTANCE my_postgres_instance
  ...
  STORAGE_INTEGRATION = my_pg_lake_integration;

To remove a storage integration from a Postgres instance:

ALTER POSTGRES INSTANCE my_postgres_instance
  UNSET STORAGE_INTEGRATION;

Configure and use pg_lake

After attaching the storage integration, connect to your Postgres instance and configure pg_lake. For a list of available extensions, see Snowflake Postgres Extensions.

  1. Create the pg_lake extension:

    CREATE EXTENSION pg_lake CASCADE;
  2. Set the default storage location for Iceberg tables. This should match the location specified in your storage integration.

    The SET command only applies to the current session:

    SET pg_lake_iceberg.default_location_prefix = 's3://my-bucket/my-prefix';

    To set the value for all current and future sessions, use the ALTER DATABASE command instead. If you use multiple Postgres databases, make sure to set the storage location for each database:

    -- Substitute the name of your database
    ALTER DATABASE my_database SET pg_lake_iceberg.default_location_prefix = 's3://my-bucket/my-prefix';
  3. Verify that the storage integration is configured correctly by listing the contents of your S3 bucket:

    SELECT * FROM lake_file.list('s3://my-bucket/my-prefix/*');

    Replace my-bucket and my-prefix with your actual bucket name and path. If the configuration is correct, this query returns a list of files at that location. If the bucket is empty, the query returns an empty result set without an error.

  4. Verify the end-to-end configuration by creating an Iceberg table, inserting data, and querying it back. If this succeeds, pg_lake can read from and write to your S3 bucket:

    CREATE TABLE my_table (
      id INT,
      data TEXT
    ) USING iceberg;
    
    INSERT INTO my_table VALUES (1, 'hello iceberg');
    
    SELECT * FROM my_table;
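
    pg_lake can also export a table directly to a file in the bucket. A minimal sketch, assuming pg_lake infers the Parquet format from the file extension:

    COPY my_table TO 's3://my-bucket/my-prefix/my_table_export.parquet';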

For a complete walkthrough using customer-managed S3 with pg_lake, see the guide Bidirectional Data Pipelines with pg_lake and Snowflake.

Security considerations

When configuring S3 access for pg_lake, keep these security best practices in mind:

  • Use IAM roles: Snowflake Postgres uses IAM role assumption rather than static credentials, providing better security through temporary credentials and automatic credential rotation.
  • Limit IAM permissions: Grant only the minimum necessary permissions to the S3 bucket paths that pg_lake needs to access. The IAM policy should restrict access to specific bucket prefixes.
  • Monitor external ID: The external ID in the trust policy ensures that only your Snowflake account can assume the IAM role.
  • Review storage integration changes: Any updates to the storage integration’s STORAGE_AWS_ROLE_ARN or STORAGE_ALLOWED_LOCATIONS are automatically synchronized to the Postgres instance.
  • Use bucket policies: Consider using S3 bucket policies in addition to IAM policies for defense in depth.
  • Enable S3 access logging: Enable access logging on your S3 bucket to monitor and audit access patterns.
  • Regional alignment: Ensure your S3 bucket is in the same AWS region as your Snowflake account for optimal performance and to meet data residency requirements.

Support for spatial types

Geometry and geography columns move from Postgres to Iceberg as well-known binary (WKB) values stored in binary columns. To work with them using Snowflake spatial functionality, once the data is in Iceberg, use the ST_GeometryFromWKB or ST_GeographyFromWKB functions to convert the raw bytes into a GEOMETRY or GEOGRAPHY object Snowflake can use.
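
For example, assuming an Iceberg table with a WKB column named geom (the table and column names are illustrative):

SELECT ST_GeometryFromWKB(geom) AS geometry
FROM my_iceberg_table;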

Note that if your Postgres geometry column contains specialized types like CircularString or CurvePolygon, Snowflake can't read them from the Iceberg table, because Snowflake supports only Point, LineString, and Polygon.