Replicating Databases Across Multiple Accounts

This topic describes the steps necessary to replicate databases across multiple Snowflake accounts in different regions and keeping the database objects and stored data synchronized.

In this Topic:

Region Support for Database Replication and Failover/Failback

All Snowflake regions across Amazon Web Services, Google Cloud Platform, and Microsoft Azure support Database Replication and Failover/Failback, with the exception of US Gov Virginia on Microsoft Azure.

Note that accounts can replicate databases between Virtual Private Snowflake (VPS) and multi-tenant regions to facilitate data sharing and account migrations between these regions. This ability is disabled by default. VPS accounts must contact Snowflake Support to enable access.

Web Interface for Database Replication and Failover/Failback

Account administrators (users with the ACCOUNTADMIN role) can use the Replication area of the Databases Databases tab tab in the Snowflake web interface to perform most actions related to configuring and managing database replication, including the following actions:

  • Promote a local database to serve as a primary database.

  • Enable failover for a primary database (Business Critical Edition accounts (or higher).

  • Refresh a secondary database, either once (manually) or repeatedly (on a schedule, using a task).

  • Promote a secondary database to serve as a primary database (Business Critical Edition accounts (or higher)).

  • Disable replication and/or failover for a primary database.

Replicating a Database to Another Account

The instructions in this section explain how to prepare your accounts for replication, promote a local database to serve as a primary database, and perform the initial replication of this primary database to another account.

Important

Target accounts do not have Tri-Secret Secure or AWS PrivateLink configured as a default. If Tri-Secret Secure or PrivateLink is required for compliance, security or other purposes, it is your responsibility to ensure that those features are configured in the target account.

Step 2: Viewing All Accounts in Your Organization

Retrieve the list of accounts in your organization. Any existing permanent or transient database in these accounts can be modified to serve as a primary database. Replicas of a primary database (i.e. secondary databases) can only be created in these accounts.

Note

Only account administrators (users with the ACCOUNTADMIN role) can execute the SQL statement in this section.

To view the list of accounts in your organization, query SHOW REPLICATION ACCOUNTS.

SHOW REPLICATION ACCOUNTS;

+------------------+---------------------------------+---------------+------------+
| snowflake_region | created_on                      | name          | comment    |
|------------------+---------------------------------+---------------+------------|
| AWS_US_WEST_2    | 2018-11-19 16:11:12.720 -0700   | MYACCOUNT1    |            |
| AWS_US_EAST_1    | 2019-06-02 14:12:23.192 -0700   | MYACCOUNT2    |            |
+------------------+---------------------------------+---------------+------------+

The following table displays the complete list of Snowflake Region IDs:

Snowflake Region IDs

Region

Region ID

Snowflake Region ID

Notes

Amazon Web Services (AWS)

US West (Oregon)

us-west-2

aws_us_west_2

US East (Ohio)

us-east-2.aws

aws_us_east_2

US East (N. Virginia)

us-east-1

aws_us_east_1

US East (Commercial Gov - N. Virginia)

us-east-1-gov.aws

aws_us_east_1_gov

Available only for accounts on Business Critical (or higher); not located in AWS GovCloud (US), which is a separate, dedicated cloud not yet supported by Snowflake.

Canada (Central)

ca-central-1.aws

aws_ca_central_1

EU (Ireland)

eu-west-1

aws_eu_west_1

EU (Frankfurt)

eu-central-1

aws_eu_central_1

Asia Pacific (Tokyo)

ap-northeast-1.aws

aws_ap_northeast_1

Asia Pacific (Mumbai)

ap-south-1.aws

aws_ap_south_1

Asia Pacific (Singapore)

ap-southeast-1

aws_ap_southeast_1

Asia Pacific (Sydney)

ap-southeast-2

aws_ap_southeast_2

Google Cloud Platform (GCP)

US Central1 (Iowa)

us-central1.gcp

gcp_us_central1

Europe West2 (London)

europe-west2.gcp

gcp_europe_west2

Europe West4 (Netherlands)

europe-west4.gcp

gcp_europe_west4

Microsoft Azure

West US 2 (Washington)

west-us-2.azure

azure_westus2

East US 2 (Virginia)

east-us-2.azure

azure_eastus2

US Gov Virginia

us-gov-virginia.azure

azure_usgovvirginia

Available only for accounts on Business Critical (or higher).

Canada Central (Toronto)

canada-central.azure

azure_canadacentral

West Europe (Netherlands)

west-europe.azure

azure_westeurope

Southeast Asia (Singapore)

southeast-asia.azure

azure_southeastasia

Switzerland North (Zurich)

switzerland-north.azure

azure_switzerlandnorth

Australia East (New South Wales)

australia-east.azure

azure_australiaeast

Step 3: Promoting a Local Database to Serve as a Primary Database

Modify an existing permanent or transient database to serve as a primary database using an ALTER DATABASE … ENABLE REPLICATION TO ACCOUNTS statement. Provide a comma-separated list of accounts in your organization that can store a replica of this database (i.e. a secondary database), allowing users in those accounts to query objects in the secondary database.

Note

Only account administrators (users with the ACCOUNTADMIN role) can execute the SQL statement in this section.

Example

Promote local database mydb1 (in region aws_us_west_2) to serve as a primary database and specify that accounts myaccount2 and myaccount3 (in regions aws_us_east_1 (AWS) and azure_westeurope (Azure), respectively) can each store a replica of this database:

ALTER DATABASE mydb1 ENABLE REPLICATION TO ACCOUNTS aws_us_east_1.myaccount2, azure_westeurope.myaccount3;

Step 4: Enabling Failover for a Primary Database

Note

Complete this step only if you plan to configure this primary database to fail over to another account. For information, see Introduction to Business Continuity and Disaster Recovery.

Enable failover for a primary database to one or more accounts in your organization using an ALTER DATABASE … ENABLE FAILOVER TO ACCOUNTS statement. The replica of this primary database in any one of these accounts (i.e. a secondary database) can be promoted to serve as the primary database.

Note that enabling failover for a primary database can be done either before or after a replica of the primary database has been created in a specified account.

Example

Enable failover for primary database mydb1 (in region aws_us_west_2) to accounts myaccount2 and myaccount3 (in regions aws_us_east_1 (AWS) and azure_westeurope (Azure), respectively). In this example, suppose the primary database is stored in the myaccount1 account. The ALTER DATABASE command must be executed in that account:

ALTER DATABASE mydb1 ENABLE FAILOVER TO ACCOUNTS aws_us_east_1.myaccount2, azure_westeurope.myaccount3;

Step 5: Creating a Secondary Database

Create a replica of an existing primary database in the same account that stores the primary database, or a different account (in the same or a different region). Note that you can only create a secondary database in an account specified in the ALTER DATABASE … ENABLE REPLICATION TO ACCOUNTS statement in Step 3: Promoting a Local Database to Serve as a Primary Database.

Execute a CREATE DATABASE … AS REPLICA OF statement in each target account to create a replica of the specified primary database.

Important

As a best practice, we recommend giving each secondary database the same name as its primary database. This practice supports referencing fully-qualified objects (i.e. '<db>.<schema>.<object>') by other objects in the same database, such as querying a fully-qualified table name in a view.

If a secondary database has a different name from the primary database, then these object references would break in the secondary database.

To view the list of primary and secondary databases in your organization, query SHOW REPLICATION DATABASES.

Note

Only account administrators (users with the ACCOUNTADMIN role) can execute the SQL statement in this section.

After a secondary database is created, an account administrator can transfer ownership of the database to another role (using GRANT OWNERSHIP.)

Example

The following example creates a replica of the aws_us_west_2.myaccount1.mydb1 primary database in the aws_us_east_1.myaccount2 account, with automatic refreshing of materialized views in the replica enabled. The SQL statement is executed in the same AWS region group (public) but a different region from the account that stores the primary database:

-- Log into the AWS_US_EAST_1.MYACCOUNT2 account.

-- Query the set of primary and secondary databases in your organization.
-- In this example, the AWS_US_WEST_2.MYACCOUNT1 primary database is available to replicate.
SHOW REPLICATION DATABASES;

+------------------+-------------------------------+-----------------+----------+---------+------------+------------------------------------------+------------------+------------------------------------------------------------------+---------------------------------+
| snowflake_region | created_on                    | account_name    | name     | comment | is_primary | primary                                  | snowflake_region | replication_allowed_to_accounts                                  | failover_allowed_to_accounts    |
|------------------+-------------------------------+-----------------+----------+---------+------------+------------------------------------------+------------------+------------------------------------------------------------------+---------------------------------|
| AWS_US_WEST_2    | 2019-11-15 00:51:45.473 -0700 | MYACCOUNT1      | MYDB1    | NULL    | true       | PUBLIC.AWS_US_WEST_2.MYACCOUNT1.MYDB1    | AWS_US_WEST_2    | PUBLIC.AWS_US_EAST_1.MYACCOUNT2, PUBLIC.AWS_US_WEST_2.MYACCOUNT1 | PUBLIC.AWS_US_WEST_2.MYACCOUNT1 |
+------------------+-------------------------------+-----------------+----------+---------+------------+------------------------------------------+------------------+------------------------------------------------------------------+---------------------------------+

-- Create a replica of the 'mydb1' primary database
CREATE DATABASE mydb1
  AS REPLICA OF aws_us_west_2.myaccount1.mydb1
  AUTO_REFRESH_MATERIALIZED_VIEWS_ON_SECONDARY = TRUE;

-- Verify the secondary database
SHOW REPLICATION DATABASES;

+------------------+-------------------------------+-----------------+----------+---------+------------+------------------------------------------+------------------+------------------------------------------------------------------+---------------------------------+
| snowflake_region | created_on                    | account_name    | name     | comment | is_primary | primary                                  | snowflake_region | replication_allowed_to_accounts                                  | failover_allowed_to_accounts    |
|------------------+-------------------------------+-----------------+----------+---------+------------+------------------------------------------+------------------+------------------------------------------------------------------+---------------------------------|
| AWS_US_WEST_2    | 2019-11-15 00:51:45.473 -0700 | MYACCOUNT1      | MYDB1    | NULL    | true       | PUBLIC.AWS_US_WEST_2.MYACCOUNT1.MYDB1    | AWS_US_WEST_2    | PUBLIC.AWS_US_EAST_1.MYACCOUNT2, PUBLIC.AWS_US_WEST_2.MYACCOUNT1 | PUBLIC.AWS_US_WEST_2.MYACCOUNT1 |
| AWS_US_EAST_1    | 2019-08-15 15:51:49.094 -0700 | MYACCOUNT2      | MYDB1    | NULL    | false      | PUBLIC.AWS_US_WEST_2.MYACCOUNT1.MYDB1    | AWS_US_EAST_1    |                                                                  |                                 |
+------------------+-------------------------------+-----------------+----------+---------+------------+------------------------------------------+------------------+------------------------------------------------------------------+---------------------------------+

Refreshing Each Secondary Database

The instructions in this sections explain how refresh a secondary database from a snapshot of its primary database (using ALTER DATABASE … REFRESH). A snapshot includes changes to the objects and data.

Note that the owner of the secondary database (role with the OWNERSHIP privilege on the database) owns any new objects added as a result of a database refresh.

Note

To refresh a secondary database, the role used to perform the operation must have the OWNERSHIP privilege on the database.

To verify the current region after you log into an account, query the CURRENT_REGION function.

ALTER DATABASE mydb1 REFRESH​;

Refreshing a Secondary Database in the Web Interface

The Snowflake web interface provides a visual representation of the current status of a secondary database refresh. For instructions, see Monitoring the Progress of a Database Refresh in the Web Interface (in this topic).

Increasing the Statement Timeout for the Initial Replication

Database replication uses Snowflake-provided compute resources instead of your own virtual warehouse to copy objects and data. However, the STATEMENT_TIMEOUT_IN_SECONDS session/object parameter still controls how long a statement runs before it is canceled. The default value is 172800 (2 days). Because the initial replication of a very large primary database can take longer than 2 days to complete (depending on the amount of metadata in the database as well as the amount of data in database objects), we recommend increasing the STATEMENT_TIMEOUT_IN_SECONDS value to 604800 (7 days, the maximum value) for the session in which you run the replication operation.

Run the following ALTER SESSION statement prior to executing the ALTER DATABASE secondary_db_name REFRESH statement in the same session:

ALTER SESSION SET STATEMENT_TIMEOUT_IN_SECONDS = 604800;

Note that the STATEMENT_TIMEOUT_IN_SECONDS parameter also applies to the active warehouse in a session. The parameter honors the lower value set at the session or warehouse level. If you have an active warehouse in the current session, set STATEMENT_TIMEOUT_IN_SECONDS to 604800 for this warehouse (using ALTER WAREHOUSE), too.

For example:

-- determine the active warehouse in the current session (if any)
SELECT CURRENT_WAREHOUSE();

+---------------------+
| CURRENT_WAREHOUSE() |
|---------------------|
| MY_WH               |
+---------------------+

-- change the STATEMENT_TIMEOUT_IN_SECONDS value for the active warehouse

ALTER WAREHOUSE my_wh SET STATEMENT_TIMEOUT_IN_SECONDS = 604800;

You can reset the parameter value to the default after the replication operation is completed:

ALTER WAREHOUSE my_wh UNSET STATEMENT_TIMEOUT_IN_SECONDS;

Refreshing a Secondary Database on a Schedule

As a best practice, we recommend scheduling your secondary database refreshes. This section provides instructions for starting a database refresh automatically on a specified schedule.

The frequency with which you refresh a secondary database depends on the Recovery Point Objective (RPO) for the data in the secondary database. For example, if applications that rely on the data can tolerate up to 1 hour of data loss, then you must refresh the data at least every every hour. If the data loss tolerance is 5 minutes, then refresh the secondary database at least every 5 minutes.

Note

  • We recommend that you execute the initial replication of a primary database manually (using ALTER DATABASE … REFRESH), and only schedule subsequent refreshes.

  • There is a 60 minute default limit on a single run of a task. This limitation was implemented as a safeguard against non-terminating tasks. In rare circumstances, a refresh of a very large database could exceed the default task run limit. To determine if this occurred, query the TASK_HISTORY table function. If your task exceeded the window scheduled for the task, the cause is often an undersized warehouse. Review the warehouse size and consider increasing it to fit within the schedule window or the one-hour limit.

    Alternatively, consider increasing the timeout limit for the task by executing ALTER TASK … SET USER_TASK_TIMEOUT_MS = <num>.

Complete the steps in this section to start a database refresh automatically on a specified schedule.

Prerequisites

The following Snowflake objects are required in the account that stores the secondary database:

  • The secondary database.

  • A separate database to store the new objects created in this section. Because secondary databases are ready-only, this database must be separate from the secondary database. This database must also include the following objects:

    • Schema. Use the PUBLIC schema, or create a new schema using CREATE SCHEMA.

    • Warehouse to provide compute resources for the database refresh. Create a new warehouse using CREATE WAREHOUSE.

    • Task that refreshes the secondary database on a schedule.

Required privileges

The steps in this section require a role with the following privileges in the account in which the secondary database is refreshed:

Object Type

Object

Privilege

Notes

Account

Account that stores the secondary database

EXECUTE TASK

Required to run the new task.

Database

Secondary database

OWNERSHIP

Required to refresh the secondary database.

Database

Database that stores the new task.

USAGE

Schema

Schema that stores the new task

USAGE, CREATE TASK

Task

OWNERSHIP

The role that creates the task owns the object by default. Ownership can be transferred to a different role using GRANT privileges … TO ROLE.

Warehouse

Warehouse used to run the task

USAGE

Steps

Complete the following steps for each secondary database you want to refresh on a schedule:

  1. Create a task that starts the database refresh on a schedule (using CREATE TASK).

    Syntax
    CREATE [ OR REPLACE ] TASK [ IF NOT EXISTS ] <name>
      WAREHOUSE = <string>
      SCHEDULE = { <number> MINUTE | USING CRON <expr> <time_zone> } | AFTER <string>
    AS
      ALTER DATABASE <secondary_db_name> REFRESH;
    

    For example, create a task named refresh_mydb1_task that refreshes a secondary database named mydb1 every 10 minutes. The task runs using the existing warehouse mywh:

    CREATE TASK refresh_mydb1_task
      WAREHOUSE = mywh
      SCHEDULE = '10 minute'
    AS
      ALTER DATABASE mydb1 REFRESH;
    
  2. A task is suspended by default when it is created. Resume the task to allow it to run based on the parameters specified in the task definition:

ALTER TASK refresh_mydb1_task RESUME;

Monitoring the Progress of a Database Refresh

To determine the current status of the initial database replication or a subsequent secondary database refresh, query the DATABASE_REFRESH_PROGRESS , DATABASE_REFRESH_PROGRESS_BY_JOB table function (in the Information Schema).

A database refresh operation can require several hours or longer to complete depending on the amount of data to replicate.

To view the replication history for a specified database within a specified date range, query either of the following:

Example

Monitor the progress of the mydb1 secondary database refresh:

select *
  from table(information_schema.database_refresh_progress(mydb1));

Monitoring the Progress of a Database Refresh in the Web Interface

Manually start a secondary database refresh in the Snowflake web interface to view a dynamic progress bar showing the current status of the refresh operation with statistics.

To start a secondary database refresh operation:

  1. In the Snowflake web interface, click the Databases Databases tab tab » Replication.

  2. Select the secondary database to refresh.

  3. Click the Refresh now button. The Refresh Database dialog opens.

  4. Click the Refresh button.

The Last Refresh Status column displays the status of the current refresh operation. The progress bar is updated dynamically.

The Refresh History statistics in the side window also display the current refresh status, along with the refresh start time, number of bytes transferred, and other statistics.

Secondary refresh operation in the Snowflake web interface

Viewing the Database Refresh History

To view the history of secondary database refresh operations, query the DATABASE_REFRESH_HISTORY table function (in the Information Schema). This function returns database refresh activity within the last 14 days.

or

Query the REPLICATION_USAGE_HISTORY View view (in the Account Usage schema in the shared Snowflake database). This view returns replication usage activity within the last 365 days (1 year).

Example

View the history of the mydb1 secondary database refresh operation:

select *
  from table(information_schema.database_refresh_history(mydb1));

Comparing Data Sets in Primary and Secondary Databases

Optionally use the HASH_AGG function to compare the rows in a random set of tables in a primary and secondary database to verify data consistency. The HASH_AGG function returns an aggregate signed 64-bit hash value over the (unordered) set of input rows. Query this function on all or a random subset of tables in a secondary database and on the primary database (as of the timestamp for the primary database snapshot) and compare the output.

Example

  1. On the secondary database, query the DATABASE_REFRESH_HISTORY table function (in the Information Schema). Note the END_TIME column for the row indicating when the latest refresh operation completed. This is the timestamp for the latest snapshot of the primary database.

  2. Query the HASH_AGG function for a specified table. The following query returns a hash value for all rows in the mytable table:

    SELECT HASH_AGG( * ) FROM mytable;
    
  3. On the primary database, query the HASH_AGG function for the same table. Using Time Travel, specify the timestamp when the latest snapshot was taken for the secondary database:

    SELECT HASH_AGG( * ) FROM mytable AT(TIMESTAMP => <primary_snapshot_timestamp>);
    
  4. Compare the results from the two queries. The output should be identical.