Replicating Databases Across Multiple Accounts

This topic describes the steps necessary to replicate databases across multiple Snowflake accounts in different regions and keeping the database objects and stored data synchronized.

In this Topic:

Region Support for Database Replication and Failover/Failback

All Snowflake regions across Amazon Web Services, Google Cloud Platform, and Microsoft Azure support Database Replication and Failover/Failback.

Note that accounts can replicate databases between Region Groups (for example, between Virtual Private Snowflake (VPS) and multi-tenant regions) to facilitate data sharing and account migrations between these regions. This ability is disabled by default. You can contact Snowflake Support to enable access.

Web Interface for Database Replication and Failover/Failback

Account administrators (users with the ACCOUNTADMIN role) can manage replication and failover/failback actions in either the new or classic web interface.

New Web Interface

Navigation:

Data » Databases

Promoting a Local Database

  1. Log into a Snowflake account that contains a local database that you want to replicate to one or more other accounts.

  2. Click the dropdown menu in the upper left (next to your login name) » Switch Role » ACCOUNTADMIN.

  3. On the left side of the Databases page, click on a local database in the database object explorer. The database details page opens.

  4. Click the actions () button in the upper-right corner of the page » Enable Replication. The Enable replication dialog opens.

    This dialog enables you to perform the following actions:

    • Promote the local database to serve as a primary database.

    • Create a secondary database in one or more target accounts.

    • Refresh each secondary database once, after it is created.

  5. For each target account for this database, check the options to create a secondary database and refresh the database.

  6. Log into the target account as a user who was previously granted the ACCOUNTADMIN role in that account.

    Snowflake performs the requested actions and displays a success dialog.

    Manage replication for this database from the Replication tab in the database details.

Managing Secondary Databases

  1. Log into an account that contains a secondary database.

  2. Click the dropdown menu in the upper left (next to your login name) » Switch Role » ACCOUNTADMIN.

  3. On the left side of the Databases page, click on a secondary database in the database object explorer. The database details page opens.

  4. Click on the Replication tab.

    The following actions are available from the actions () button in the upper-right corner of the page:

    • Promote the secondary database to serve as the primary database. This feature requires Business Critical (or higher).

    • Refresh the secondary database.

    • Copy a template to create a task that refreshes the secondary database on a schedule. Paste the template into a Snowsight worksheet and edit it to specify the desired schedule.

Classic Web Interface

Use the Replication area of the Databases Databases tab tab in the Snowflake web interface to perform most actions related to configuring and managing database replication, including the following actions:

  • Promote a local database to serve as a primary database.

  • Enable failover for a primary database (Business Critical Edition accounts or higher).

  • Refresh a secondary database, either once (manually) or repeatedly (on a schedule, using a task).

  • Promote a secondary database to serve as a primary database (Business Critical Edition accounts (or higher)).

  • Disable replication and/or failover for a primary database.

Replicating a Database to Another Account

The instructions in this section explain how to prepare your accounts for replication, promote a local database to serve as a primary database, perform the initial replication of this primary database to another account, and schedule refreshing of secondary databases.

Important

Target accounts do not have Tri-Secret Secure or AWS PrivateLink configured as a default. If Tri-Secret Secure or PrivateLink is required for compliance, security or other purposes, it is your responsibility to ensure that those features are configured in the target account.

Account Identifier for Replication and Fail-over SQL Commands

The example SQL statements in the instructions below use an account identifier in the format, organization_name.account_name. However, account identifiers in the format snowflake_region.account_locator are supported.

For more details, see Account Identifier for Replication and Fail-over.

Prerequisite: Enable Replication for Your Accounts

You must have replication enabled for two or more accounts in your organization. If the Organizations feature is enabled for your account, a user with the ORGADMIN role can enable replication. See Enable Replication for Accounts in Your Organization in this topic for details. If you do not have Organizations enabled, you can contact Snowflake Support to enable replication for your accounts.

Enable Replication for Accounts in Your Organization

A user with the ORGADMIN role can enable an account for replication by setting the ENABLE_ACCOUNT_DATABASE_REPLICATION parameter to true from the ORGADMIN account.

Note

If you have more than one account with the same account locator in different regions, to enable replication, contact Snowflake Support.

-- Assume the ORGADMIN role
use role orgadmin;

-- View the list of the accounts in your organization
-- Note the account_locator for each account for which you are enabling replication
show organization accounts;

-- Enable replication by executing this statement for each source and target account in your organization
select system$global_account_set_parameter('<account_locator>',
'ENABLE_ACCOUNT_DATABASE_REPLICATION', 'true');

Enabling Database Replication and Failover, and Refreshing Secondary Databases

Note

Except where noted, only account administrators (users with the ACCOUNTADMIN role) can execute the SQL statements in this section.

Step 1: Viewing All Accounts in Your Organization

Retrieve the list of accounts in your organization. Any existing permanent or transient database in these accounts can be modified to serve as a primary database. Replicas of a primary database (i.e. secondary databases) can only be created in these accounts.

To view the list of accounts in your organization, query SHOW REPLICATION ACCOUNTS.

SHOW REPLICATION ACCOUNTS;

+------------------+---------------------------------+---------------+------------------+---------+-------------------+
| snowflake_region | created_on                      | account_name  | account_locator  | comment | organization_name |
|------------------+---------------------------------+---------------+------------------+---------+-------------------|
| AWS_US_WEST_2    | 2018-11-19 16:11:12.720 -0700   | ACCOUNT1      | MYACCOUNT1       |         | MYORG             |
| AWS_US_EAST_1    | 2019-06-02 14:12:23.192 -0700   | ACCOUNT2      | MYACCOUNT2       |         | MYORG             |
+------------------+---------------------------------+---------------+------------------+---------+-------------------+

See the complete list of Snowflake Region IDs.

Step 2: Promoting a Local Database to Serve as a Primary Database

Modify an existing permanent or transient database to serve as a primary database using an ALTER DATABASE … ENABLE REPLICATION TO ACCOUNTS statement. Provide a comma-separated list of accounts in your organization that can store a replica of this database (i.e. a secondary database), allowing users in those accounts to query objects in the secondary database.

Example

Promote local database mydb1 (in account account1) to serve as a primary database and specify that accounts account2 and account3 can each store a replica of this database:

ALTER DATABASE mydb1 ENABLE REPLICATION TO ACCOUNTS myorg.account2, myorg.account3;

Step 3: Enabling Failover for a Primary Database

Note

Failover/Failback requires Business Critical (or higher). To inquire about upgrading, please contact Snowflake Support.

Enable failover for a primary database to one or more accounts in your organization using an ALTER DATABASE … ENABLE FAILOVER TO ACCOUNTS statement. The replica of this primary database in any one of these accounts (i.e. a secondary database) can be promoted to serve as the primary database.

Note that enabling failover for a primary database can be done either before or after a replica of the primary database has been created in a specified account.

Example

Enable failover for primary database mydb1 to accounts account2 and account3.

-- Executed from primary account
ALTER DATABASE mydb1 ENABLE FAILOVER TO ACCOUNTS myorg.account2, myorg.account3;

Step 4: Creating a Secondary Database

Create a replica of an existing primary database in the same account that stores the primary database, or a different account (in the same or a different region). Note that you can only create a secondary database in an account specified in the ALTER DATABASE … ENABLE REPLICATION TO ACCOUNTS statement in Step 2: Promoting a Local Database to Serve as a Primary Database.

Execute a CREATE DATABASE … AS REPLICA OF statement in each target account to create a replica of the specified primary database.

Important

As a best practice, we recommend giving each secondary database the same name as its primary database. This practice supports referencing fully-qualified objects (i.e. '<db>.<schema>.<object>') by other objects in the same database, such as querying a fully-qualified table name in a view.

If a secondary database has a different name from the primary database, then these object references would break in the secondary database.

To view the list of primary and secondary databases in your organization, query SHOW REPLICATION DATABASES. After a secondary database is created, an account administrator can transfer ownership of the database to another role (using GRANT OWNERSHIP.)

Example

The following example creates a replica of the myorg.account1.mydb1 primary database in the myorg.account2 account:

-- Log into the ACCOUNT2 account.

-- Query the set of primary and secondary databases in your organization.
-- In this example, the MYORG.ACCOUNT1 primary database is available to replicate.
SHOW REPLICATION DATABASES;

+------------------+-------------------------------+-----------------+----------+---------+------------+----------------------------+---------------------------------+------------------------------+-------------------+-----------------+
| snowflake_region | created_on                    | account_name    | name     | comment | is_primary | primary                    | replication_allowed_to_accounts | failover_allowed_to_accounts | organization_name | account_locator |
|------------------+-------------------------------+-----------------+----------+---------+------------+----------------------------+---------------------------------+------------------------------+-------------------+-----------------|
| AWS_US_WEST_2    | 2019-11-15 00:51:45.473 -0700 | ACCOUNT1        | MYDB1    | NULL    | true       | MYORG.ACCOUNT1.MYDB1       | MYORG.ACCOUNT2, MYORG,ACCOUNT1  | MYORG.ACCOUNT1               | MYORG             | MYACCOUNT1      |
+------------------+-------------------------------+-----------------+----------+---------+------------+----------------------------+---------------------------------+------------------------------+-------------------+-----------------+

-- Create a replica of the 'mydb1' primary database
CREATE DATABASE mydb1
  AS REPLICA OF myorg.account1.mydb1;

-- Verify the secondary database
SHOW REPLICATION DATABASES;

+------------------+-------------------------------+---------------+----------+---------+------------+-------------------------+---------------------------------+------------------------------+-------------------+-----------------+
| snowflake_region | created_on                    | account_name  | name     | comment | is_primary | primary                 | replication_allowed_to_accounts | failover_allowed_to_accounts | organization_name | account_locator |
|------------------+-------------------------------+---------------+----------+---------+------------+------------------------------------------+----------------+------------------------------+-------------------------------------|
| AWS_US_WEST_2    | 2019-11-15 00:51:45.473 -0700 | ACCOUNT1      | MYDB1    | NULL    | true       | MYORG.ACCOUNT1.MYDB1    | MYORG.ACCOUNT2, MYORG.ACCOUNT1  | MYORG.ACCOUNT1               | MYORG             | MYACCOUNT1      |
| AWS_US_EAST_1    | 2019-08-15 15:51:49.094 -0700 | ACCOUNT2      | MYDB1    | NULL    | false      | MYORG.ACCOUNT1.MYDB1    |                                 |                              | MYORG             | MYACCOUNT2      |
+------------------+-------------------------------+---------------+----------+---------+------------+-------------------------+---------------------------------+------------------------------+-------------------+-----------------+

Step 5. Refreshing Each Secondary Database

The instructions in this section explain how to refresh a secondary database from a snapshot of its primary database (using ALTER DATABASE … REFRESH). A snapshot includes changes to the objects and data. For the initial replication of a very large primary database, we recommend increasing the statement timeout.

Note that the owner of the secondary database (role with the OWNERSHIP privilege on the database) owns any new objects added as a result of a database refresh.

Note

To refresh a secondary database, the role used to perform the operation must have the OWNERSHIP privilege on the database.

To verify the current region after you log into an account, query the CURRENT_REGION function.

ALTER DATABASE mydb1 REFRESH;

You can also refresh a secondary database in the web ui.

Step 6. Refreshing a Secondary Database on a Schedule

As a best practice, we recommend scheduling your secondary database refreshes. This section provides instructions for starting a database refresh automatically on a specified schedule.

The frequency with which you refresh a secondary database depends on the Recovery Point Objective (RPO) for the data in the secondary database. For example, if applications that rely on the data can tolerate up to 1 hour of data loss, then you must refresh the data at least every hour. If the data loss tolerance is 5 minutes, then refresh the secondary database at least every 5 minutes.

Note

  • We recommend that you execute the initial replication of a primary database manually (using ALTER DATABASE … REFRESH), and only schedule subsequent refreshes.

  • There is a 60 minute default limit on a single run of a task. This limitation was implemented as a safeguard against non-terminating tasks. In rare circumstances, a refresh of a very large database could exceed the default task run limit. To determine if this occurred, query the TASK_HISTORY table function. Consider increasing the timeout limit for the task by executing ALTER TASK … SET USER_TASK_TIMEOUT_MS = <num>.

Complete the steps in this section to start a database refresh automatically on a specified schedule.

Prerequisites

The following Snowflake objects are required in the account that stores the secondary database:

  • The secondary database.

  • A separate database to store the new objects created in this section. Because secondary databases are ready-only, this database must be separate from the secondary database. This database must also include the following objects:

    • Schema. Use the PUBLIC schema, or create a new schema using CREATE SCHEMA.

    • Warehouse. Any warehouse can be provided here to meet the syntax requirement but is not used for the database refresh. Create a new warehouse using CREATE WAREHOUSE.

    • Task that refreshes the secondary database on a schedule.

Required privileges

The steps in this section require a role with the following privileges in the account in which the secondary database is refreshed:

Object Type

Object

Privilege

Notes

Account

Account that stores the secondary database

EXECUTE TASK

Required to run the new task.

Database

Secondary database

OWNERSHIP

Required to refresh the secondary database.

Database

Database that stores the new task

USAGE

Schema

Schema that stores the new task

USAGE, CREATE TASK

Task

OWNERSHIP

The role that creates the task owns the object by default. Ownership can be transferred to a different role using GRANT privileges … TO ROLE.

Warehouse

Warehouse used to configure the task

USAGE

Specifying a warehouse is required, but the warehouse is not used to run the task.

Steps

Complete the following steps for each secondary database you want to refresh on a schedule:

  1. Create a task that starts the database refresh on a schedule (using CREATE TASK). Note that although the CREATE TASK syntax for specifying a replication schedule requires a warehouse, the warehouse is not used for replication.

    For example, create a task named refresh_mydb1_task that refreshes a secondary database named mydb1 every 10 minutes with a 4 hour timeout. The task is configured using the existing warehouse mywh:

    CREATE TASK refresh_mydb1_task
      WAREHOUSE = mywh
      SCHEDULE = '10 minute'
      USER_TASK_TIMEOUT_MS = 14400000
    AS
      ALTER DATABASE mydb1 REFRESH;
    
  2. A task is suspended by default when it is created. Resume the task to allow it to run based on the parameters specified in the task definition:

ALTER TASK refresh_mydb1_task RESUME;

Example

Execute the following SQL statements in your preferred Snowflake client to enable replication and failover, do an initial database refresh and set up scheduled refreshes.

Execute from Source Account
-- The commands below are executed from the source account

-- View replication enabled accounts
SHOW REPLICATION ACCOUNTS;

ALTER DATABASE mydb ENABLE REPLICATION TO ACCOUNTS myorg.account2, myorg.account3;
ALTER DATABASE mydb ENABLE FAILOVER TO ACCOUNTS myorg.account2, myorg.account3;
Execute from Each Target Account
-- The commands below are executed from each target account

-- View replication enabled databases
-- Note the primary column of the source database for the CREATE DATABASE statement below
SHOW REPLICATION DATABASES;

CREATE DATABASE mydb
  AS REPLICA OF myorg.account1.mydb;

-- Increase statement timeout for initial refresh
-- Optional but recommended for initial refresh of a large database
ALTER SESSION SET STATEMENT_TIMEOUT_IN_SECONDS = 604800;
-- If you have an active warehouse in current session, update warehouse statement timeout
SELECT CURRENT_WAREHOUSE();
ALTER WAREHOUSE my_wh SET STATEMENT_TIMEOUT_IN_SECONDS = 604800;
-- Reset warehouse statement timeout after initial refresh
ALTER WAREHOUSE my_wh UNSET STATEMENT_TIMEOUT_IN_SECONDS;

-- Refresh a secondary database
ALTER DATABASE mydb REFRESH;

-- Create task
-- Set up refresh schedule for each secondary database using a separate database
USE DATABASE my_db2;

-- Create a task and RESUME the task for each secondary database
-- Edit the task schedule and timeout for your specific use case
CREATE TASK my_refresh_task
  WAREHOUSE = my_wh
  SCHEDULE = '10 minute'
  USER_TASK_TIMEOUT_MS = 14400000
AS
  ALTER DATABASE mydb REFRESH;

-- Start task
ALTER TASK my_refresh_task RESUME;

Increasing the Statement Timeout for the Initial Replication

Database replication uses Snowflake-provided compute resources instead of your own virtual warehouse to copy objects and data. However, the STATEMENT_TIMEOUT_IN_SECONDS session/object parameter still controls how long a statement runs before it is canceled. The default value is 172800 (2 days). Because the initial replication of a very large primary database can take longer than 2 days to complete (depending on the amount of metadata in the database as well as the amount of data in database objects), we recommend increasing the STATEMENT_TIMEOUT_IN_SECONDS value to 604800 (7 days, the maximum value) for the session in which you run the replication operation.

Run the following ALTER SESSION statement prior to executing the ALTER DATABASE secondary_db_name REFRESH statement in the same session:

ALTER SESSION SET STATEMENT_TIMEOUT_IN_SECONDS = 604800;

Note that the STATEMENT_TIMEOUT_IN_SECONDS parameter also applies to the active warehouse in a session. The parameter honors the lower value set at the session or warehouse level. If you have an active warehouse in the current session, set STATEMENT_TIMEOUT_IN_SECONDS to 604800 for this warehouse (using ALTER WAREHOUSE), too.

For example:

-- determine the active warehouse in the current session (if any)
SELECT CURRENT_WAREHOUSE();

+---------------------+
| CURRENT_WAREHOUSE() |
|---------------------|
| MY_WH               |
+---------------------+

-- change the STATEMENT_TIMEOUT_IN_SECONDS value for the active warehouse

ALTER WAREHOUSE my_wh SET STATEMENT_TIMEOUT_IN_SECONDS = 604800;

You can reset the parameter value to the default after the replication operation is completed:

ALTER WAREHOUSE my_wh UNSET STATEMENT_TIMEOUT_IN_SECONDS;

Monitoring the Progress of a Database Refresh

To determine the current status of the initial database replication or a subsequent secondary database refresh, query the DATABASE_REFRESH_PROGRESS , DATABASE_REFRESH_PROGRESS_BY_JOB table function (in the Information Schema).

A database refresh operation can require several hours or longer to complete depending on the amount of data to replicate.

To view the replication history for a specified database within a specified date range, query either of the following:

Example

Monitor the progress of the mydb1 secondary database refresh:

select *
  from table(information_schema.database_refresh_progress(mydb1));

Monitoring the Progress of a Database Refresh in the Web Interface

Manually start a secondary database refresh in the Snowflake web interface to view a dynamic progress bar showing the current status of the refresh operation with statistics.

To start a secondary database refresh operation:

  1. In the Snowflake web interface, click the Databases Databases tab tab » Replication.

  2. Select the secondary database to refresh.

  3. Click the Refresh now button. The Refresh Database dialog opens.

  4. Click the Refresh button.

The Last Refresh Status column displays the status of the current refresh operation. The progress bar is updated dynamically.

The Refresh History statistics in the side window also display the current refresh status, along with the refresh start time, number of bytes transferred, and other statistics.

Secondary refresh operation in the Snowflake web interface

Viewing the Database Refresh History

To view the history of secondary database refresh operations, query the DATABASE_REFRESH_HISTORY table function (in the Information Schema). This function returns database refresh activity within the last 14 days.

or

Query the REPLICATION_USAGE_HISTORY View view (in the Account Usage schema in the shared Snowflake database). This view returns replication usage activity within the last 365 days (1 year).

Example

View the history of the mydb1 secondary database refresh operation:

select *
  from table(information_schema.database_refresh_history(mydb1));

Comparing Data Sets in Primary and Secondary Databases

Optionally use the HASH_AGG function to compare the rows in a random set of tables in a primary and secondary database to verify data consistency. The HASH_AGG function returns an aggregate signed 64-bit hash value over the (unordered) set of input rows. Query this function on all or a random subset of tables in a secondary database and on the primary database (as of the timestamp for the primary database snapshot) and compare the output.

Example

  1. On the secondary database, query the DATABASE_REFRESH_HISTORY table function (in the Information Schema). Note the END_TIME column for the row indicating when the latest refresh operation completed. This is the timestamp for the latest snapshot of the primary database.

  2. Query the HASH_AGG function for a specified table. The following query returns a hash value for all rows in the mytable table:

    SELECT HASH_AGG( * ) FROM mytable;
    
  3. On the primary database, query the HASH_AGG function for the same table. Using Time Travel, specify the timestamp when the latest snapshot was taken for the secondary database:

    SELECT HASH_AGG( * ) FROM mytable AT(TIMESTAMP => <primary_snapshot_timestamp>::TIMESTAMP);
    
  4. Compare the results from the two queries. The output should be identical.