Replicating Account Objects

This topic describes the steps necessary to replicate account objects and data across Snowflake accounts in the same organization, and keep the objects and data synchronized. Account replication can occur across Snowflake accounts in different regions and across cloud platforms.

Region Support for Replication and Failover/Failback

Customers can replicate across all regions within a Region Group. To replicate between regions in different Region Groups (for example, from a Snowflake commercial region to a Snowflake government region), please contact Snowflake Support to enable access.

Transitioning From Database Replication to Group-Based Replication

Databases that have been enabled for replication using ALTER DATABASE must have replication disabled before they can be added to a replication or failover group.

Note

Execute the SQL statements in this section using the ACCOUNTADMIN role.

Step 1. Disable Replication for a Replication Enabled Database

Execute the SYSTEM$DISABLE_DATABASE_REPLICATION function to disable replication for a primary database, along with any secondary databases linked to it, in order to add it to a replication or failover group.

Execute the following SQL statement from the source account with the primary database:

SELECT SYSTEM$DISABLE_DATABASE_REPLICATION('mydb');

Step 2. Add the Database to a Primary Failover Group and Create a Secondary Failover Group

Once you have successfully disabled replication for a database, you can add the primary database to a failover group in the source account.

Then create a secondary failover group in the target account. When the secondary failover group is refreshed in the target account, the previously secondary database will automatically be added as a member of the secondary failover group and refreshed with the changes from the primary database.

For more details on creating primary and secondary failover groups, see Workflow.

Note

When you add a previously replicated database to a replication or failover group, Snowflake does not re-replicate the data that has already been replicated for that database. Only changes since the last refresh are replicated when the group is refreshed.

Workflow

The following SQL statements demonstrate the workflow for enabling account and database object replication and refreshing objects. Each step is discussed in detail below.

Note

The following examples require replication be enabled for the source and target accounts. For details, see Prerequisite: Enable Replication for Accounts in the Organization.

Examples

Execute the following SQL statements in your preferred Snowflake client to enable account and database object replication and failover, and refresh objects.

Executed on Source Account

  1. Create a role and grant it the CREATE FAILOVER GROUP privilege. This step is optional:

    -- Execute the following SQL statements using the ACCOUNTADMIN role:
    USE ROLE ACCOUNTADMIN;
    
    CREATE ROLE myrole;
    
    GRANT CREATE FAILOVER GROUP ON ACCOUNT
     TO ROLE myrole;
    
  2. Create a failover group in the source account and enable replication to specific target accounts.

    Note

    USE ROLE myrole;
    
    -- Execute the following SQL statement using a role with the CREATE FAILOVER GROUP privilege:
    CREATE FAILOVER GROUP myfg
        OBJECT_TYPES = USERS, ROLES, WAREHOUSES, RESOURCE MONITORS, DATABASES
        ALLOWED_DATABASES = db1, db2
        ALLOWED_ACCOUNTS = myorg.myaccount2, myorg.myaccount3
        REPLICATION_SCHEDULE = '10 MINUTE';
    

    Attention

    If account objects (e.g. users or roles) that you do not want to drop during replication exist in the target account, do not include the REPLICATION_SCHEDULE parameter when you create a replication or failover group. For details, see Step 5. Apply Global IDs to Objects Created by Scripts in Target Accounts — Optional.

Executed on Target Account

  1. Create a role in the target account and grant it the CREATE FAILOVER GROUP privilege. This step is optional:

    -- Execute the following SQL statements using the ACCOUNTADMIN role:
    USE ROLE ACCOUNTADMIN;
    
    CREATE ROLE myrole;
    
    GRANT CREATE FAILOVER GROUP ON ACCOUNT
        TO ROLE myrole;
    
  2. Create a failover group in the target account as a replica of the failover group in the source account.

    USE ROLE myrole;
    
    -- Execute the following SQL statement using a role with the CREATE FAILOVER GROUP privilege:
    CREATE FAILOVER GROUP myfg
        AS REPLICA OF myorg.myaccount1.myfg;
    
  3. Apply global IDs to objects. This is an optional step for any account objects not created via replication.

    -- Execute the following SQL statements using the ACCOUNTADMIN role:
    SELECT SYSTEM$LINK_ACCOUNT_OBJECTS_BY_NAME('myfg');
    

    Attention

    This step must be executed if account objects (e.g. users or roles) that you do not want to drop during replication exist in the target account. Skipping this step may result in the loss of user worksheets or privilege grants to roles on objects (e.g. shares). For more details, see Step 5. Apply Global IDs to Objects Created by Scripts in Target Accounts — Optional.

Refresh a Secondary Group Manually

To manually refresh a secondary failover group in the target account, see Refreshing a Secondary Failover Group in a Target Account Manually (in this topic).

Note

As a best practice, Snowflake recommends scheduling automatic refreshes using the REPLICATION_SCHEDULE parameter. See Replication Schedule (in this topic).

Replicating Account Objects and Databases

The instructions in this section explain how to prepare your accounts for replication, enable the replication of specific objects from the source account to the target account, and synchronize the objects in the target account.

Important

Target accounts do not have Tri-Secret Secure or private connectivity to the Snowflake service (e.g. AWS PrivateLink) enabled by default. If you require Tri-Secret Secure or private connectivity to the Snowflake service for compliance, security or other purposes, it is your responsibility to configure and enable those features in the target account.

Prerequisite: Enable Replication for Accounts in the Organization

The organization administrator (ORGADMIN role) must enable replication for the source and target accounts.

To enable replication for accounts, a user with the ORGADMIN role uses the SYSTEM$GLOBAL_ACCOUNT_SET_PARAMETER function to set the ENABLE_ACCOUNT_DATABASE_REPLICATION parameter to true. Note that multiple accounts in an organization can be enabled for replication from the same ORGADMIN account.

Log into an ORGADMIN account to enable replication for each source and target account in your organization.

-- Assume the ORGADMIN role
use role orgadmin;

-- View the list of the accounts in your organization
-- Note the organization name and account name for each account for which you are enabling replication
show organization accounts;

-- Enable replication by executing this statement for each source and target account in your organization
select system$global_account_set_parameter('<organization_name>.<account_name>', 'ENABLE_ACCOUNT_DATABASE_REPLICATION', 'true');

Though the SYSTEM$GLOBAL_ACCOUNT_SET_PARAMETER function supports the legacy account locator identifier, it causes unexpected results when an organization has multiple accounts that share the same locator (in different regions).

Enable Stream and Task Replication Preview — Optional

To enable the stream replication and task replication preview for an account, database, or replication or failover group, set the ENABLE_STREAM_TASK_REPLICATION parameter to true for the source account or primary object.

Examples

The following examples must be executed in the source account.

Enable stream and task replication for a database mydb:

alter database mydb set ENABLE_STREAM_TASK_REPLICATION = true;

Enable stream and task replication for a replication group myrg:

alter replication group myrg set ENABLE_STREAM_TASK_REPLICATION = true;

Enable stream and task replication for account myaccount in organization myorg:

alter account myorg.myaccount set ENABLE_STREAM_TASK_REPLICATION = true;

Note

  • The parameter cannot be set for a database if the database is included in a replication or failover group.

  • If the parameter is explicitly set on a database object that is not included in a replication or failover group, you must unset the parameter before adding it to a replication or failover group otherwise, the replication operation will fail.

  • Parameters set on the database, replication group, or failover group override the parameters set on the account. For details, refer to Parameter Hierarchy and Types and Object Parameters.

Step 1: Create a Role with the CREATE FAILOVER GROUP Privilege in the Source Account — Optional

Create a role and grant it the CREATE FAILOVER GROUP privilege. This step is optional. If you have already created this role, skip to Step 2: Create a Primary Failover Group in a Source Account.

-- Execute the following SQL statements using the ACCOUNTADMIN role:
USE ROLE ACCOUNTADMIN;

CREATE ROLE myrole;

GRANT CREATE FAILOVER GROUP ON ACCOUNT
    TO ROLE myrole;

Step 2: Create a Primary Failover Group in a Source Account

Create a primary failover group and enable the replication and failover of specific objects from the current (source) account to one or more target accounts in the same organization.

View All Accounts Enabled for Replication

To retrieve the list of accounts in your organization that are enabled for replication, use SHOW REPLICATION ACCOUNTS.

-- Execute the following SQL statements using the ACCOUNTADMIN role:
SHOW REPLICATION ACCOUNTS;

+------------------------+-------------------------+--------------+-----------------+-----------------+-------------------+
| snowflake_region       | created_on              | account_name | account_locator |  comment        | organization_name |
+------------------------+-------------------------+--------------+-----------------+-----------------+-------------------+
| AWS_US_WEST_2          | 2020-07-15 21:59:25.455 | myaccount1   | myacctlocator1  |                 | myorg             |
+------------------------+-------------------------+--------------+-----------------+-----------------+-------------------+
| AWS_US_EAST_1          | 2020-07-23 14:12:23.573 | myaccount2   | myacctlocator2  |                 | myorg             |
+------------------------+-------------------------+--------------+-----------------+-----------------+-------------------+
| AWS_US_EAST_2          | 2020-07-25 19:25:04.412 | myaccount3   | myacctlocator3  |                 | myorg             |
+------------------------+-------------------------+--------------+-----------------+-----------------+-------------------+

See the complete list of Region IDs.

View Failover and Replication Group Membership

Account, database, and share objects have constraints on group membership. Before creating new groups or adding objects to existing groups, you can review the list of existing failover groups and the objects in each group.

Note

Only an account administrator (user with the ACCOUNTADMIN role) or the group owner (role with the OWNERSHIP privilege on the group) can execute the SQL statements in this section.

View all failover groups linked to the current account, and the object types in each group:

SHOW FAILOVER GROUPS;

View all the databases in failover group myfg:

SHOW DATABASES IN FAILOVER GROUP myfg;

View all the shares in failover group myfg:

SHOW SHARES IN FAILOVER GROUP myfg;

Enable Account and Database Object Replication from a Source Account to Target Account

Create a failover group of specified account and database objects in the source account and enable replication and failover to a list of target accounts. See CREATE FAILOVER GROUP for syntax.

For example, enable replication of users, roles, warehouses, resources monitors, and databases db1 and db2 from the source account to the myaccount2 account in the same organization. Set the replication schedule to automatically refresh myaccount2 every 10 minutes.

Note

If you have databases to add to a replication or failover group that have been previously enabled for database replication using ALTER DATABASE, follow the Transitioning From Database Replication to Group-Based Replication instructions (in this topic) before adding them to a group.

Executed on the source account:

use role myrole;

-- Execute the following SQL statement using a role with the CREATE FAILOVER GROUP privilege:
create failover group myfg
    object_types = users, roles, warehouses, resource monitors, databases, integrations, network policies
    allowed_databases = db1, db2
    allowed_integration_types = API INTEGRATIONS
    allowed_accounts = myorg.myaccount2
    replication_schedule = '10 MINUTE';

Attention

If account objects (e.g. users or roles) that you do not want to drop during replication exist in the target account, do not include the REPLICATION_SCHEDULE parameter when you create a replication or failover group. For details, see Step 5. Apply Global IDs to Objects Created by Scripts in Target Accounts — Optional.

Step 3: Create a Role with the CREATE FAILOVER GROUP Privilege in the Target Account — Optional

Create a role in the target account and grant it the CREATE FAILOVER GROUP privilege. This step is optional. If you have already created this role, skip to Step 4: Create a Secondary Failover Group in the Target Account.

-- Execute the following SQL statements using the ACCOUNTADMIN role:
USE ROLE ACCOUNTADMIN;

CREATE ROLE myrole;

GRANT CREATE FAILOVER GROUP ON ACCOUNT
    TO ROLE myrole;

Step 4: Create a Secondary Failover Group in the Target Account

Create a secondary failover group in the target account as a replica of the primary failover group in the source account.

Execute a CREATE FAILOVER GROUP … AS REPLICA OF statement in each target account for which you enabled replication in Step 2: Create a Primary Failover Group in a Source Account (in this topic).

Executed from each target account:

use role myrole;

-- Execute the following SQL statement using a role with the CREATE FAILOVER GROUP privilege:
CREATE FAILOVER GROUP myfg
  AS REPLICA OF myorg.myaccount1.myfg;

Step 5. Apply Global IDs to Objects Created by Scripts in Target Accounts — Optional

Attention

This step must be executed if account objects (e.g. users or roles) that you do not want to drop during replication exist in the target account. Skipping this step may result in the loss of user worksheets or privilege grants to roles on objects (e.g. shares).

If you created account objects, for example, users and roles, in your target account by any means other than via replication (e.g. using scripts), these users and roles have no global identifier by default. The refresh operation uses global identifiers to synchronize these objects to the same objects in the source account. When a target account is refreshed from the source account, the refresh operation drops any account objects of the types in the object_types list (e.g. users or roles) in the target account that have no global identifier.

Create Replication or Failover Groups Without a Replication Schedule

When a secondary replication or failover group is created as a replica of a primary replication or failover group with a replication schedule, an initial refresh of the secondary group is automatically executed when the secondary replication or failover group is created. In order to prevent this from happening before you can execute the following instructions, create the primary replication or failover group without setting the REPLICATION_SCHEDULE parameter.

After the secondary group is created in the target account, use the SYSTEM$LINK_ACCOUNT_OBJECTS_BY_NAME function to add global identifiers to account objects in the target account. For account objects that exist only in the target account, replicate them manually in the source account before calling this function.

After you apply global identifiers to objects in the target account, or recreate them manually in the source account, use the ALTER REPLICATION GROUP or ALTER FAILOVER GROUP command to set the replication schedule for the primary replication or failover group.

Update the Remote Service for API Integrations

If you have enabled API integration replication, additional steps are required after the API integration is replicated to the target account. The replicated integration has its own identity and access management (IAM) entity that are different from the identity and IAM entity of the primary integration. Therefore, you must update the permissions on the remote service to grant access to replicated functions. The process is similar to granting access to the functions on the primary account. See the below links for more details:

Refreshing a Secondary Failover Group in a Target Account Manually

To manually refresh the objects in a target account, execute the ALTER FAILOVER GROUP … REFRESH command.

As a best practice, we recommend scheduling your secondary refreshes by setting the REPLICATION_SCHEDULE parameter using CREATE FAILOVER GROUP or ALTER FAILOVER GROUP.

Note

If the user who calls the function in the target account was dropped in the source account, the refresh operation fails.

Grant the REPLICATE Privilege on Failover Group to Role — Optional

The REPLICATE privilege is currently not replicated and must be granted on a failover (or replication) group in both the source and target accounts.

Executed from the source account:

-- Execute the following SQL statements using a role with the OWNERSHIP privilege on the group:
GRANT REPLICATE ON FAILOVER GROUP myfg TO ROLE my_replication_role;

Executed from the target account:

-- Execute the following SQL statements using a role with the OWNERSHIP privilege on the group:
GRANT REPLICATE ON FAILOVER GROUP myfg TO ROLE my_replication_role;

Manually Refresh a Secondary Failover Group

For example, to refresh the objects in the failover group myfg, execute the following statement from the target account:

USE ROLE my_replication_role;

-- Execute the following SQL statements using a role with the REPLICATE privilege:
ALTER FAILOVER GROUP myfg REFRESH;

Monitoring Replication

This section provides information on how to monitor account replication progress, history, and costs.

Monitor the Progress of a Replication Group or Failover Group Refresh

To monitor the progress of a replication or failover group refresh, query the REPLICATION_GROUP_REFRESH_PROGRESS, REPLICATION_GROUP_REFRESH_PROGRESS_BY_JOB table function (in the Snowflake Information Schema).

Example

View the progress of the most recent refresh operation for the failover group myfg:

SELECT PHASE_NAME, START_TIME, END_TIME, PROGRESS, DETAILS
  FROM TABLE(information_schema.replication_group_refresh_progress('myfg'));

Replication History

To view the replication history of a specific replication or failover group within a specified date range, query one of the following:

Examples

Query the Information Schema REPLICATION_GROUP_REFRESH_HISTORY table function to view the account replication history of failover group myfg in the last 7 days:

SELECT PHASE_NAME, START_TIME, END_TIME, TOTAL_BYTES, OBJECT_COUNT
  FROM TABLE(information_schema.replication_group_refresh_history('myfg'))
  WHERE START_TIME >= current_date - interval '7 days';

Query the Account Usage REPLICATION_GROUP_REFRESH_HISTORY view to view the account replication history in the current month:

SELECT REPLICATION_GROUP_NAME, PHASE_NAME, START_TIME, END_TIME, TOTAL_BYTES, OBJECT_COUNT
  FROM snowflake.account_usage.replication_group_refresh_history
  WHERE START_TIME >= date_trunc('month', current_date());

Monitor Replication Costs

To monitor credit usage for replication, query one of the following:

Examples

Query the REPLICATION_GROUP_USAGE_HISTORY table function to view credits used for account replication in the last 7 days:

SELECT start_time, end_time, replication_group_name, credits_used, bytes_transferred
  FROM table(information_schema.replication_group_usage_history(date_range_start=>dateadd('day', -7, current_date())));

Query the Account Usage REPLICATION_GROUP_USAGE_HISTORY view to view the credits used by replication or failover group for account replication history in the current month:

SELECT start_time, 
  end_time, 
  replication_group_name, 
  credits_used, 
  bytes_transferred
FROM snowflake.account_usage.replication_group_usage_history
WHERE start_time >= DATE_TRUNC('month', CURRENT_DATE());

Monitor Replication Costs for Databases

The cost for replication for an individual database included in a replication or failover group can be calculated by retrieving the number of copied bytes for the database and associating it with the credits used.

Examples

Querying Account Usage Views

The following examples calculate the costs for database replication in one replication group for the past 30 days.

  1. Query the REPLICATION_GROUP_REFRESH_HISTORY Account Usage view and calculate the sum of the number of bytes replicated per database.

    For example, to calculate the sum of the number of bytes replicated for databases in the replication group myrg in the last 30 days:

    select sum(value:totalBytesToReplicate) as sum_database_bytes
    from snowflake.account_usage.replication_group_refresh_history rh,
        lateral flatten(input => rh.total_bytes:databases)
    where rh.replication_group_name = 'MYRG'
    and rh.start_time >= current_date - interval '30 days';
    

    Note the output of the sum of database bytes:

    +--------------------+
    | SUM_DATABASE_BYTES |
    |--------------------|
    |              22016 |
    +--------------------+
    
  2. Query the REPLICATION_GROUP_USAGE_HISTORY Account Usage view and calculate the sum of the number of credits used and the sum of the bytes transferred for replication.

    For example, to calculate the sum of the number of credits used and the sum of the bytes transferred for replication of the replication group myrg in the last 30 days:

    select sum(credits_used) as credits_used, SUM(bytes_transferred) as bytes_transferred
    from snowflake.account_usage.replication_group_usage_history
    where replication_group_name = 'MYRG'
    and start_time >= current_date - interval '30 days';
    

    Note the output of the sum of the credits used and the sum of bytes transferred:

    +--------------+-------------------+
    | CREDITS_USED | BYTES_TRANSFERRED |
    |--------------+-------------------|
    |  1.357923604 |             22013 |
    +--------------+-------------------+
    
  3. Calculate the replication costs for databases using the values of the bytes transferred for databases, sum of the credits used, and the sum of all bytes transferred for replication from the previous two steps:

    (<database_bytes_transferred> / <bytes_transferred>) * <credits_used>

    For example:

    (22016 / 22013) * 1.357923604 = 1.35810866)

Querying Information Schema Table Functions

For refresh operations within the past 14 days, query the associated Information Schema table functions.

  1. Query the REPLICATION_GROUP_REFRESH_HISTORY table function to view the sum of the number of bytes copied for database replication for the replication group myrg:

    select sum(value:totalBytesToReplicate)
      from table(information_schema.replication_group_refresh_history('myrg')) as rh,
      lateral flatten(input => total_bytes:databases)
      where rh.phase_name = 'COMPLETED'
      and rh.start_time >= current_date - interval '14 days';
    
  2. Query the REPLICATION_GROUP_USAGE_HISTORY table function to view sum of the number of credits used and the sum of the bytes transferred for replication for the replication group myrg:

    select sum(credits_used), sum(bytes_transferred)
      from table(information_schema.replication_group_usage_history(
          date_range_start=>dateadd('day', -14, current_date()),
          replication_group_name => 'myrg'
      ));
    

Comparing Data Sets in Primary and Secondary Databases

If database objects are replicated in a replication or failover group, the HASH_AGG function can be used to compare the rows in a random set of tables in a primary and secondary database to verify data consistency. The HASH_AGG function returns an aggregate signed 64-bit hash value over the (unordered) set of input rows. Query this function on all or a random subset of tables in a secondary database and on the primary database (as of the timestamp for the primary database snapshot) and compare the output.

Example

In the examples below, the database mydb is included in the failover group myfg. The database mydb contains the table mytable.

Executed on Target Account

  1. Query the REPLICATION_GROUP_REFRESH_PROGRESS table function (in the Snowflake Information Schema). Note the primarySnapshotTimestamp in the DETAILS column for the PRIMARY_UPLOADING_METADATA phase. This is the timestamp for the latest snapshot of the primary database.

    SELECT PARSE_JSON(details)['primarySnapshotTimestamp']
      FROM TABLE(information_schema.replication_group_refresh_progress('myfg'))
      WHERE PHASE_NAME = 'PRIMARY_UPLOADING_METADATA';
    
  2. Query the HASH_AGG function for a specified table in the secondary database. The following query returns a hash value for all rows in the mytable table:

    SELECT HASH_AGG( * ) FROM mytable;
    

Executed on Source Account

  1. Query the HASH_AGG function for the same table in the primary database. Using Time Travel, specify the timestamp when the latest snapshot was taken for the secondary database:

    SELECT HASH_AGG( * ) FROM mytable AT(TIMESTAMP => '<primarySnapshotTimestamp>'::TIMESTAMP);
    
  2. Compare the results from the two queries. The output should be identical.