Replication Considerations

This topic describes the behavior of certain Snowflake features in secondary databases and objects when replicated with replication or failover groups or database replication, and provides general guidance for working with replicated objects and data.

If you have previously enabled database replication for individual databases using the ALTER DATABASE … ENABLE REPLICATION TO ACCOUNTS command, refer to Database Replication Considerations for additional considerations specific to database replication.

Replication Group and Failover Group Constraints

The following sections explain the constraints around adding account objects, databases, and shares to replication and failover groups.

Database and Share Objects

The following constraints apply to database and share objects:

  • An object can only be in one failover group.

  • An object can be in multiple replication groups as long as each group is replicated to a different target account.

  • An object cannot be in both a failover group and a replication group.

  • Secondary (replica) objects cannot be added to a primary replication or failover group.

Account Objects

An account can only have one replication or failover group that contains objects other than databases or shares.

Replication Privileges

This section describes the replication privileges that are available to be granted to roles to specify the operations users can perform on replication and failover group objects in the system. For the syntax of the GRANT command, refer to GRANT <privileges>.

Note

For database replication, only a user with the ACCOUNTADMIN role can enable and manage database replication and failover. For additional information on required privileges for database replication, refer to the required privileges table in Step 6. Refreshing a Secondary Database on a Schedule.

Privilege

Object

Usage

Notes

OWNERSHIP

Replication Group

Failover Group

Grants the ability to delete, alter, and grant or revoke access to an object.

Can be granted by:

The ACCOUNTADMIN role or

A role that has the MANAGE GRANTS privilege or

A role that has the OWNERSHIP privilege on the group.

CREATE REPLICATION GROUP

Account

Grants the ability to create a replication group.

Must be granted by the ACCOUNTADMIN role.

CREATE FAILOVER GROUP

Account

Grants the ability to create a failover group.

Must be granted by the ACCOUNTADMIN role.

FAILOVER

Failover Group

Grants the ability to promote a secondary failover group to serve as primary failover group.

Can be granted or revoked by a role with the OWNERSHIP privilege on the group.

REPLICATE

Replication Group

Failover Group

Grants the ability to refresh a secondary group.

Can be granted or revoked by a role with the OWNERSHIP privilege on the group.

MODIFY

Replication Group

Failover Group

Grants the ability to change the settings or properties of an object.

Can be granted or revoked by a role with the OWNERSHIP privilege on the group.

MONITOR

Replication Group

Failover Group

Grants the ability to view details within an object.

Can be granted or revoked by a role with the OWNERSHIP privilege on the group.

For instructions on creating a custom role with a specified set of privileges, see Creating Custom Roles.

For general information about roles and privilege grants for performing SQL actions on securable objects, see Overview of Access Control.

Replication and References Across Replication Groups

Objects in a replication (or failover) group that have dangling references (i.e. references to objects in another replication or failover group) might successfully replicate to a target account in some circumstances. If the replication operation results in behavior in the target account consistent with behavior that can occur in the source account, replication succeeds.

For example, if a column in a table in failover group fg_a references a sequence in failover group fg_b, replication of both groups succeeds. If fg_a is replicated before fg_b, insert operations (after failover) on the table that references the sequence fails if fg_b was not replicated. This behavior can occur in a source account. If a sequence is dropped in a source account, insert operations on a table with a column referencing the dropped sequence fails.

When the dangling reference is a security policy that protects data, the replication (or failover) group with the security policy must be replicated before any replication group that contains objects that reference the policy is replicated.

Attention

Making updates to security policies that protect data in separate replication or failover groups may result in inconsistencies and should be done with care.

For database objects, you can view object dependencies in the Account Usage OBJECT_DEPENDENCIES View.

Streams

Dangling references for streams cause replication to fail with the following error message:

Primary database: the source object ''<object_name>'' for this stream ''<stream_name>'' is not included in the replication group.
Stream replication does not support replication across databases in different replication groups. Please see Streams Documentation
https://docs.snowflake.com/en/user-guide/account-replication-considerations#replication-and-streams for options.
Copy

To avoid dangling reference errors:

  • The primary database must include both the stream and its base object or

  • The database that contains the stream and the database that contains the base object referenced by the stream must be included in the same replication or failover group.

Network Policies

Dangling references in network policies can cause replication to fail with the following error message:

Dangling references in the snapshot. Correct the errors before refreshing again.
The following references are missing (referred entity <- [referring entities])
Copy

To avoid dangling references, specify the following object types in the OBJECT_TYPES list when executing the CREATE or ALTER command for the replication or failover group:

  • If a network policy is associated with the account, include NETWORK POLICIES and ACCOUNT PARAMETERS in the OBJECT_TYPES list.

  • If a network policy is associated with a user, include USERS in the OBJECT_TYPES list.

For details, refer to Network Policies in the Replication of Security Integrations & Network Policies Across Multiple Accounts topic.

Replication and Read-only Secondary Objects

All secondary objects in a target account, including secondary databases and shares, are read-only. Changes to replicated objects or object types cannot be made locally in a target account. For example, if the USERS object type is replicated from a source account to a target account, new users cannot be created or modified in the target account.

New, local databases and shares can be created and modified in a target account. If ROLES are also replicated to the target account, new roles cannot be created or modified in that target account. Therefore, privileges cannot be granted to (or revoked from) a role on a secondary object in the target account. However, privileges can be granted to (or revoked from) a secondary role on local objects (for example, databases, shares, or replication or failover groups) created in the target account.

Replication and Objects in Target Accounts

If you create account objects, for example, users and roles, in your target account by any means other than via replication (e.g. using scripts), these users and roles have no global identifier by default. When a target account is refreshed from the source account, the refresh operation drops any account objects of the types in the OBJECT_TYPES list (e.g. USERS or ROLES) in the target account that have no global identifier.

To avoid dropping these objects, refer to Apply Global IDs to Objects Created by Scripts in Target Accounts.

Replication and Security Policies

The database containing a security policy and the references (i.e. assignments) can be replicated using replication and failover groups. Security policies include:

If you are using database replication, refer to Database Replication and Security Policies.

Password and Session Policies

Password and session policy references for users are replicated when specifying the database containing policy (ALLOWED_DATABASES = policy_db) and USERS in a replication group or failover group.

If either the policy database or users have already been replicated to a target account, update the replication or failover group in the source account to include the databases and object types required to successfully replicate the policy. Then execute a refresh operation to update the target account.

If user-level policies are not in use, USERS do not need to be included in the replication or failover group.

Note

The policy must be in the same account as the account-level policy assignment and the user-level policy assignment.

If you have a session or password policy set on the account or a user in the account and you do not update the replication or failover group to include the policy_db containing the policy and USERS, a dangling reference occurs in the target account. In this case, a dangling reference means that Snowflake cannot locate the policy in the target account because the fully-qualified name of the policy points to the database in the source account. Consequently, the target account or users in the target account are not required to comply with the session policy or the password policy.

To successfully replicate a security policy, verify the replication or failover group includes the object types and databases required to prevent a dangling reference.

Replication and Cloning

Cloned objects are replicated physically rather than logically to secondary databases. That is, cloned tables in a standard database do not contribute to the overall data storage unless or until DML operations on the clone add to or modify existing data. However, when a cloned table is replicated to a secondary database, the physical data is also replicated, increasing the data storage usage for your account.

Replication and Automatic Clustering

In a primary database, Snowflake monitors clustered tables using Automatic Clustering and reclusters them as needed. As part of a refresh operation, clustered tables are replicated to a secondary database with the current sorting of the table micro-partitions. As such, reclustering is not performed again on the clustered tables in the secondary database, which would be redundant.

If a secondary database contains clustered tables and the database is promoted to become the primary database, Snowflake begins Automatic Clustering of the tables in this database while simultaneously suspending the monitoring of clustered tables in the previous primary database.

See Replication and Materialized Views (in this topic) for information about Automatic Clustering for materialized views.

Replication and Large, High-Churn Tables

When one or more rows of a table are updated or deleted, all of the impacted micro-partitions that store this data in a primary database are re-created and must be synchronized to secondary databases. For large, high-churn dimension tables, the replication costs can be significant.

For large, high-churn dimension tables that incur significant replication costs, the following mitigations are available:

  • Replicate any primary databases that store such tables at a lower frequency.

  • Change your data model to reduce churn.

For more information, refer to Managing Costs for Large, High-Churn Tables.

Replication and Time Travel

Time Travel and Fail-safe data is maintained independently for a secondary database and is not replicated from a primary database. Querying tables and views in a secondary database using Time Travel can produce different results than when executing the same query in the primary database.

Historical Data

Historical data available to query in a primary database using Time Travel is not replicated to secondary databases.

For example, suppose data is loaded continuously into a table every 10 minutes using Snowpipe, and a secondary database is refreshed every hour. The refresh operation only replicates the latest version of the table. While every hourly version of the table within the retention window is available for query using Time Travel, none of the iterative versions within each hour (the individual Snowpipe loads) are available.

Data Retention Period

The data retention period for tables in a secondary database begins when the secondary database is refreshed with the DML operations (i.e. changing or deleting data) written to tables in the primary database.

Note

The data retention period parameter, DATA_RETENTION_TIME_IN_DAYS, is only replicated to database objects in the secondary database, not to the database itself. For more details about parameter replication, refer to Parameters.

Replication and Materialized Views

In a primary database, Snowflake performs automatic background maintenance of materialized views. When a base table changes, all materialized views defined on the table are updated by a background service that uses compute resources provided by Snowflake. In addition, if Automatic Clustering is enabled for a materialized view, then the view is monitored and reclustered as necessary in a primary database.

A refresh operation replicates the materialized view definitions to a secondary database; the materialized view data is not replicated. Automatic background maintenance of materialized views in a secondary database is enabled by default. If Automatic Clustering is enabled for a materialized view in a primary database, automatic monitoring and reclustering of the materialized view in the secondary database is also enabled.

Note

The charges for automated background synchronization of materialized views are billed to each account that contains a secondary database.

Replication and External Tables

External Tables in a Primary Database

External tables in a primary database currently cause the replication or refresh operation to fail with the following error message:

003906 (55000): SQL execution error:
Primary database contains an external table '<database_name>'. Replication of a database with external table is not supported
Copy

To avoid this error, move the external tables into a separate database that is not replicated. Alternatively, if you are migrating your databases to another account, you could clone the primary database, drop the external table from the clone, and then replicate the cloned database. After you promote the secondary database in the target account, you would need to recreate the external tables in the database.

External Tables in a Secondary Database

External tables can exist in a secondary database if it was the primary database at a prior time and the external tables were created during that time. Once another database is promoted to be the designated primary database, it becomes a secondary database. External tables in a secondary database currently cause the refresh operation to fail with the following error:

003958 (55000): SQL execution error:
Secondary database contains an external table '<table_name>'. Replication of a database with external table is not supported.
Copy

To avoid this error, move the external table into a separate database that is not a replicated secondary database.

Replication of Stored Procedures and User-Defined Functions (UDFs)

Stored procedures and UDFs are replicated from a primary database to secondary databases.

Note that stage replication is not yet supported. If a stored procedure or UDF depends on files in a stage (e.g. if the stored procedure is defined in Python code that is uploaded from a stage), you must replicate the stage and its files to the secondary database manually.

For example, if a primary database has an in-line Python UDF that imports any code that is stored in a stage, the UDF is replicated to a secondary database, but does not work until the stage and imported code are manually replicated in the secondary database.

Replication and Streams

This section describes recommended practices and potential areas of concern when replicating streams in Replicating Databases Across Multiple Accounts or Account Replication and Failover/Failback.

Supported Source Objects for Streams

Replicated streams can successfully track the change data for tables and views in the same database.

Currently, the following source object types are not supported:

  • Directory tables

  • External tables

  • Tables or views in databases separate from the stream databases.

    Note that this design works if both the stream database and the database that stores the source object are included in the same replication or failover group.

  • Tables or views in a shared databases (i.e. databases shared from provider accounts to your account)

A database replication or refresh operation fails if the primary database includes a stream with an unsupported source object. The operation also fails if the source object for any stream has been dropped.

Avoiding Data Duplication

Note

In addition to the scenario described in this section, streams in a secondary database could return duplicate rows the first time they are included in a refresh operation. In this case, duplicate rows refers to a single row with multiple METADATA$ACTION column values.

After the initial refresh operation, you should not encounter this specific issue in a secondary database.

Data duplication occurs when DML operations write the same change data from a stream multiple times without a uniqueness check. This can occur if a stream and a destination table for the stream change data are stored in separate databases, and these databases are not replicated and failed over in the same group.

For example, suppose you regularly insert change data from stream s into table dt. (For this example, the source object for the stream does not matter.) Separate databases store the stream and destination table.

  1. At timestamp t1, a row is inserted into the source table for stream s, creating a new table version. The stream stores the offset for this table version.

  2. At timestamp t2, the secondary database that stores the stream is refreshed. Replicated stream s now stores the offset.

  3. At timestamp t3, the change data for stream s is inserted into table dt.

  4. At timestamp t4, the secondary database that stores stream s is failed over.

  5. At timestamp t5, the change data for stream s is inserted again into table dt.

To avoid this situation, replicate and fail over together the databases that store streams and their destination tables.

Stream References in Task WHEN Clause

To avoid unexpected behavior when running replicated tasks that reference streams in the WHEN boolean_expr clause, we recommend that you either:

  • Create the tasks and streams in the same database, or

  • If streams are stored in a different database from the tasks that reference them, include both databases in the same failover group.

If a task references a stream in a separate database, and both databases are not included in the same failover group, then the database that contains the task could be failed over without the database that contains the stream. In this scenario, when the task is resumed in the failed over database, it records an error when it attempts to run and cannot find the referenced stream. This issue can be resolved by either failing over the database that contains the stream or recreating the database and stream in the same account as the failed over database that contains the task.

Stream Staleness

If a stream in the primary database has become stale, the replicated stream in a secondary database is also stale and cannot be queried or its change data consumed. To resolve this issue, recreate the stream in the primary database (using CREATE OR REPLACE STREAM). When the secondary database is refreshed, the replicated stream is readable again.

Note that the offset for a recreated stream is the current table version by default. You can recreate a stream that points to an earlier table version using Time Travel; however, the replicated stream would remain unreadable. For more information, see Stream Replication and Time Travel (in this topic).

Stream Replication and Time Travel

After a primary database is failed over, if a stream in the database uses Time Travel to read a table version for the source object from a point in time before the last refresh timestamp, the replicated stream cannot be queried or the change data consumed. Likewise, querying the change data for a source object from a point in time before the last refresh timestamp using the CHANGES clause for SELECT statements fails with an error.

This is because a refresh operation collapses the table history into a single table version. Iterative table versions created before the refresh operation timestamp are not preserved in the table history for the replicated source objects.

Consider the following example:

Stream replication and Time Travel
  1. Table t1 is created in the primary database with change tracking enabled (table version tv0). Subsequent DML transactions create table versions tv1 and tv2.

  2. A secondary database that contains table t1 is refreshed. The table version for this replicated table is tv2; however, the table history is not replicated.

  3. A stream is created in the primary database with its offset set to table version tv1 using Time Travel.

  4. The secondary database is failed over, becoming the primary database.

  5. Querying stream s1 returns an error, because table version tv1 is not in the table history.

Note that when a subsequent DML transaction on table t1 iterates the table version to tv3, the offset for stream s1 is advanced. The stream is readable again.

Avoiding Data Loss

Data loss can occur when the most recent refresh operation for a secondary database is not completed prior to the failover operation. We recommend refreshing your secondary databases frequently to minimize the risk.

Enable Stream Replication

To enable stream replication, use the SYSTEM$BEHAVIOR_CHANGE_BUNDLE_STATUS function to enable the 2023_03 behavior change bundle. For more information, refer to Behavior Change Management.

Replication and Tasks

This section describes task replication in Replicating Databases Across Multiple Accounts or Account Replication and Failover/Failback.

Replication Scenarios

The following table describes different task scenarios and specifies whether the tasks are replicated or not. Except where noted, the scenarios pertain to both standalone tasks and tasks in a DAG:

Scenario

Replicated

Notes

Task was created and either resumed or executed manually (using EXECUTE TASK). Resuming or executing a task creates an initial task version.

Task was created but never resumed or executed.

Task was recreated (using CREATE OR REPLACE TASK but never resumed or executed.

The latest version before the task was recreated is replicated.

Resuming or manually executing the task commits a new version. When the database is replicated again, the new, or latest, version is replicated to the secondary database.

Task was created and resumed or executed, subsequently modified (using ALTER TASK), but not resumed or executed again.

Resuming or manually executing a task commits a new version that includes any changes to the task parameters. Because the new changes were never committed, the latest version before the task was suspended and modified is replicated.

Note that if the modified task is not resumed within a retention period (currently 30 days), the latest version of the task is dropped. After this period, the task is not replicated to a secondary database unless it is resumed again.

Standalone task was created and resumed or executed, but subsequently dropped.

Root task in a DAG was created and resumed or executed, but was subsequently suspended and dropped.

The entire DAG is not replicated to a secondary database.

Child task in a DAG is created and resumed or executed, but is subsequently suspended and dropped.

The latest version of the DAG (before the task was suspended and dropped) is replicated to a secondary database.

Resumed or Suspended State of Replicated Tasks

If all of the following conditions are met, a task is replicated to a secondary database in a resumed state:

  • A standalone or root task is in a resumed state in the primary database when the replication or refresh operation begins until the operation is completed. If a task is in a resumed state during only part of this period, it might still be replicated in a resumed state.

    A child task is in a resumed state in the latest version of the task.

  • The parent database was replicated to the target account along with role objects in the same, or different, replication or failover group.

    After the roles and database are replicated, you must refresh the objects in the target account by executing either ALTER REPLICATION GROUP … REFRESH or ALTER FAILOVER GROUP … REFRESH, respectively. If you refresh the database by executing ALTER DATABASE … REFRESH, the state of the tasks in the database is changed to suspended.

    A replication or refresh operation includes the privilege grants for a task that were current when the latest table version was committed. For more information, see Replicated Tasks and Privilege Grants (in this topic).

If these conditions are not met, the task is replicated to a secondary database in a suspended state.

Note

All secondary tasks are suspended regardless of their state. For more details, refer to Task Runs After a Failover

Replicated Tasks and Privilege Grants

If the parent database is replicated to a target account along with role objects in the same, or different, replication or failover group, the privileges granted on the tasks in the database are replicated as well.

The following logic determines which task privileges are replicated in a replication or refresh operation:

  • If the current task owner (i.e. the role that has the OWNERSHIP privilege on a task) is the same role as when the task was resumed last, then all current grants on the task are replicated to the secondary database.

  • If the current task owner is not the same role as when the task was resumed last, then only the OWNERSHIP privilege granted to the owner role in the task version is replicated to the secondary database.

  • If the current task owner role is not available (e.g. a child task is dropped but a new version of the task DAG is not committed yet), then only the OWNERSHIP privilege granted to the owner role in the task version is replicated to the secondary database.

Task Runs After a Failover

After a secondary failover group is promoted to serve as the primary group, any resumed tasks in databases within the failover group are scheduled gradually. The amount of time required to restore normal scheduling of all resumed standalone tasks and DAGs depends on the number of resumed tasks in a database.

Enable Task Replication

To enable task replication, use the SYSTEM$BEHAVIOR_CHANGE_BUNDLE_STATUS function to enable the 2023_03 behavior change bundle. For more information, refer to Behavior Change Management.

Replication and Tags

Tags and their assignments can be replicated from a source account to a target account.

Tag assignments cannot be modified in the target account after the initial replication from the source account. For example, setting a tag on a secondary (i.e. replicated) database is not allowed. To modify tag assignments in the target account, modify them in the source account and replicate them to the target account.

To successfully replicate tags, ensure that the replication or failover group includes:

  • The database containing the tags in the ALLOWED_DATABASES property.

  • Other account-level objects that have a tag in the OBJECT_TYPES property (e.g. ROLES, WAREHOUSES).

    For details, refer to CREATE REPLICATION GROUP and CREATE FAILOVER GROUP.

Historical Usage Data

Historical usage data for activity in a primary database is not replicated to secondary databases. Each account has its own query history, login history, etc.

Historical usage data includes the query data returned by the following Snowflake Information Schema table functions or Account Usage views:

  • COPY_HISTORY

  • LOGIN_HISTORY

  • QUERY_HISTORY

  • etc.