Replication Considerations¶
This topic describes how certain Snowflake features behave in secondary databases and objects when they are replicated using replication groups, failover groups, or database replication, and provides general guidance for working with replicated objects and data.
If you have previously enabled database replication for individual databases using the ALTER DATABASE … ENABLE REPLICATION TO ACCOUNTS command, refer to Database Replication Considerations for additional considerations specific to database replication.
Replication Group and Failover Group Constraints¶
The following sections explain the constraints around adding account objects, databases, and shares to replication and failover groups.
Account Objects¶
An account can only have one replication or failover group that contains objects other than databases or shares.
Replication Privileges¶
This section describes the replication privileges that are available to be granted to roles to specify the operations users can perform on replication and failover group objects in the system. For the syntax of the GRANT command, refer to GRANT <privileges>.
Note
For database replication, only a user with the ACCOUNTADMIN role can enable and manage database replication and failover. For additional information on required privileges for database replication, refer to the required privileges table in Step 6. Refreshing a Secondary Database on a Schedule.
| Privilege | Object | Usage | Notes |
|---|---|---|---|
| OWNERSHIP | Replication group, Failover group | Grants the ability to delete, alter, and grant or revoke access to an object. | Can be granted by: |
| CREATE REPLICATION GROUP | Account | Grants the ability to create a replication group. | Must be granted by the ACCOUNTADMIN role. |
| CREATE FAILOVER GROUP | Account | Grants the ability to create a failover group. | Must be granted by the ACCOUNTADMIN role. |
| FAILOVER | Failover group | Grants the ability to promote a secondary failover group to serve as the primary failover group. | Can be granted or revoked by a role with the OWNERSHIP privilege on the group. |
| REPLICATE | Replication group, Failover group | Grants the ability to refresh a secondary group. | Can be granted or revoked by a role with the OWNERSHIP privilege on the group. |
| MODIFY | Replication group, Failover group | Grants the ability to change the settings or properties of an object. | Can be granted or revoked by a role with the OWNERSHIP privilege on the group. |
| MONITOR | Replication group, Failover group | Grants the ability to view details within an object. | Can be granted or revoked by a role with the OWNERSHIP privilege on the group. |
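As a sketch of how these grants might look in practice (the group and role names here are hypothetical):

```sql
-- Allow a custom role to create failover groups in the account
-- (must be granted by the ACCOUNTADMIN role).
GRANT CREATE FAILOVER GROUP ON ACCOUNT TO ROLE dr_admin;

-- Allow the role to refresh, monitor, and promote a specific failover group
-- (granted by a role with the OWNERSHIP privilege on the group).
GRANT REPLICATE ON FAILOVER GROUP my_fg TO ROLE dr_admin;
GRANT MONITOR ON FAILOVER GROUP my_fg TO ROLE dr_admin;
GRANT FAILOVER ON FAILOVER GROUP my_fg TO ROLE dr_admin;
```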
For instructions on creating a custom role with a specified set of privileges, see Creating Custom Roles.
For general information about roles and privilege grants for performing SQL actions on securable objects, see Overview of Access Control.
Replication and References Across Replication Groups¶
Objects in a replication (or failover) group that have dangling references (i.e. references to objects in another replication or failover group) might successfully replicate to a target account in some circumstances. If the replication operation results in behavior in the target account consistent with behavior that can occur in the source account, replication succeeds.
For example, if a column in a table in failover group fg_a references a sequence in failover group fg_b, replication of both groups succeeds. However, if fg_a is replicated before fg_b, insert operations (after failover) on the table that references the sequence fail until fg_b is replicated. This behavior can also occur in a source account: if a sequence is dropped in a source account, insert operations on a table with a column referencing the dropped sequence fail.
When the dangling reference is a security policy that protects data, the replication (or failover) group with the security policy must be replicated before any replication group that contains objects that reference the policy is replicated.
Attention
Making updates to security policies that protect data in separate replication or failover groups may result in inconsistencies and should be done with care.
For database objects, you can view object dependencies in the Account Usage OBJECT_DEPENDENCIES View.
Streams¶
Dangling references for streams cause replication to fail with the following error message:
Primary database: the source object ''<object_name>'' for this stream ''<stream_name>'' is not included in the replication group.
Stream replication does not support replication across databases in different replication groups. Please see Streams Documentation
https://docs.snowflake.com/en/user-guide/account-replication-considerations#replication-and-streams for options.
To avoid dangling reference errors:
The primary database must include both the stream and its base object or
The database that contains the stream and the database that contains the base object referenced by the stream must be included in the same replication or failover group.
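For example, if a stream lives in a database `stream_db` and its base table in a database `base_db` (hypothetical names), both databases can be placed in the same failover group so the stream and its base object replicate together:

```sql
CREATE FAILOVER GROUP streams_fg
  OBJECT_TYPES = DATABASES
  ALLOWED_DATABASES = stream_db, base_db
  ALLOWED_ACCOUNTS = myorg.dr_account;
```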
Network Policies¶
Dangling references in network policies can cause replication to fail with the following error message:
Dangling references in the snapshot. Correct the errors before refreshing again.
The following references are missing (referred entity <- [referring entities])
To avoid dangling references, specify the following object types in the OBJECT_TYPES list when executing the CREATE or ALTER command for the replication or failover group:

If a network policy is associated with the account, include NETWORK POLICIES and ACCOUNT PARAMETERS in the OBJECT_TYPES list.
If a network policy is associated with a user, include USERS in the OBJECT_TYPES list.
For details, refer to Network Policies in the Replication of Security Integrations & Network Policies Across Multiple Accounts topic.
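For instance, a failover group covering an account-level network policy and user-level policy assignments might be created as follows (all names are illustrative):

```sql
CREATE FAILOVER GROUP policies_fg
  OBJECT_TYPES = ACCOUNT PARAMETERS, NETWORK POLICIES, USERS, DATABASES
  ALLOWED_DATABASES = app_db
  ALLOWED_ACCOUNTS = myorg.dr_account;
```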
Replication and Read-only Secondary Objects¶
All secondary objects in a target account, including secondary databases and shares, are read-only. Changes to replicated objects or object types cannot be made locally in a target account. For example, if the USERS object type is replicated from a source account to a target account, new users cannot be created or modified in the target account.

New, local databases and shares can be created and modified in a target account. If ROLES are also replicated to the target account, new roles cannot be created or modified in that target account. Therefore, privileges cannot be granted to (or revoked from) a role on a secondary object in the target account. However, privileges can be granted to (or revoked from) a secondary role on local objects (for example, databases, shares, or replication or failover groups) created in the target account.
Replication and Objects in Target Accounts¶
If you created account objects, for example, users and roles, in your target account by any means other than via replication (for example, using scripts), these users and roles have no global identifier by default. When a target account is refreshed from the source account, the refresh operation drops any account objects of the types in the OBJECT_TYPES list in the target account that have no global identifier.
Note
The initial refresh operation to replicate USERS or ROLES might result in an error. This is to help prevent accidental deletion of data and metadata associated with users and roles. For more information about the circumstances that determine whether these object types are dropped or the refresh operation fails, refer to Initial Replication of Users and Roles.
To avoid dropping these objects, refer to Apply Global IDs to Objects Created by Scripts in Target Accounts.
Replication and Security Policies¶
The database containing a security policy and the references to the policy (i.e. assignments) can be replicated using replication and failover groups. Security policies include the password and session policies described below.
If you are using database replication, refer to Database Replication and Security Policies.
Password and Session Policies¶
Password and session policy references for users are replicated when you include the database containing the policy (ALLOWED_DATABASES = policy_db) and USERS in a replication group or failover group.
If either the policy database or users have already been replicated to a target account, update the replication or failover group in the source account to include the databases and object types required to successfully replicate the policy. Then execute a refresh operation to update the target account.
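A sketch of this update, assuming an existing failover group `my_fg` and a policy database `policy_db` (both names hypothetical):

```sql
-- In the source account: add the policy database and the USERS
-- object type to the existing group.
ALTER FAILOVER GROUP my_fg ADD policy_db TO ALLOWED_DATABASES;
ALTER FAILOVER GROUP my_fg SET OBJECT_TYPES = DATABASES, USERS;

-- In the target account: refresh the secondary group.
ALTER FAILOVER GROUP my_fg REFRESH;
```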
If user-level policies are not in use, the USERS object type does not need to be included in the replication or failover group.
Note
The policy must be in the same account as the account-level policy assignment and the user-level policy assignment.
If you have a session or password policy set on the account or a user in the account, and you do not update the replication or failover group to include the policy_db containing the policy and USERS, a dangling reference occurs in the target account. In this case, a dangling reference means that Snowflake cannot locate the policy in the target account because the fully-qualified name of the policy points to the database in the source account. Consequently, the target account or users in the target account are not required to comply with the session policy or the password policy.
To successfully replicate a security policy, verify the replication or failover group includes the object types and databases required to prevent a dangling reference.
Replication and Cloning¶
Cloned objects are replicated physically rather than logically to secondary databases. That is, cloned tables in a standard database do not contribute to the overall data storage unless or until DML operations on the clone add to or modify existing data. However, when a cloned table is replicated to a secondary database, the physical data is also replicated, increasing the data storage usage for your account.
Replication and Automatic Clustering¶
In a primary database, Snowflake monitors clustered tables using Automatic Clustering and reclusters them as needed. As part of a refresh operation, clustered tables are replicated to a secondary database with the current sorting of the table micro-partitions. As such, reclustering is not performed again on the clustered tables in the secondary database, which would be redundant.
If a secondary database contains clustered tables and the database is promoted to become the primary database, Snowflake begins Automatic Clustering of the tables in this database while simultaneously suspending the monitoring of clustered tables in the previous primary database.
See Replication and Materialized Views (in this topic) for information about Automatic Clustering for materialized views.
Replication and Large, High-Churn Tables¶
When one or more rows of a table are updated or deleted, all of the impacted micro-partitions that store this data in a primary database are re-created and must be synchronized to secondary databases. For large, high-churn dimension tables, the replication costs can be significant.
For large, high-churn dimension tables that incur significant replication costs, the following mitigations are available:
Replicate any primary databases that store such tables at a lower frequency.
Change your data model to reduce churn.
For more information, refer to Managing Costs for Large, High-Churn Tables.
Replication and Time Travel¶
Time Travel and Fail-safe data is maintained independently for a secondary database and is not replicated from a primary database. Querying tables and views in a secondary database using Time Travel can produce different results than when executing the same query in the primary database.
- Historical Data
Historical data available to query in a primary database using Time Travel is not replicated to secondary databases.
For example, suppose data is loaded continuously into a table every 10 minutes using Snowpipe, and a secondary database is refreshed every hour. The refresh operation only replicates the latest version of the table. While every hourly version of the table within the retention window is available for query using Time Travel, none of the iterative versions within each hour (the individual Snowpipe loads) are available.
- Data Retention Period
The data retention period for tables in a secondary database begins when the secondary database is refreshed with the DML operations (i.e. changing or deleting data) written to tables in the primary database.
Note
The data retention period parameter, DATA_RETENTION_TIME_IN_DAYS, is only replicated to database objects in the secondary database, not to the database itself. For more details about parameter replication, refer to Parameters.
Replication and Materialized Views¶
In a primary database, Snowflake performs automatic background maintenance of materialized views. When a base table changes, all materialized views defined on the table are updated by a background service that uses compute resources provided by Snowflake. In addition, if Automatic Clustering is enabled for a materialized view, then the view is monitored and reclustered as necessary in a primary database.
A refresh operation replicates the materialized view definitions to a secondary database; the materialized view data is not replicated. Automatic background maintenance of materialized views in a secondary database is enabled by default. If Automatic Clustering is enabled for a materialized view in a primary database, automatic monitoring and reclustering of the materialized view in the secondary database is also enabled.
Note
The charges for automated background synchronization of materialized views are billed to each account that contains a secondary database.
Replication and External Tables¶
External Tables in a Primary Database¶
External tables in a primary database currently cause the replication or refresh operation to fail with the following error message:
003906 (55000): SQL execution error:
Primary database contains an external table '<database_name>'. Replication of a database with external table is not supported
To avoid this error, move the external tables into a separate database that is not replicated. Alternatively, if you are migrating your databases to another account, you could clone the primary database, drop the external table from the clone, and then replicate the cloned database. After you promote the secondary database in the target account, you would need to recreate the external tables in the database.
External Tables in a Secondary Database¶
External tables can exist in a secondary database if it was the primary database at a prior time and the external tables were created during that time. Once another database is promoted to be the designated primary database, it becomes a secondary database. External tables in a secondary database currently cause the refresh operation to fail with the following error:
003958 (55000): SQL execution error:
Secondary database contains an external table '<table_name>'. Replication of a database with external table is not supported.
To avoid this error, move the external table into a separate database that is not a replicated secondary database.
Replication and Dynamic Tables¶
Dynamic table replication behaves differently depending on whether the primary database that contains the dynamic table is replicated in a replication group or a failover group.
Dynamic Tables and Replication Groups¶
A database that contains a dynamic table can be replicated using a replication group. The base object(s) it depends on are not required to be in the same replication group.
Secondary objects are read-only in the target account. If a secondary replication group is dropped in a target account, the databases that were included in the group become read-write. However, any dynamic tables included in the replication group remain read-only even after the secondary group is dropped in the target account. No DML operations or dynamic table refreshes can run on these read-only dynamic tables.
Dynamic Tables and Failover Groups and Database Replication¶
A database that contains a dynamic table can be replicated using a failover group. The base objects it depends on must be included in the same failover group. Otherwise, a dangling reference occurs during the refresh operation with the following error message:
002749 (55000): SQL execution error: Dynamic table <dt_name> has dependencies that are missing in the replication membership: <dependency_table_name>
The same applies if the dynamic table is replicated in a database using database replication. The base objects the dynamic table depends on must be included in the same database to avoid a dangling reference error.
Replication and Snowpipe Streaming¶
A table populated by Snowpipe Streaming in a primary database is replicated to the secondary database in a target account.
In the primary database, tables are created and rows are inserted through channels. Offset tokens track the ingestion progress. A refresh operation replicates the table object, table data, and the channel offsets associated with the table from the primary database to the secondary database.
Snowpipe Streaming read-only operations to retrieve offsets are available in the source and target accounts:
The channel getLatestCommittedOffsetToken API
SHOW CHANNELS command
Snowpipe Streaming write operations are only available in the source account:
The client openChannel API
The channel insertRow API
The channel insertRows API
Avoiding Data Loss¶
To avoid data loss in the case of failover, the data retention time for successfully inserted rows in your upstream data source must be greater than the replication frequency. If data is inserted into a table in a primary database, and failover occurs before the data can be replicated to the secondary database, the same data will need to be inserted into the table in the newly promoted primary database. The following is an example scenario:
Table t1 in primary database repl_db is populated with data using Snowpipe Streaming and the Kafka connector.
The offsetToken is 100 for channel 1 and 100 for channel 2 for t1 in the primary database.
A refresh operation completes successfully in the target account.
The offsetToken is 100 for channel 1 and 100 for channel 2 for t1 in the secondary database.
More rows are inserted into t1 in the primary database.
The offsetToken is now 200 for channel 1 and 200 for channel 2 for t1 in the primary database.
A failover occurs before the additional rows and new channel offsets can be replicated to the secondary database.

In this case, there are 100 missing offsets in each channel for table t1 in the newly promoted primary database. To insert the missing data, refer to Reopen Active Channels for Snowpipe Streaming in Newly Promoted Source Account.
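Before reopening channels, you can compare channel offsets between the two accounts with the SHOW CHANNELS command (the table name follows the example above; the `public` schema is an assumption):

```sql
-- Run in both the source and target accounts and compare the
-- offset_token column for each channel.
SHOW CHANNELS IN TABLE repl_db.public.t1;
```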
Requirements¶
Snowpipe Streaming replication support requires the following minimum versions:
Snowflake Ingest SDK version 1.1.1 and later
If you are using the Kafka connector: Kafka connector version 1.9.3 and later
Replication and Stages¶
The following constraints apply to stage objects:
Snowflake currently supports stage replication as part of group-based replication (replication and failover groups). Stage replication is not supported for database replication.
You can replicate an external stage. However, the files on an external stage are not replicated.
You can replicate an internal stage. To replicate the files on an internal stage, you must enable a directory table on the stage. Snowflake replicates only the files that are mapped by the directory table.
When you replicate an internal stage with a directory table, you cannot disable the directory table on the primary or secondary stage. The directory table contains critical information about replicated files and files loaded using a COPY statement.
A refresh operation will fail if the directory table on an internal stage contains a file that is larger than 2GB.
Files on user stages and table stages are not replicated.
For named external stages that use a storage integration, you must configure the trust relationship for secondary storage integrations in your target accounts prior to failover. For more information, see Configure Cloud Storage Access for Secondary Storage Integrations.
If you replicate an external stage with a directory table, and you have configured automated refresh for the source directory table, you must configure automated refresh for the secondary directory table before failover. For more information, see Configure Automated Refresh for Directory Tables on Secondary Stages.
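For example, to create an internal stage whose files are eligible for replication, enable a directory table on the stage (the stage name is illustrative):

```sql
-- Files mapped by the directory table are the only files replicated.
CREATE STAGE my_int_stage
  DIRECTORY = (ENABLE = TRUE);
```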
Replication and Pipes¶
The following constraints apply to pipe objects:
Snowflake currently supports pipe replication as part of group-based replication (replication and failover groups). Pipe replication is not supported for database replication.
Snowflake replicates the copy history of a pipe only when the pipe belongs to the same replication group as its target table.
Replication of notification integrations is not supported.
To receive notifications, you must configure a secondary auto-ingest pipe in a target account prior to failover. For more information, see Configure Notifications for Secondary Auto-ingest Pipes.
Replication of Stored Procedures and User-Defined Functions (UDFs)¶
Stored procedures and UDFs are replicated from a primary database to secondary databases.
If a stored procedure or UDF depends on files in a stage (for example, if the stored procedure is defined in Python code that is uploaded from a stage), you must replicate the stage and its files to the secondary database. For more information about replicating stages, see Stage, Pipe, and Load History Replication.
For example, if a primary database has an in-line Python UDF that imports any code that is stored on a stage, the UDF does not work unless the stage and its imported code are replicated in the secondary database.
Replication and Streams¶
This section describes recommended practices and potential areas of concern when replicating streams in Replicating Databases Across Multiple Accounts or Account Replication and Failover/Failback.
Supported Source Objects for Streams¶
Replicated streams can successfully track the change data for tables and views in the same database.
Currently, the following source object types are not supported:
External tables
Tables or views in databases separate from the stream databases.
Note that this design works if both the stream database and the database that stores the source object are included in the same replication or failover group.
Tables or views in a shared database (i.e. a database shared from a provider account to your account)
A database replication or refresh operation fails if the primary database includes a stream with an unsupported source object. The operation also fails if the source object for any stream has been dropped.
Avoiding Data Duplication¶
Note
In addition to the scenario described in this section, streams in a secondary database could return duplicate rows the first time they are included in a refresh operation. In this case, duplicate rows refers to a single row with multiple METADATA$ACTION column values.
After the initial refresh operation, you should not encounter this specific issue in a secondary database.
Data duplication occurs when DML operations write the same change data from a stream multiple times without a uniqueness check. This can occur if a stream and a destination table for the stream change data are stored in separate databases, and these databases are not replicated and failed over in the same group.
For example, suppose you regularly insert change data from stream s into table dt. (For this example, the source object for the stream does not matter.) Separate databases store the stream and the destination table.

At timestamp t1, a row is inserted into the source table for stream s, creating a new table version. The stream stores the offset for this table version.
At timestamp t2, the secondary database that stores the stream is refreshed. Replicated stream s now stores the offset.
At timestamp t3, the change data for stream s is inserted into table dt.
At timestamp t4, the secondary database that stores stream s is failed over.
At timestamp t5, the change data for stream s is inserted again into table dt.
To avoid this situation, replicate and fail over together the databases that store streams and their destination tables.
Stream References in Task WHEN Clause¶
To avoid unexpected behavior when running replicated tasks that reference streams in the WHEN boolean_expr clause, we recommend that you either:
Create the tasks and streams in the same database, or
If streams are stored in a different database from the tasks that reference them, include both databases in the same failover group.
If a task references a stream in a separate database, and both databases are not included in the same failover group, then the database that contains the task could be failed over without the database that contains the stream. In this scenario, when the task is resumed in the failed over database, it records an error when it attempts to run and cannot find the referenced stream. This issue can be resolved by either failing over the database that contains the stream or recreating the database and stream in the same account as the failed over database that contains the task.
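As an illustration of the recommended setup, a task and the stream it checks can live in the same database (all names and columns here are hypothetical):

```sql
CREATE TASK process_changes
  WAREHOUSE = my_wh
  SCHEDULE = '5 MINUTE'
  -- The task runs only when the stream has change data to consume.
  WHEN SYSTEM$STREAM_HAS_DATA('my_db.public.s')
AS
  -- Consuming the stream advances its offset.
  INSERT INTO my_db.public.dt (id, val)
    SELECT id, val FROM my_db.public.s;
```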
Stream Staleness¶
If a stream in the primary database has become stale, the replicated stream in a secondary database is also stale and cannot be queried or its change data consumed. To resolve this issue, recreate the stream in the primary database (using CREATE OR REPLACE STREAM). When the secondary database is refreshed, the replicated stream is readable again.
Note that the offset for a recreated stream is the current table version by default. You can recreate a stream that points to an earlier table version using Time Travel; however, the replicated stream would remain unreadable. For more information, see Stream Replication and Time Travel (in this topic).
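A minimal sketch of recreating a stale stream in the primary database (the stream and table names are illustrative):

```sql
-- Recreating the stream resets its offset to the current table version;
-- the replicated stream becomes readable after the next refresh.
CREATE OR REPLACE STREAM s ON TABLE t1;
```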
Stream Replication and Time Travel¶
After a primary database is failed over, if a stream in the database uses Time Travel to read a table version for the source object from a point in time before the last refresh timestamp, the replicated stream cannot be queried or the change data consumed. Likewise, querying the change data for a source object from a point in time before the last refresh timestamp using the CHANGES clause for SELECT statements fails with an error.
This is because a refresh operation collapses the table history into a single table version. Iterative table versions created before the refresh operation timestamp are not preserved in the table history for the replicated source objects.
Consider the following example:
Table t1 is created in the primary database with change tracking enabled (table version tv0). Subsequent DML transactions create table versions tv1 and tv2.
A secondary database that contains table t1 is refreshed. The table version for this replicated table is tv2; however, the table history is not replicated.
A stream s1 is created in the primary database with its offset set to table version tv1 using Time Travel.
The secondary database is failed over, becoming the primary database.
Querying stream s1 returns an error, because table version tv1 is not in the table history.

Note that when a subsequent DML transaction on table t1 iterates the table version to tv3, the offset for stream s1 is advanced. The stream is readable again.
Avoiding Data Loss¶
Data loss can occur when the most recent refresh operation for a secondary database is not completed prior to the failover operation. We recommend refreshing your secondary databases frequently to minimize the risk.
Replication and Tasks¶
This section describes task replication in Replicating Databases Across Multiple Accounts or Account Replication and Failover/Failback.
Replication Scenarios¶
The following table describes different task scenarios and specifies whether the tasks are replicated or not. Except where noted, the scenarios pertain to both standalone tasks and tasks in a DAG:
| Scenario | Replicated | Notes |
|---|---|---|
| Task was created and either resumed or executed manually (using EXECUTE TASK). Resuming or executing a task creates an initial task version. | ✔ | |
| Task was created but never resumed or executed. | ❌ | |
| Task was recreated (using CREATE OR REPLACE TASK) but never resumed or executed. | ✔ | The latest version before the task was recreated is replicated. Resuming or manually executing the task commits a new version. When the database is replicated again, the new, or latest, version is replicated to the secondary database. |
| Task was created and resumed or executed, subsequently modified (using ALTER TASK), but not resumed or executed again. | ✔ | Resuming or manually executing a task commits a new version that includes any changes to the task parameters. Because the new changes were never committed, the latest version before the task was suspended and modified is replicated. Note that if the modified task is not resumed within a retention period (currently 30 days), the latest version of the task is dropped. After this period, the task is not replicated to a secondary database unless it is resumed again. |
| Standalone task was created and resumed or executed, but subsequently dropped. | ❌ | |
| Root task in a DAG was created and resumed or executed, but was subsequently suspended and dropped. | ❌ | The entire DAG is not replicated to a secondary database. |
| Child task in a DAG is created and resumed or executed, but is subsequently suspended and dropped. | ✔ | The latest version of the DAG (before the task was suspended and dropped) is replicated to a secondary database. |
Resumed or Suspended State of Replicated Tasks¶
If all of the following conditions are met, a task is replicated to a secondary database in a resumed state:
A standalone or root task is in a resumed state in the primary database from the time the replication or refresh operation begins until the operation is completed. If a task is in a resumed state during only part of this period, it might still be replicated in a resumed state.
A child task is in a resumed state in the latest version of the task.
The parent database was replicated to the target account along with role objects in the same, or different, replication or failover group.
After the roles and database are replicated, you must refresh the objects in the target account by executing either ALTER REPLICATION GROUP … REFRESH or ALTER FAILOVER GROUP … REFRESH, respectively. If you refresh the database by executing ALTER DATABASE … REFRESH, the state of the tasks in the database is changed to suspended.
A replication or refresh operation includes the privilege grants for a task that were current when the latest table version was committed. For more information, see Replicated Tasks and Privilege Grants (in this topic).
If these conditions are not met, the task is replicated to a secondary database in a suspended state.
Note
All secondary tasks are suspended regardless of their state. For more details, refer to Task Runs After a Failover.
Replicated Tasks and Privilege Grants¶
If the parent database is replicated to a target account along with role objects in the same, or different, replication or failover group, the privileges granted on the tasks in the database are replicated as well.
The following logic determines which task privileges are replicated in a replication or refresh operation:
If the current task owner (i.e. the role that has the OWNERSHIP privilege on a task) is the same role as when the task was resumed last, then all current grants on the task are replicated to the secondary database.
If the current task owner is not the same role as when the task was resumed last, then only the OWNERSHIP privilege granted to the owner role in the task version is replicated to the secondary database.
If the current task owner role is not available (e.g. a child task is dropped but a new version of the task DAG is not committed yet), then only the OWNERSHIP privilege granted to the owner role in the task version is replicated to the secondary database.
Task Runs After a Failover¶
After a secondary failover group is promoted to serve as the primary group, any resumed tasks in databases within the failover group are scheduled gradually. The amount of time required to restore normal scheduling of all resumed standalone tasks and DAGs depends on the number of resumed tasks in a database.
Replication and Tags¶
Tags and their assignments can be replicated from a source account to a target account.
Tag assignments cannot be modified in the target account after the initial replication from the source account. For example, setting a tag on a secondary (i.e. replicated) database is not allowed. To modify tag assignments in the target account, modify them in the source account and replicate them to the target account.
To successfully replicate tags, ensure that the replication or failover group includes:

The database containing the tags in the ALLOWED_DATABASES property.
Other account-level objects that have a tag in the OBJECT_TYPES property (e.g. ROLES, WAREHOUSES).

For details, refer to CREATE REPLICATION GROUP and CREATE FAILOVER GROUP.
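Putting those requirements together, a failover group for tag replication might look like the following (all names are illustrative):

```sql
CREATE FAILOVER GROUP tags_fg
  OBJECT_TYPES = DATABASES, ROLES, WAREHOUSES
  ALLOWED_DATABASES = tag_db
  ALLOWED_ACCOUNTS = myorg.dr_account;
```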
Historical Usage Data¶
Historical usage data for activity in a primary database is not replicated to secondary databases. Each account has its own query history, login history, etc.
Historical usage data includes the query data returned by the following Snowflake Information Schema table functions or Account Usage views:
COPY_HISTORY
LOGIN_HISTORY
QUERY_HISTORY
etc.