Failing over account objects¶
This topic describes the steps necessary to fail over replicated account objects across multiple accounts in different regions for disaster recovery.
Prerequisite requirements¶
Enable replication for a primary failover group in a set of accounts.
Create at least one secondary failover group (i.e. replica) of the primary failover group in one or more accounts and regularly refresh (i.e. synchronize) the replica with the latest updates to the objects in the failover group.
For instructions, see Replicating databases and account objects across multiple accounts.
Promoting a target account to serve as the source account¶
To promote a target account to serve as the source account, you must sign in to the target account you want to promote to serve as the new source account and execute the ALTER FAILOVER GROUP … PRIMARY command.
Promote a secondary failover group to primary failover group¶
Note
The example in this section must be executed by a role with the FAILOVER privilege.
The following example promotes myaccount2
in the current myorg
organization to serve as the source account.
Sign in to target account
myaccount2
.List failover groups in the account:
SHOW FAILOVER GROUPS;
Execute the following statement for each secondary failover group you want to promote to serve as the primary failover group:
ALTER FAILOVER GROUP myfg PRIMARY;
Note
During a partial outage in your source region, the replication service might continue to be available and might continue to refresh the secondary failover groups in target regions.
To ensure data integrity, Snowflake prevents failover if a refresh operation is in progress. This means you cannot promote a secondary failover group to serve as the primary if it is being refreshed by a replication operation. The ALTER FAILOVER GROUP … PRIMARY command returns an error in this scenario.
Resolving failover statement failure due to an in-progress refresh operation¶
If there is a refresh operation in progress for the secondary failover group you are trying to promote, the failover statement results in the following error:
Replication group "<GROUP_NAME>" cannot currently be set as primary because it is being
refreshed. Either wait for the refresh to finish or cancel the refresh and try again.
To successfully fail over, you must complete the following steps.
Select and complete one of the following options:
Important
Suspending a refresh operation in the SECONDARY_DOWNLOADING_METADATA or SECONDARY_DOWNLOADING_DATA phase might result in an inconsistent state on the target account. For more information, see View the current phase of an in-progress refresh operation.
Suspend future refresh operations for the failover group. If there is an in-progress refresh operation, you must wait for it to complete before you can failover:
ALTER FAILOVER GROUP myfg SUSPEND;
Suspend future refresh operations and cancel a scheduled refresh operation that is currently in progress (if there is one).
If the in-progress refresh operation was manually triggered, see Cancel an in-progress refresh operation that wasn’t automatically scheduled.
ALTER FAILOVER GROUP myfg SUSPEND IMMEDIATE;
Note
You might experience a slight delay between the time that the statement returns and the time that the cancellation of the refresh operation is finished.
Verify no refresh operations are in progress for the failover group
myfg
. The following query should return no results:SELECT phase_name, start_time, job_uuid FROM TABLE(INFORMATION_SCHEMA.REPLICATION_GROUP_REFRESH_HISTORY('myfg')) WHERE phase_name <> 'COMPLETED' and phase_name <> 'CANCELED';
To see canceled refresh operations for failover group
myfg
, you can execute the following statement:SELECT phase_name, start_time, job_uuid FROM TABLE(INFORMATION_SCHEMA.REPLICATION_GROUP_REFRESH_HISTORY('myfg')) WHERE phase_name = 'CANCELED';
Now you can promote the secondary failover group
myfg
to primary failover group:ALTER FAILOVER GROUP myfg PRIMARY;
Resume scheduled replication in target accounts¶
On failover, scheduled refreshes on all secondary failover groups are suspended. ALTER FAILOVER GROUP … RESUME must be executed in each target account with a secondary failover group to resume automatic refreshes.
ALTER FAILOVER GROUP myfg RESUME;
View the current phase of an in-progress refresh operation¶
A refresh operation can be safely canceled during most phases of the refresh operation. However, canceling a refresh operation in the SECONDARY_DOWNLOADING_METADATA or SECONDARY_DOWNLOADING_DATA phase might result in an inconsistent state on the target account. If the refresh operation has started one of these phases, it proceeds to completion regardless of the availability of the source account. Allowing the phase to complete before you fail over ensures replicas are in a consistent state. After the replicas are in a consistent state, you can resume or replay your ingest and transformation pipelines to update the replicas to the current state.
To view the current phase of an in-progress refresh operation for a failover group, use the Information Schema REPLICATION_GROUP_REFRESH_PROGRESS, REPLICATION_GROUP_REFRESH_PROGRESS_BY_JOB table function.
For example, to view the current phase of an in-progress refresh operation for failover group myfg
, execute
the following statement:
SELECT phase_name, start_time, end_time
FROM TABLE(
INFORMATION_SCHEMA.REPLICATION_GROUP_REFRESH_PROGRESS('myfg')
);
For a list of refresh operations phases, see the usage notes for the function.
Cancel an in-progress refresh operation that wasn’t automatically scheduled¶
To cancel an in-progress refresh operation that was not triggered automatically by a replication schedule, you must use the SYSTEM$CANCEL_QUERY function:
Find the query ID or JOB_UUID for running refresh operations using one of the following options:
Find the query IDs for all running refresh operations:
SELECT query_id, query_text FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY()) WHERE query_type = 'REFRESH REPLICATION GROUP' AND execution_status = 'RUNNING' ORDER BY start_time;
Use the QUERY_TEXT column to identify the QUERY_ID for failover group refresh operations from the list.
Find the JOB_UUID for an in-progress refresh operation for a specific failover group
myfg
:SELECT phase_name, start_time, job_uuid FROM TABLE(INFORMATION_SCHEMA.REPLICATION_GROUP_REFRESH_HISTORY('myfg')) WHERE phase_name <> 'COMPLETED' and phase_name <> 'CANCELED';
Cancel the refresh operation using the SYSTEM$CANCEL_QUERY function and the QUERY_ID or JOB_UUID:
SELECT SYSTEM$CANCEL_QUERY('<QUERY_ID | JOB_UUID>');
Returns the following output:
query [<QUERY_ID>] terminated.
After you cancel the in-progress refresh operation, continue to the next steps.
Reopen active channels for Snowpipe Streaming in newly promoted source account¶
Tables in a primary database that are populated by Snowpipe Streaming are replicated to secondary databases. After failover, reopen active Snowpipe Streaming channels for tables and re-insert any missing data rows for the channels:
Reopen active channels for the table by calling the openChannel API.
Fetch offset tokens:
Call the getLatestCommittedOffsetToken API or
Execute the SHOW CHANNELS command to retrieve a list of the active channels of the table.
Re-insert data rows for the channel from the fetched offset tokens.
Note
These steps apply only to Snowpipe Streaming with the Snowflake Ingest SDK; it doesn’t apply to Snowpipe Streaming with the Kafka connector. Follow the steps below to restart the Kafka Connector after failover.
Snowpipe Streaming and the Kafka connector¶
If you are using the Kafka connector and Snowpipe Streaming, follow these steps after failover:
Update the Kafka connector configuration to point to the newly promoted source account.
Execute the SHOW CHANNELS command to retrieve the list of active channels and the offset tokens. Each channel belongs to a single partition in the Kafka topic.
Manually reset offsets in the Kafka Topic for each of those partitions (channels).
Restart the Kafka Connector.
For more information, refer to: