Introduction to Business Continuity & Disaster Recovery¶
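This topic describes the main use cases for replication and failover across regions and cloud platforms. The Snowflake replication and failover/failback functionality is composed of the following features: account object replication and failover/failback, database replication and failover/failback, and Client Redirect.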
Collectively, these individual features are designed to support a number of different fundamental business continuity scenarios, including:
Planned failovers: For disaster recovery drills that test preparedness and measure recovery point and recovery time.
Unplanned failovers: In the case of an outage in a region or a cloud platform, promote secondary account objects and databases in another region or cloud platform to serve as read-write primary objects.
Migration: Move your Snowflake account to a different region or cloud platform without disrupting your business. For example, to maintain business continuity during mergers and acquisitions, or facilitate a change in cloud strategy.
Multiple readable secondaries: Account objects and databases can be replicated to multiple accounts in different regions and cloud platforms, mitigating the risk of multiple region or cloud platform outages.
In addition, Snowflake Secure Data Sharing and Database Replication enable sharing data securely across regions and cloud platforms.
Replication and Failover/Failback Features¶
Account Object Replication and Failover/Failback¶
Account Replication introduces two Snowflake objects: the replication group and the failover group. A replication group lets you specify what to replicate, where to replicate it, and how often. That is, you specify which account objects to replicate, to which regions or cloud platforms, and at what scheduled intervals. A failover group enables both the replication and the failover of the account objects in the group. The objects in a group are replicated with point-in-time consistency from a source account to one or more target accounts.
Account objects can include warehouses, users, and roles, along with databases and shares (see Replicated Objects for the full list of objects that can be included in a replication or failover group). Account objects can be grouped in one or multiple groups.
Account replication lets you fail over your account to a different region or cloud platform. Each replication group and failover group has its own replication schedule, so you can replicate different groups of objects at different frequencies. Failover groups can also be failed over individually: you can choose to fail over all failover groups, or only selected failover groups.
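As an illustration of the "what, where, and how often" choices described above, the following minimal Python sketch renders a CREATE FAILOVER GROUP statement from those three inputs. The class name FailoverGroupSpec, the group name myfg, the database db1, and the account identifier myorg.myaccount2 are all hypothetical. The statement shape is an assumption based on the general form of the Snowflake command; check the command reference before running anything like it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FailoverGroupSpec:
    """Hypothetical helper: what to replicate, where to, and how often."""

    name: str
    object_types: list[str]        # what: e.g. USERS, ROLES, WAREHOUSES, DATABASES
    allowed_accounts: list[str]    # where: org_name.account_name of each target
    schedule_minutes: int          # how often: refresh interval in minutes
    allowed_databases: list[str] = field(default_factory=list)

    def to_sql(self) -> str:
        """Render the statement text; nothing is executed here."""
        clauses = [
            f"CREATE FAILOVER GROUP {self.name}",
            f"  OBJECT_TYPES = {', '.join(self.object_types)}",
        ]
        # Only list databases when some were named for this group.
        if self.allowed_databases:
            clauses.append(
                f"  ALLOWED_DATABASES = {', '.join(self.allowed_databases)}"
            )
        clauses.append(f"  ALLOWED_ACCOUNTS = {', '.join(self.allowed_accounts)}")
        clauses.append(f"  REPLICATION_SCHEDULE = '{self.schedule_minutes} MINUTE'")
        return "\n".join(clauses) + ";"


# All names below are placeholders, not values from this topic.
spec = FailoverGroupSpec(
    name="myfg",
    object_types=["USERS", "ROLES", "WAREHOUSES", "DATABASES"],
    allowed_accounts=["myorg.myaccount2"],
    schedule_minutes=30,
    allowed_databases=["db1"],
)
print(spec.to_sql())
```

Keeping one spec per group mirrors the point made above: each group carries its own schedule, so critical objects can refresh more often than the rest.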
Database Replication and Failover/Failback¶
Database Replication enables storing read-only replicas of an individual primary database in other Snowflake accounts in different regions or cloud platforms. Database replication is now part of account replication. Existing databases enabled for replication can be added to a replication or failover group after database replication is disabled. For more details, see Transitioning From Database Replication to Group-Based Replication.
Client Redirect¶
Client Redirect provides a connection URL that can be used by Snowflake clients to connect to Snowflake. The connection URL can redirect Snowflake clients to a different Snowflake account as needed.
Business Continuity and Disaster Recovery¶
In the event of a massive outage (due to a network issue, software bug, etc.) that disrupts the cloud services in a given region, access to Snowflake will be unavailable until the source of the outage is resolved and services are restored. To ensure continued availability and data durability in such a scenario, replicate your critical account objects to another Snowflake account in your organization in a different region.
With asynchronous replication, secondary replicas typically lag behind the primary objects based on the replication schedule you configure. For example, if you choose to replicate a primary replication or failover group every 30 minutes, the secondary replica objects in the group will be at most 30 minutes behind the primary during an outage.
Depending on your business needs, you could choose to:
Recover reads first to let client applications read data that is 30 minutes stale.
Recover writes first to reconcile the last 30 minutes of data on the new primary before opening up reads from client applications.
Recover both reads and writes simultaneously, that is, open up reads from client applications on data that is 30 minutes stale while you reconcile the last 30 minutes of data on the new primary.
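The 30-minute figure in the example above can be checked with a little arithmetic. The sketch below is a simplified model, not a description of how Snowflake schedules refreshes: it assumes refreshes start exactly on schedule, complete instantly, and never fail. The function names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def last_refresh_before(
    first_refresh: datetime, interval: timedelta, outage_start: datetime
) -> datetime:
    """Return the most recent scheduled refresh at or before the outage.

    Simplifying assumption: every refresh runs on time and is instantaneous.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if outage_start < first_refresh:
        raise ValueError("outage starts before the first refresh")
    # Whole number of intervals that fit between the first refresh and the outage.
    completed = (outage_start - first_refresh) // interval
    return first_refresh + completed * interval


def staleness(
    first_refresh: datetime, interval: timedelta, outage_start: datetime
) -> timedelta:
    """How far the secondary lags the primary when the outage begins."""
    return outage_start - last_refresh_before(first_refresh, interval, outage_start)


# Hypothetical timeline: refresh every 30 minutes from 09:00, outage at 10:20.
first = datetime(2024, 1, 1, 9, 0)
every_30 = timedelta(minutes=30)
outage = datetime(2024, 1, 1, 10, 20)
print(staleness(first, every_30, outage))  # 0:20:00
```

Under these assumptions the lag never reaches the full interval. In practice a refresh takes time to complete, so treat the configured interval as a planning figure for your recovery point rather than a guarantee.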
Normal Status: Region is Operational¶
Account Object Replication: Replicate the failover group(s) with critical account objects to one or more Snowflake accounts in regions different from that of the account that stores the primary (source) failover group(s). Refresh the failover group(s) frequently.
To recover reads and writes simultaneously, follow the steps in either of the following example scenarios, but when an outage occurs in a region, fail over both your critical failover group(s) and your Snowflake client connections at the same time.
Reads Before Writes¶
When an outage in a region results in full or partial loss of Snowflake availability, this path allows you to redirect Snowflake clients to read-only replicas of account objects in critical failover group(s) first for minimal downtime. Choosing to operate in read-only mode is often desirable during short-term outages.
A longer-term outage combined with the need for the latest data necessitates read-write mode.
Client Redirect: Point the connection URL used by clients to a Snowflake account that stores your read-only replica (secondary) failover group(s).
Failover (When Needed): In the event of a longer-term outage, promote the secondary failover group(s) in the Snowflake account where your connection URL is pointing to serve as read-write primary failover group(s).
Writes Before Reads¶
When an outage in a region results in full or partial loss of Snowflake availability, this path allows you to recover failover group(s) with critical account objects and continue to process data first. This option is preferable for account administrators who want to fail over their databases and ETL (Extract, Transform, Load) processes first, and then choose to redirect Snowflake clients only when the data is current.
Failover: Promote the secondary failover group(s) with critical account objects in a different region to serve as the primary failover group(s), which allows writing to the account objects included in each failover group. Once the databases in the group(s) are writable, you can use your ETL processes to prioritize writes and reconcile data.
Client Redirect (When Needed): Point the connection URL used by clients to the Snowflake account that stores the new primary failover group(s).
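The two recovery paths above use the same two actions and differ only in their order. The following sketch makes that ordering explicit by returning the statements to run in the account that holds the secondary objects. The function name outage_steps and the object names myfg and myconnection are hypothetical, and the statement text is an assumption about the general form of the ALTER CONNECTION and ALTER FAILOVER GROUP commands; nothing here is executed.

```python
from __future__ import annotations

# Statement templates (assumed general form; names are filled in by the caller).
PROMOTE_CONNECTION = "ALTER CONNECTION {connection} PRIMARY;"
PROMOTE_GROUP = "ALTER FAILOVER GROUP {group} PRIMARY;"


def outage_steps(path: str, group: str, connection: str) -> list[str]:
    """Return the ordered statements for the chosen recovery path.

    In both paths the second step is the "when needed" step: it can be
    deferred, or skipped if the outage is short.
    """
    redirect = PROMOTE_CONNECTION.format(connection=connection)
    failover = PROMOTE_GROUP.format(group=group)
    orders = {
        # Clients read from the replicas first; promote only if needed.
        "reads_before_writes": [redirect, failover],
        # Resume writes and ETL first; redirect clients once data is current.
        "writes_before_reads": [failover, redirect],
    }
    if path not in orders:
        raise ValueError(f"unknown recovery path: {path!r}")
    return orders[path]


# Placeholder names, run in the account that holds the secondary objects.
for statement in outage_steps("reads_before_writes", "myfg", "myconnection"):
    print(statement)
```

Recovering reads and writes simultaneously corresponds to running both statements together rather than deferring the second one.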
Normal Status: Outage is Resolved¶
Replication: Refresh the failover group(s) in the Snowflake account in the region where the outage occurred.
Failback: Promote the failover group(s) in the Snowflake account where the outage occurred to again serve as the primary failover group(s).
Client Redirect: Point the connection URL used by clients to the Snowflake account in the region where the outage occurred.
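The failback sequence above has a fixed order: refresh, then promote, then redirect. As a rough sketch, the helper below lists the statements to run in the account where the outage occurred. The function name failback_steps and the object names are hypothetical, and the statement text is an assumption about the general form of the Snowflake commands.

```python
from __future__ import annotations


def failback_steps(group: str, connection: str) -> list[str]:
    """Return the failback statements in the order given in the steps above."""
    return [
        # 1. Bring the group in the recovered account up to date.
        f"ALTER FAILOVER GROUP {group} REFRESH;",
        # 2. Promote that group to serve as the primary again.
        f"ALTER FAILOVER GROUP {group} PRIMARY;",
        # 3. Point the connection URL back at the recovered account.
        f"ALTER CONNECTION {connection} PRIMARY;",
    ]


# Placeholder names, run in the account where the outage occurred.
for statement in failback_steps("myfg", "myconnection"):
    print(statement)
```

Refreshing before promoting matters because writes made in the other region during the outage reach the recovered account only through a refresh.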
Account Migration¶
Account migration is the one-time process of migrating (or transferring) your Snowflake objects and stored data to an account in another region or on a different cloud platform. Typical reasons for migrating your account include closer proximity to your user base, a preference for a different cloud platform based on your corporate strategy, or co-location with other cloud assets (e.g. a data lake).
Account object replication supports the replication of account objects such as warehouses, users, and roles, along with databases and shares. See Replicated Objects for the complete list of replicated objects.
Account object replication and failover/failback requires Business Critical Edition (or higher). Snowflake can temporarily waive this requirement for a one-time account migration.