Introduction to Business Continuity & Disaster Recovery

This topic describes the main use cases for replicating and failing over data to a Snowflake account in another region, or even a different cloud platform.

The Snowflake replication and failover/failback functionality is composed of the following features:

Database Replication

Database Replication enables storing read-only replicas of a primary database in other Snowflake accounts. These accounts, which must be grouped in the same organization, can be located in different regions or cloud platforms. Refreshing each replica (secondary database) synchronizes the database objects and stored data with its primary database.

Database Failover/Failback

Database Failover/Failback promotes a replica to serve as the primary database. At that point, the former primary database becomes a read-only secondary database, and the former replica becomes the read-write primary database.

Client Redirect

Client Redirect provides a connection URL that can be used by Snowflake clients to connect to Snowflake. The connection URL can be redirected to a different Snowflake account as needed.

Together, these features support several fundamental business scenarios, including:

  • Recovering from an outage in a cloud platform region, emphasizing database writes over reads.

  • Recovering from an outage in a cloud platform region, emphasizing database reads over writes.

  • Recovering from an outage in a cloud platform region, emphasizing both database reads and writes.

  • Migrating your databases from one cloud platform or region to another.

In addition, Snowflake Secure Data Sharing and Database Replication enable sharing data securely across regions and cloud platforms.

Business Continuity and Disaster Recovery

In the event of a massive outage (due to a network issue, software bug, etc.) that disrupts the cloud services in a given region, access to Snowflake will be unavailable until the source of the outage is resolved and services are restored. To ensure continued availability and data durability in such a scenario, replicate your critical databases to another Snowflake account in your organization in a different region.

Because replication is asynchronous, a secondary database typically lags behind its primary database by up to the replication interval you configure. For example, if you choose to replicate a primary database every 30 minutes, the secondary replica will be at most 30 minutes behind the primary during an outage.
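
The lag arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and timestamps are hypothetical, and the sketch assumes the time a refresh takes to run is negligible.

```python
from datetime import datetime, timedelta


def staleness_at_outage(last_refresh: datetime, outage_start: datetime) -> timedelta:
    """Return how far the secondary lags the primary when the outage begins.

    Assumption: last_refresh is the moment the most recent refresh captured
    the primary's state; refresh duration is ignored.
    """
    if outage_start < last_refresh:
        raise ValueError("outage_start precedes the last refresh")
    return outage_start - last_refresh


# Hypothetical timeline: refresh every 30 minutes, outage 20 minutes after a refresh.
interval = timedelta(minutes=30)
lag = staleness_at_outage(datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 1, 12, 20))
print(lag, lag <= interval)
```

Under these assumptions the lag never exceeds the configured interval, because a new refresh would have started before it could.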

Depending on your business needs, you could choose to:

  • Recover database reads first to let client applications read data that is 30 minutes stale.

  • Recover database writes first to reconcile the last 30 minutes of data on the new primary before opening up reads from client applications.

  • Recover both database reads and writes simultaneously, i.e. open up reads from client applications on data that is 30 minutes stale as you reconcile the last 30 minutes of data on the new primary.

To prioritize both database reads and writes, follow the steps in either of the following scenarios, but when an outage occurs in a region, fail over both your critical databases and your Snowflake client connections at the same time.

Database Reads Before Writes

When an outage in a region results in full or partial loss of Snowflake availability, this path allows you to redirect Snowflake clients to read-only replicas of critical databases first for minimal downtime. Choosing to operate in read-only mode is often desirable during short-term outages.

A longer-term outage combined with the need for the latest data necessitates read-write mode.

The steps for this path are as follows.

Normal Status: Region is Operational

  1. Database Replication: Replicate critical databases to one or more Snowflake accounts in regions different from that of the account that stores the primary (source) databases. Refresh the database objects and stored data frequently.
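
As a rough sketch of what this setup step involves, the Python helper below only builds the SQL text. The helper name, the database `sales_db`, and the accounts `myorg.account1` and `myorg.account2` are hypothetical. The statement forms reflect Snowflake's database replication commands as commonly documented and should be checked against the SQL reference before use. Scheduling the recurring refresh is not shown.

```python
def replication_setup_sql(database: str, source_account: str, target_account: str):
    """Return (statements for the source account, statements for the target account).

    Account identifiers are assumed to be in organization.account form.
    """
    on_source = [
        # Allow the target account to hold a replica of the primary database.
        f"ALTER DATABASE {database} ENABLE REPLICATION TO ACCOUNTS {target_account};",
        # Allow that replica to be promoted later (needed for failover).
        f"ALTER DATABASE {database} ENABLE FAILOVER TO ACCOUNTS {target_account};",
    ]
    on_target = [
        # Create the read-only secondary database.
        f"CREATE DATABASE {database} AS REPLICA OF {source_account}.{database};",
        # Synchronize objects and data; repeat this on a schedule.
        f"ALTER DATABASE {database} REFRESH;",
    ]
    return on_source, on_target


source_sql, target_sql = replication_setup_sql("sales_db", "myorg.account1", "myorg.account2")
for statement in source_sql + target_sql:
    print(statement)
```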

Region Outage

  1. Client Redirect: Point the connection URL used by clients to a Snowflake account that stores your read-only replica (secondary) databases.

  2. Database Failover (When Needed): In the event of a longer-term outage, promote the secondary databases in the Snowflake account where your connection URL is pointing to serve as read-write primary databases.

Normal Status: Outage is Resolved

  1. Database Replication: Refresh the databases in the Snowflake account in the region where the outage occurred.

  2. Database Failback: Promote the databases in the Snowflake account where the outage occurred to again serve as the primary databases.

  3. Client Redirect: Point the connection URL used by clients to the Snowflake account in the region where the outage occurred.
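
The outage portion of this path can be sketched as an ordered list of statements to run in the account that holds the secondary databases. The function name, `prod_conn`, and the database names are hypothetical. The sketch assumes a secondary connection and secondary databases were created in that account before the outage, and the statement forms should be verified against the Snowflake SQL reference.

```python
def reads_first_outage_steps(connection: str, databases: list[str], long_outage: bool = False) -> list[str]:
    """Build the reads-before-writes runbook for the secondary-region account."""
    # Step 1: redirect clients first, so they can read the (stale) replicas.
    steps = [f"ALTER CONNECTION {connection} PRIMARY;"]
    # Step 2 (only when needed): promote the replicas so they accept writes.
    if long_outage:
        steps += [f"ALTER DATABASE {db} PRIMARY;" for db in databases]
    return steps


# Short outage: clients are redirected and the databases stay read-only.
print(reads_first_outage_steps("prod_conn", ["sales_db", "hr_db"]))
# Longer outage: the same redirect, followed by database failover.
print(reads_first_outage_steps("prod_conn", ["sales_db", "hr_db"], long_outage=True))
```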

Database Writes Before Reads

When an outage in a region results in full or partial loss of Snowflake availability, this path allows you to recover critical databases and continue to process data first. This option is preferable for account administrators who want to fail over their databases and ETL (Extract, Transform, Load) processes first, and then choose to redirect Snowflake clients only when the data is current.

The steps for this path are as follows.

Normal Status: Region is Operational

  1. Database Replication: Replicate critical databases to one or more Snowflake accounts in regions different from that of the account that stores the primary (source) databases. Refresh the database objects and stored data frequently.

Region Outage

  1. Database Failover: Promote replicas of critical databases in a different region to serve as the primary databases, which allows writing to these databases. Once the databases are writable, you can use your ETL processes to prioritize writes and reconcile data.

  2. Client Redirect (When Needed): Point the connection URL used by clients to the Snowflake account that stores the new primary databases.

Normal Status: Outage is Resolved

  1. Database Replication: Refresh the databases in the Snowflake account in the region where the outage occurred.

  2. Database Failback: Promote the databases in the Snowflake account where the outage occurred to again serve as the primary databases.

  3. Client Redirect: Point the connection URL used by clients to the Snowflake account in the region where the outage occurred.
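
The ordering for this path, including the return to the original region, can be sketched as follows. The function names, `prod_conn`, and `sales_db` are hypothetical, and the statement forms should be verified against the Snowflake SQL reference. The outage steps run in the account in the surviving region; the failback steps run in the account in the region where the outage occurred.

```python
def writes_first_outage_steps(connection: str, databases: list[str], redirect_clients: bool = False) -> list[str]:
    """Promote the databases first; redirect clients only once the data is current."""
    steps = [f"ALTER DATABASE {db} PRIMARY;" for db in databases]
    if redirect_clients:
        steps.append(f"ALTER CONNECTION {connection} PRIMARY;")
    return steps


def failback_steps(connection: str, databases: list[str]) -> list[str]:
    """Refresh, then promote, then redirect, in the recovered region's account."""
    steps = [f"ALTER DATABASE {db} REFRESH;" for db in databases]   # 1. catch up
    steps += [f"ALTER DATABASE {db} PRIMARY;" for db in databases]  # 2. fail back
    steps.append(f"ALTER CONNECTION {connection} PRIMARY;")         # 3. redirect clients
    return steps


print(writes_first_outage_steps("prod_conn", ["sales_db"], redirect_clients=True))
print(failback_steps("prod_conn", ["sales_db"]))
```

Refreshing before promoting matters here: the databases in the recovered region need the changes made during the outage before they take over as primaries again.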

Business Continuity and Disaster Recovery Flows

The diagrams in this section show the flows for business continuity and disaster recovery.

Flow for Databases

  1. The following diagram shows two accounts in the same organization but different regions (Region A and Region B). In one account, a local database has been promoted to serve as a primary database. Replication has been enabled for the other account, allowing it to store a replica of the primary database (that is, a secondary database):

    Initial state of database replication
  2. The following diagram shows a secondary database that has been created in the account in Region B. The green arrow shows a data refresh operation in progress from the primary database to the secondary database:

    Data replication operation in progress
  3. The following diagram shows a failover scenario: A service outage in Region A, where the account that contains the primary database is located. The secondary database (in Region B) has been promoted to serve as the primary database. Concurrently, the former primary database has become a secondary, read-only database:

    Database failover

    The steps to fail over a database are described in Failing Over Databases Across Multiple Accounts.

  4. The following diagram shows that the service outage in Region A has been resolved. A database refresh operation is in progress from the primary database (in Region B) to the secondary database (in Region A):

    Database failover
  5. The final diagram shows operations returned to their initial configuration (i.e. failback). The secondary database (in Region A) has been promoted to once again serve as the primary database for normal business operations. Concurrently, the former primary database (in Region B) has become a secondary, read-only database:

    Database failover
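
The five states in this flow can be traced with a toy Python model. It is not Snowflake code: the class, its fields, and the integer "version" standing in for database contents are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReplicatedDatabase:
    """Toy model: one primary and one secondary, each with a data version."""
    primary_region: str
    secondary_region: str
    primary_version: int = 0
    secondary_version: int = 0

    def write(self) -> None:
        # Only the primary accepts writes; the secondary is read-only.
        self.primary_version += 1

    def refresh(self) -> None:
        # A refresh synchronizes the secondary with the primary.
        self.secondary_version = self.primary_version

    def promote_secondary(self) -> None:
        # Failover or failback: the two databases swap roles.
        self.primary_region, self.secondary_region = self.secondary_region, self.primary_region
        self.primary_version, self.secondary_version = self.secondary_version, self.primary_version


db = ReplicatedDatabase(primary_region="A", secondary_region="B")  # diagram 1
db.write(); db.write(); db.write()
db.refresh()                                                       # diagram 2
db.promote_secondary()                                             # diagram 3: failover to B
db.write(); db.write()                                             # writes continue in B
db.refresh()                                                       # diagram 4: catch up A
db.promote_secondary()                                             # diagram 5: failback to A
print(db)
```

After the last step, Region A is the primary again and both copies hold the same version, which matches the initial configuration apart from the data written during the outage.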

Flow for Connections

  1. The following diagram shows two accounts in the same organization but different regions (Region A and Region B) on either the same or different cloud platforms. The connection URL for client connections is configured for an account in Region A:

    Normal client connections
  2. The following diagram shows a service outage in Region A that results in failed client connections:

    Failed client connections
  3. The following diagram shows that the connection URL for client connections is now configured for an account in Region B. Clients connecting to the connection URL are now connecting to the account in Region B.

    Note that Client Redirect is a manual process. See the instructions in Redirecting Client Connections for more information.

    Redirected client connections
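
The connection flow can be modeled the same way. The sketch below is a toy model with hypothetical names (`ConnectionUrl`, `account_a`, `account_b`, and the URL string); it only shows that clients keep using one URL while an administrator changes the account behind it.

```python
class ConnectionUrl:
    """Toy model of a connection URL that targets one of several accounts."""

    def __init__(self, url: str, accounts: list[str], target: str):
        if target not in accounts:
            raise ValueError(f"{target} is not linked to {url}")
        self.url = url
        self.accounts = set(accounts)
        self.target = target

    def redirect(self, account: str) -> None:
        # Manual step performed by an administrator, not an automatic failover.
        if account not in self.accounts:
            raise ValueError(f"{account} is not linked to {self.url}")
        self.target = account

    def connect(self, available: set[str]) -> str:
        # Clients always use the same URL; it resolves to the current target.
        if self.target not in available:
            raise ConnectionError(f"{self.target} is unavailable")
        return self.target


conn = ConnectionUrl("hypothetical-connection-url", ["account_a", "account_b"], "account_a")
print(conn.connect({"account_a", "account_b"}))  # normal operation: Region A
try:
    conn.connect({"account_b"})                  # outage in Region A
except ConnectionError as err:
    print("failed:", err)
conn.redirect("account_b")                       # manual Client Redirect
print(conn.connect({"account_b"}))               # clients now reach Region B
```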

Account Migration

Account migration is the one-time process of migrating (or transferring) your Snowflake objects and stored data to an account in another region or on a different cloud platform. Typical reasons for migrating your account include closer proximity to your user base, or a preference for a different cloud platform based on your corporate strategy or co-location with other cloud assets (e.g. a data lake).

Currently, support for replicating Snowflake objects is limited to databases as containers for objects that store data (e.g. tables and views, including materialized views). See Replicated Database Objects for the complete list of replicated objects. Work with Snowflake Support to replicate account objects (e.g. users and roles) as well as other Snowflake objects to the new account.

Note

Database Failover/Failback requires Business Critical Edition (or higher). Snowflake can temporarily waive this requirement for a one-time account migration.