About Openflow Connector for SQL Server (CDC)¶
Note
This connector is subject to the Snowflake Connector Terms.
This topic describes the basic concepts, workflow, and limitations of the Openflow Connector for SQL Server (CDC).
About the Openflow Connector for SQL Server (CDC)¶
The Openflow Connector for SQL Server (CDC) connects a SQL Server database instance to Snowflake and replicates data from selected tables in near real-time or on schedule. The connector uses SQL Server Change Data Capture (CDC) to detect and apply changes to replicated tables. Change data is recorded in journal tables alongside the current state of the replicated tables.
Use cases¶
Use this connector if you’re looking to do the following:
- Synchronizing SQL Server data with Snowflake for comprehensive, centralized reporting.
- Capturing every individual row-level change from the source database, including intermediate states between polling intervals.
Supported SQL Server versions¶
The following SQL Server database versions and platforms are supported:
- Microsoft SQL Server 2025
- Microsoft SQL Server 2022
- Microsoft SQL Server 2019
- Microsoft SQL Server 2017
- Microsoft SQL Server 2016 (SP1 or later, Enterprise / Standard / Developer edition)
- Azure SQL Database
- Azure SQL Managed Instance
- AWS RDS for SQL Server
- Google Cloud SQL for SQL Server
Note
The connector requires SQL Server Change Data Capture to be enabled on the source databases and tables. CDC isn’t available on SQL Server Express or SQL Server Web editions (including the Web edition on AWS RDS). SQL Server 2017 and later support CDC on the Enterprise, Standard, and Developer editions. On SQL Server 2016, CDC on the Standard edition requires SP1 or later; builds before 2016 SP1 require the Enterprise or Developer edition.
Openflow requirements¶
- The runtime size must be at least Medium. For guidance on choosing a size and resizing the runtime later, see Runtime sizing.
- The connector doesn’t support multi-node Openflow runtimes. Configure the runtime for this connector
with Min nodes and Max nodes set to
1.
Limitations¶
- You can’t run multiple connectors of the same type in a single runtime instance.
- The connector supports only username and password authentication with SQL Server.
- The connector only replicates database tables that contain primary keys.
- The connector doesn’t update existing records in the Snowflake database when a new NOT NULL column with a default value is added to one of the source databases.
- The connector doesn’t update existing records in the Snowflake database when a new column is added to the included list in the Column Filter JSON.
- The connector supports source table schema changes, except for changes to a table’s primary key or to the precision or scale of a numeric column. For more information about how the connector applies schema changes, see Schema changes.
- When a new column is added to a source table and an update changes only that newly added column, SQL Server Change Data Capture doesn’t record those updates in the change table, so the connector can’t replicate them as individual change events. This is a SQL Server CDC limitation: a change is captured only when an update modifies at least one column that the table’s capture instance already tracks. The connector still reflects the latest value of the new column in the destination table after the connector switches to the new capture instance.
- The connector doesn’t support the truncate table operation.
Note
You can bypass limitations affecting certain table columns by excluding these specific columns from replication.
Workflow¶
The following workflow outlines the steps to set up and run the Openflow Connector for SQL Server (CDC):
-
A SQL Server database administrator performs the following tasks:
- Enables Change Data Capture on each database using
sys.sp_cdc_enable_db, then creates a capture instance for each table to be replicated usingsys.sp_cdc_enable_table. - Creates credentials for the connector.
- Deploys the Openflow CDC wrapper procedures so the connector can manage capture instances and apply source schema changes autonomously.
- (Optional) Provides the SSL certificate to connect to the SQL Server instance over SSL.
- Enables Change Data Capture on each database using
-
A Snowflake account administrator performs the following tasks:
- Creates a service user for the connector, a destination database to store replicated data, and a warehouse for the connector.
- Installs the connector.
- Specifies the required parameters for the connector flow definition.
- Runs the flow.
The connector does the following when run in Openflow:
-
Creates the schemas and destination tables matching the source tables configured for replication.
-
Begins replication according to the table replication lifecycle.
For more information, see How tables are replicated.
How the connector works¶
The following sections describe how the connector works in various scenarios, including replication, changes in schema, and data retention.
Change Data Capture behavior¶
The connector uses SQL Server Change Data Capture (CDC) to detect changes in the source tables. CDC captures row-level insert, update, and delete activity from the SQL Server transaction log into dedicated change tables.
Because CDC preserves every individual change, the connector is suitable for both data synchronization use cases and audit or history use cases where every change to a row must be captured. For a comparison with the Openflow Connector for SQL Server, which uses Change Tracking instead, see Comparison of Openflow connectors for SQL Server.
Data replication¶
The connector supports replicating tables from multiple SQL Server databases in a single SQL Server instance. The connector creates replicated tables from different databases in separate schemas in the destination Snowflake database.
Reference replicated tables by combining the source database name, the source schema name, and the table name in the following format:
<database_name>.<schema_name>.<table_name>
For each schema in each source database being replicated, the connector creates a separate schema in the destination Snowflake database.
The name of the destination schema is a combination of the source database name and the source schema name, separated by an underscore character (_) as shown in the following example:
<source_database_name>_<source_schema_name>
The connector creates tables in the destination schema with the same name as the source table name as shown in the following example:
<destination_database>.<destination_schema_name>.<source_table_name>
How tables are replicated¶
The connector replicates tables in the following stages:
- Schema introspection: The connector discovers the columns in the source table, including the column names and types, then validates them against Snowflake’s and the connector’s limitations. Validation failures cause this stage to fail, and the cycle completes. After successful completion of this stage, the connector creates an empty destination table.
- Snapshot load: The connector copies all data available in the source table into the destination table. If this stage fails, the connector stops replicating data. After successful completion, the data from the source table is available in the destination table.
- Incremental load: The connector reads new entries from the CDC change tables and applies those changes to the destination table. This process continues until the table is removed from replication. Failure at this stage permanently stops replication of the source table, until the issue is resolved.
For information on bypassing snapshot load and using the incremental load process, see Incremental replication.
Schema changes¶
The connector applies supported source table schema changes (DDL) automatically, without stopping replication or requiring a manual re-snapshot of the table. To do this, the connector manages SQL Server capture instances through the wrapper procedures deployed during setup. For more information, see Deploy the Openflow CDC wrapper procedures.
When a column is added on the source, the connector starts replicating it. Existing rows in the destination table aren’t backfilled with values for the new column. When a column is dropped on the source, the connector stops replicating it. Changes to a table’s primary key, or to the precision or scale of a numeric column, aren’t supported. For more information, see Limitations.
How the schema transition works¶
A SQL Server capture instance records changes against a fixed set of columns that’s frozen when the capture instance is created. When the schema of a source table changes, the connector can’t keep using the same capture instance, so it transitions to a new capture instance that reflects the updated schema. This transition happens in the following stages:
-
Schema change detection. The connector detects DDL changes through a polling mechanism: it periodically queries SQL Server’s
cdc.ddl_historytable for the tables it replicates. The polling interval is configurable and defaults to 30 seconds, so a schema change is picked up shortly after it occurs rather than in the same cycle. -
New capture instance creation. When the connector detects a schema change, it creates a new capture instance that reflects the post-change column set. The new capture instance starts recording changes from the point at which it’s created.
-
Draining the old capture instance. The connector keeps reading from the old capture instance up to the point where the schema changed, so that no change committed under the old schema is lost. These changes are emitted using the old column set. All changes committed under the old schema are delivered to the destination before any changes recorded under the new schema, so the destination always receives changes in schema-consistent commit order.
-
Mini-snapshot. A gap exists between the point where the schema changed and the point where the new capture instance started recording. To fill this gap, the connector replays the changes in that gap from the old capture instance, joining them against the live source table so that newly added columns carry their current values. This mini-snapshot brings the destination table up to date with the new schema without a full re-snapshot of the whole table.
-
Switching to the new capture instance. After the mini-snapshot completes, the connector retires (drops) the old capture instance and resumes normal replication from the new one.
Replaying already-sent changes¶
When the connector detects a schema change, it might need to rewind and replay change records that it already sent. For example, records sent before a new column was detected don’t include a value for that column, so the connector replays the affected rows to populate the new column. Because the connector applies changes idempotently by primary key, replaying these records doesn’t create duplicate rows in the destination table.
A schema change during a schema transition¶
A second schema change can occur while the connector is still transitioning to a new capture instance because of an earlier change. The connector detects the follow-up change, provisions a newer capture instance that reflects the latest schema, and continues the transition against that newer instance, rewinding as needed so the affected changes are replayed under the most recent schema. No data is lost, and the transition completes against the most recent schema.
Oversized values¶
The connector doesn’t replicate individual values larger than 16 MB. By default, processing such a value marks the associated table permanently failed. To change this behavior, modify the Oversized Value Strategy destination parameter.
Source database locking behavior¶
During the snapshot phase, the connector reads directly from the source tables to perform the initial full copy. Under SQL Server’s default READ COMMITTED isolation level, these read operations acquire shared locks on the source tables, which can lead to deadlocks if other database clients hold conflicting locks at the same time.
During incremental replication, the connector reads from dedicated CDC change tables, not from the source tables. This means incremental replication doesn’t acquire any locks on the source tables.
To avoid deadlocks during the snapshot phase, enable Read Committed Snapshot Isolation (RCSI) on the source database:
With RCSI enabled, read operations use row versioning instead of shared locks, which eliminates lock contention between the connector and concurrent write transactions on the source database.
Caution
Enabling RCSI changes the semantics of the default READ COMMITTED isolation level for all connections to the database, not just the connector. Applications that rely on the default lock-based READ COMMITTED behavior (for example, expecting readers to block on concurrent uncommitted writes) can see different results after the change. Test the impact in a non-production environment before enabling RCSI in production.
Understanding data retention¶
The connector follows a data retention philosophy where customer data is never automatically deleted. You maintain full ownership and control over your replicated data, and the connector preserves historical information rather than permanently removing it.
This approach has the following implications:
- Rows deleted from the source table are soft-deleted in the destination table rather than physically removed.
- Columns dropped from the source table are renamed in the destination table rather than dropped.
- Journal tables are retained indefinitely and are not automatically cleaned up.
Destination table metadata columns¶
Each destination table includes the following metadata columns that track replication information:
| Column name | Type | Description |
|---|---|---|
_SNOWFLAKE_INSERTED_AT | TIMESTAMP_NTZ | The timestamp when the row was originally inserted into the destination table. |
_SNOWFLAKE_UPDATED_AT | TIMESTAMP_NTZ | The timestamp when the row was last updated in the destination table. |
_SNOWFLAKE_DELETED | BOOLEAN | Indicates whether the row was deleted from the source table. When true, the row has been soft-deleted and no longer exists in the source. |
Soft-deleted rows¶
When a row is deleted from the source table, the connector does not physically remove it from the
destination table. Instead, the row is marked as deleted by setting the _SNOWFLAKE_DELETED metadata
column to true.
This approach allows you to:
- Retain historical data for auditing or compliance purposes.
- Query deleted records when needed.
- Decide when and how to permanently remove data based on your requirements.
To query only active (non-deleted) rows, filter on the _SNOWFLAKE_DELETED column:
To query deleted rows:
Dropped columns¶
When a column is dropped from the source table, the connector does not drop the corresponding column
from the destination table. Instead, the column is renamed by appending the __SNOWFLAKE_DELETED suffix
to preserve historical values.
For example, if a column named EMAIL is dropped from the source table, it is renamed to
EMAIL__SNOWFLAKE_DELETED in the destination table. Rows that existed before the column was dropped
retain their original values, while rows added after the drop have NULL in this column.
You can still query historical values from the renamed column:
Renamed columns¶
Due to limitations in CDC (Change Data Capture) mechanisms, the connector cannot distinguish between a column being renamed and a column being dropped followed by a new column being added. As a result, when you rename a column in the source table, the connector treats this as two separate operations: dropping the original column and adding a new column with the new name.
For example, if you rename a column from A to B in the source table, the destination table
will contain:
A__SNOWFLAKE_DELETED: Contains values from before the rename. Rows added after the rename haveNULLin this column.B: Contains values from after the rename. Rows that existed before the rename haveNULLin this column.
Querying renamed columns¶
To retrieve data from both the original and renamed columns as a single unified column, use a
COALESCE or CASE expression:
Alternatively, using a CASE expression:
Creating a view for renamed columns¶
Rather than manually modifying the destination table, you can create a view that presents the renamed column as a single unified column. This approach is recommended because it preserves the original data and avoids potential issues with ongoing replication.
Important
Manually modifying the destination table structure (such as dropping or renaming columns) is not recommended, as it may interfere with ongoing replication and cause data inconsistencies.
Journal tables¶
During incremental replication, changes from the source database are first written to journal tables before being merged into the destination tables. The connector does not automatically remove data from journal tables, as this data may be useful for auditing, debugging, or reprocessing purposes.
Journal tables are created in the same schema as their corresponding destination tables and follow this naming convention:
<TABLE_NAME>_JOURNAL_<timestamp>_<number>
Where:
<TABLE_NAME>is the name of the destination table.<timestamp>is the creation timestamp in Unix epoch format (seconds since January 1, 1970), ensuring uniqueness.<number>starts at 1 and increments whenever the destination table schema changes, either due to schema changes in the source table or modifications to column filters.
For example, if your destination table is SALES.ORDERS, the journal table might be named
SALES.ORDERS_JOURNAL_1705320000_1.
Important
Do not drop journal tables while replication is in progress. Removing an active journal table may cause data loss or replication failures. Only drop journal tables after the corresponding source table has been fully removed from replication.
Managing journal table storage¶
If you need to manage storage costs by removing old journal data, you can create a Snowflake task that periodically cleans up journal tables for tables that are no longer being replicated.
Before implementing journal cleanup, verify that:
- The corresponding source tables have been fully removed from replication.
- You no longer need the journal data for auditing or processing purposes.
For information on creating and managing tasks for automated cleanup, see Introduction to tasks.
Next steps¶
Review Openflow connectors for SQL Server: Data mapping to understand how the connector maps data types to Snowflake data types.
Review Set up the Openflow Connector for SQL Server (CDC) to set up the connector.