Openflow Connector for MySQL: Maintenance¶
Note
This connector is subject to the Snowflake Connector Terms.
This topic describes important maintenance considerations and best practices for maintaining the Openflow Connector for MySQL such as reinstalling the connector or setting the starting binary log position for loading.
These operations are often used in conjunction with Incremental replication without snapshots.
Check the replication status of a table¶
Interim failures, such as connection errors, do not prevent table replication. However, permanent failures, such as unsupported data types, prevent table replication.
To troubleshoot replication issues or verify that a table has been successfully removed from the replication flow, check the Table State Store:
- In the Openflow runtime canvas, right-click a processor group and choose Controller Services. A table listing controller services displays.
- Locate the row labeled Table State Store, click the More
button on the right side of the row, and then choose View State.
A list of tables and their current states displays. Type in the search box to filter the list by table name. The possible states are:
- NEW: The table is scheduled for replication but replication hasn’t started.
- SNAPSHOT_REPLICATION: The connector is copying existing data. This status displays until all records are stored in the destination table.
- INCREMENTAL_REPLICATION: The connector is actively replicating changes. This status displays after snapshot replication ends and continues to display indefinitely until a table is either removed from replication or replication fails.
- FAILED: Replication has permanently stopped due to an error.
Note
The Openflow runtime canvas doesn’t display table status changes — only the current table status. However, table status changes are recorded in logs when they occur. Look for the following log message:
If a permanent failure prevents table replication, remove the table from replication. After you address the problem that caused the failure, you can add the table back to replication. For more information, see Restart table replication.
Increase the oversized value limit¶
By default, the connector replicates individual values up to 16 MB and marks any table that contains a larger value as permanently failed. If your Snowflake account has ENABLE_OPENFLOW_CDC_MYSQL_SSV2 parameter set to true, the per-value limit can be raised from 16 MB to 128 MB.
Important
Snowflake enforces 128 MB as a hard per-row limit. The connector also adds metadata columns to every replicated row (_SNOWFLAKE_UPDATED_AT, _SNOWFLAKE_INSERTED_AT, _SNOWFLAKE_DELETED), which count toward the same 128 MB budget along with any other columns in the source row. Leave headroom when replicating columns whose values approach 128 MB.
The increased limit doesn’t apply equally to all column types.
Note
In Snowflake, the maximum size for BINARY is 64 MB (BINARY(67108864)), even when the increased size limits are enabled. Only VARCHAR / VARIANT / ARRAY / OBJECT columns can hold up to 128 MB.
Prerequisites¶
To verify the account ENABLE_OPENFLOW_CDC_MYSQL_SSV2 parameter value, run the following query.
If the returned value is not true, the connector continues to enforce the 16 MB limit regardless of any processor configuration.
Configure the processors¶
Update the Oversized Value Limit property to 128 MB on both of the following processors:
- Fetch Table Rows (in the Snapshot Load group)
- Read MySQL CDC Stream (in the Incremental Load group)
For each processor:
- Locate the processor in the flow.
- Right-click the processor and select Configure.
- Open the Properties tab.
- Set Oversized Value Limit to
128 MB. - Apply the change.
For tables that are already being replicated and have destination columns narrower than VARCHAR(134217728) or BINARY(67108864), see Migrate an existing table to a larger oversized value limit.
Migrate an existing table to a larger oversized value limit¶
The steps in Increase the oversized value limit raise the limit for newly created destination tables. If a table is already being replicated and its destination column type is not VARCHAR(134217728) or BINARY(67108864), but you now want to load values larger than the original 16 MB limit, you must manually widen the column type on both the journal and destination tables.
Before you migrate, check the current destination column type, because it can vary depending on when the snapshot replication was performed.
Warning
You must stop replication for the affected table before altering its journal or destination tables. Altering these tables while replication is active can corrupt in-flight data.
To migrate a table:
- Stop replication for the affected table by stopping the topmost processors of the Snapshot Load and Incremental Load groups until all queues are empty. For the equivalent stop procedure, see steps 1.g-1.j of Reinstall the connector.
- Widen the column on both the journal table and the destination table, according to the column type:
- VARCHAR columns: a single
ALTER TABLE ... ALTER COLUMN ... SET DATA TYPE VARCHAR(134217728)is sufficient on both the journal and destination tables. - BINARY columns: Snowflake doesn’t allow widening
BINARYin place. You must:- Add a new column of type
BINARY(67108864)on both the journal and destination tables. - Copy data from the original column into the new column.
- Drop the original column and rename the new column to the original name.
- Add a new column of type
- VARCHAR columns: a single
- Restart replication by re-enabling the processors.
Performance considerations¶
For snapshot replication, the product of fetchSize * rowSize * concurrentQueries can’t exceed the heap size of the NiFi runtime, where:
fetchSizeis the number of rows fetched per query, set on the Fetch Table Rows processor (default: 100).rowSizeis the size of a single row being fetched.concurrentQueriesis the number of concurrent queries, set on the Fetch Table Rows processor (default: 2).
For incremental replication, you might need a LARGE warehouse, depending on the number of changes to rows that contain oversized values.
Reinstall the connector¶
This section provides instructions on how to reinstall the connector, and continue replicating data for the same tables without having to snapshot them again. It covers situations where the new connector is installed in the same runtime, as well as moved to a new runtime.
Warning
For the connector to continue replicating from the same CDC stream position where it stopped before reinstallation,
the source database must retain the binary log long enough to cover the time since the prior connector was stopped
and the new connector is started.
Make sure the binlog_expire_logs_seconds parameter of the MySQL server is high enough, and keep the reinstallation time to a minimum.
The value of binlog_expire_logs_seconds needs to be longer than the expected time expected to reinstall the connector.
Typically 86400s, a day is seconds, is sufficient, however longer times might be appropriate to ensure time to reinstall.
Prerequisites¶
Review and note connector parameter context values. If you’re reinstalling the connector in the same runtime, you can reuse the existing context. If the new instance is located in a different runtime, you must re-enter all parameters.
-
Finish processing all in-flight FlowFiles in the existing connector, then stop the connector.
-
Sign in to Snowsight.
-
In the navigation menu, select Ingestion » Openflow.
-
Select Launch Openflow.
-
In the Openflow pane select the Runtimes tab.
-
Select the runtime containing the connector.
-
Select the connector.
-
Stop the topmost processor Set Tables for Replication in the Snapshot Load group.
-
Stop the topmost processor Read MySQL CDC Stream in the Incremental Load group.
-
If you changed the value of the Merge Task Schedule CRON parameter, return it to
* * * * * ?, otherwise queues won’t be emptied until the next scheduled run.Wait until all FlowFiles in the connector have been processed, and all queues are empty. When all FlowFiles have been processed, the Queued value on the connector’s processor group becomes zero. If there are any items left in the original connector’s queues, there may be data gaps when the new connector starts.
-
Stop all processors and controller services in the connector.
Caution
The existing connector can remain in the runtime and doesn’t interfere with the new instance, as long as it remains stopped.
-
-
Create a new instance of the connector. If you’re using the same runtime as the original connector, you can choose to keep the existing parameter contexts and reuse the settings.
-
If you’re installing into a different runtime or you deleted the previous parameter contexts, enter the configuration settings into the new parameter contexts, including the table names and patterns as described in Set up the Openflow Connector for MySQL.
-
Navigate to the
MySQL Ingestion Parameterscontext, and set the following parameters:- Set the
Ingestion Typeparameter toincremental. For more information on the concerns see Enable incremental replication without snapshots. - Set the
Starting Binlog Positionparameter toEarliest. For more information and potential concerns see Specify load from binary log position.
- Set the
-
Start the new connector.
Usage notes¶
The new connector uses the existing destination tables that were created by the original connector, but the connector creates new journal tables.
Specify load from binary log position¶
The Openflow Connector for MySQL connector allows you to select the starting position where MySQL binary logs are read. By default the connector reads from the latest available position. Alternatively, you can choose the earliest position available on the source instance. Choosing to start from the earliest position is common when reinstalling the connector. This allows the new instance to catch up and continue replicating existing tables without having to snapshot each again.
Note that switching a running connector from latest to earliest position cause the entire available binary log to be re-read, re-processed, and re-applied to the destination table.
Warning
While the binary log is being re-read, the columns and data in affected destination tables can become out of sync with their sources until all events have been re-processed and merged.
The following parameters control snapshot loads are available in the Ingestion Parameters context:
| Parameter | Description |
|---|---|
| Starting Binlog Position |
|
| Re-read Tables in State |
|
To determine whether the connector finished re-reading the binary log:
- Navigate to the Openflow canvas.
- Open the Incremental Load process group.
- Right-click the topmost processor named Read MySQL CDC Stream, then select View state.
- Compare the state entries:
- binlog.position.rewind: the latest position the processor read before re-reading of the binary log started.
- binlog.position.dml: the current latest position read by the processor. As long as this value is lower than the rewind value above, the processor is still re-reading the binary log.
Usage notes¶
- After a running connector is switched to read from the earliest position, and starts running, the process can’t be reconfigured or cancelled, and will continue until the currently-read position reaches the position from before it started.
- Switching to the earliest position on a running connector will, for any tables being re-processed, finish their existing journals, and create new journal tables.
- If the binary log contains events from a previous table that was dropped and re-created in the source database, the re-reading the stream re-processes all events in the current destination. The connector can’t distinguish between a previous and current source table if they share the same name.