About the Openflow Connector for MongoDB¶
Note
This connector is subject to the Snowflake Connector Terms.
This topic describes the basic concepts of the Openflow Connector for MongoDB, its workflow, and limitations.
The Openflow Connector for MongoDB connects a MongoDB database to Snowflake and replicates data from selected collections on a schedule. The connector performs an initial full load for each collection, followed by incremental updates using MongoDB change streams.
Use cases¶
The connector supports the following use cases:
- Replication to Snowflake
Continuously mirror collections from MongoDB into Snowflake for downstream analytics and modeling. Incremental changes arrive on a schedule with a delay window of a few minutes.
- Selective replication
Select the collections to replicate by name or with regex filters, so that broad patterns can be combined with explicit control over what is replicated.
- Migration and change capture
Perform a one-time snapshot load for migrations, then run incremental syncs using MongoDB change streams to keep collections in sync.
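The selective replication use case can be illustrated with a minimal Python sketch. The function name `select_collections`, its parameters, and the choice of full-name regex matching are assumptions made for illustration only; they are not the connector's actual parameters or matching rules.

```python
import re

def select_collections(available, names=(), pattern=None):
    """Return the collections to replicate, preserving the input order.

    A collection is selected when it is listed explicitly in `names` or when
    its full name matches `pattern` (a regular expression). Both the
    parameter names and the full-match rule are hypothetical.
    """
    wanted = set(names)
    regex = re.compile(pattern) if pattern else None
    return [
        name for name in available
        if name in wanted or (regex is not None and regex.fullmatch(name))
    ]

# Hypothetical database.collection names.
collections = ["sales.orders", "sales.order_items", "sales.tmp_import", "hr.employees"]
selected = select_collections(
    collections,
    names=["hr.employees"],
    pattern=r"sales\.order.*",
)
print(selected)  # ['sales.orders', 'sales.order_items', 'hr.employees']
```

Combining an explicit list with a pattern keeps one-off collections and whole families of collections in the same configuration, while anything unmatched (here `sales.tmp_import`) is left out.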
Limitations¶
The connector has the following limitations:
- Standalone MongoDB instances aren’t supported. The connector relies on the MongoDB oplog (operations log) to track changes, and the oplog is only available in a replica set or sharded cluster environment.
- The minimum supported version of MongoDB is version 4.4.
- The connector supports only username and password authentication with MongoDB.
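The first two limitations can be checked before configuring the connector. The sketch below is one possible pre-flight check, not part of the connector: it assumes you have already obtained the reply to MongoDB's `hello` command (`isMaster` on older servers) and the `version` string from `buildInfo`, and it inspects them as plain dictionaries. Replica set members report a `setName` field and `mongos` routers report `msg: "isdbgrid"`. The function name and the sample values are hypothetical.

```python
def check_prerequisites(hello_reply, server_version):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []

    # Replica set members include "setName"; a mongos router in front of a
    # sharded cluster reports msg == "isdbgrid". A standalone has neither.
    is_replica_set = "setName" in hello_reply
    is_sharded = hello_reply.get("msg") == "isdbgrid"
    if not (is_replica_set or is_sharded):
        problems.append("standalone instance: no replica set or sharded cluster detected")

    # Compare only the major and minor components against the minimum 4.4.
    major, minor = (int(part) for part in server_version.split(".")[:2])
    if (major, minor) < (4, 4):
        problems.append(f"MongoDB {server_version} is older than the minimum 4.4")

    return problems

# Hand-written stand-ins for real command replies.
print(check_prerequisites({"setName": "rs0"}, "6.0.4"))  # []
print(check_prerequisites({}, "4.2.8"))                  # two problems reported
```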
Snowflake table structure¶
The connector maps MongoDB documents to rows in the corresponding Snowflake table. The entire payload of the document is stored in the data column.
| Snowflake column | Description |
|---|---|
| id | ID of the MongoDB document |
| data | The payload of the document |
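To make the two-column layout concrete, the following Python sketch maps a MongoDB-style document to a row with the columns listed above. It is an illustration under assumptions: the JSON serialization, the string form of the identifier, and whether `_id` also appears inside the payload are guesses, not documented connector behavior.

```python
import json

def to_row(document):
    """Map a MongoDB-style document to an {id, data} row (illustrative only)."""
    return {
        # Assumption: the id column holds a string form of the document's _id.
        "id": str(document["_id"]),
        # Assumption: the payload is kept whole, nested fields included.
        "data": json.dumps(document, default=str, sort_keys=True),
    }

# Hypothetical document with nested and repeated fields.
doc = {
    "_id": "64f1c0ffee",
    "customer": {"name": "Ada"},
    "items": [{"sku": "A-1", "qty": 2}],
}
row = to_row(doc)
print(row["id"])
print(row["data"])
```

Because the payload is stored whole, nested fields such as `customer.name` are not split into separate columns; they would be extracted from the data column at query time.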
Collection replication lifecycle¶
A collection’s replication cycle begins with an initial snapshot and transitions to incremental sync.
- Snowflake table creation: The connector creates a table in Snowflake. The structure of the table is the same for each collection. For more information, see Snowflake table structure.
- Snapshot load: After creating a table in Snowflake, the connector performs a full copy of all existing data from the MongoDB collection to the Snowflake table. This process runs sequentially for each collection in the configuration.
- Incremental sync: After the initial load is complete, the collection enters incremental sync mode. The connector listens to the MongoDB change stream to read the document-level changes (inserts, updates, and deletes) that occurred in the collection. These changes are then merged into the destination table in Snowflake.
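The merge step of incremental sync can be pictured with a small in-memory simulation. The event fields `operationType`, `documentKey`, and `fullDocument` follow the shape of MongoDB change stream events; everything else (the dictionary standing in for the Snowflake table, the function name, and the assumption that update events carry the full post-change document) is a simplification for illustration and does not describe the connector's internals.

```python
def apply_change(table, event):
    """Merge one change-stream-style event into `table`, a dict keyed by id."""
    key = str(event["documentKey"]["_id"])
    op = event["operationType"]
    if op in ("insert", "replace", "update"):
        # Assumption: update events include the full post-change document
        # (fullDocument), not only the modified fields.
        table[key] = event["fullDocument"]
    elif op == "delete":
        table.pop(key, None)
    return table

# State of the destination after the snapshot load.
table = {"1": {"_id": 1, "status": "new"}}

# Changes that occurred in the collection afterwards.
events = [
    {"operationType": "insert", "documentKey": {"_id": 2},
     "fullDocument": {"_id": 2, "status": "new"}},
    {"operationType": "update", "documentKey": {"_id": 1},
     "fullDocument": {"_id": 1, "status": "paid"}},
    {"operationType": "delete", "documentKey": {"_id": 2}},
]
for event in events:
    apply_change(table, event)

print(table)  # {'1': {'_id': 1, 'status': 'paid'}}
```

Applying events in order matters: document 2 is inserted and later deleted, so it is absent from the final state, while document 1 reflects its latest update.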
Openflow requirements¶
The runtime size must be at least Medium. Use Large and a multi-node Openflow setup for high-throughput workloads or when replicating large collections. If you observe processor backpressure or memory pressure, increase the runtime size or add nodes.
For information about creating a warehouse for the connector, see Designate a warehouse.
Workflow¶
The workflow for the Openflow Connector for MongoDB involves steps performed by the MongoDB administrator and the Snowflake administrator.
MongoDB administrator¶
The MongoDB administrator performs the following tasks:
- Enable replication
The MongoDB administrator configures a replica set or sharded cluster.
- Ensure the oplog size is sufficient
For high-volume data ingestion, the MongoDB administrator must ensure that oplogSizeMB is large enough to retain the history of changes during connector or connectivity downtime. If the connector is offline for longer than the oplog’s retention period, a full re-sync of data might be required.
- Create a database user
The MongoDB administrator creates a user with the necessary roles to monitor changes in the database. The user requires the readAnyDatabase role on the admin database.
- Configure network access
The MongoDB administrator configures network access from MongoDB to the Openflow Runtime.
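The oplog sizing task above comes down to simple arithmetic: the oplog is a fixed-size log, so the time window it covers is roughly its size divided by the rate at which changes are written to it. The Python sketch below works through that estimate. The function names, the safety factor, and all the numbers are illustrative assumptions; measure the actual oplog write rate on your own deployment (for example, from the output of `rs.printReplicationInfo()` in the MongoDB shell).

```python
def oplog_window_hours(oplog_size_mb, write_rate_mb_per_hour):
    """Approximate hours of change history a given oplog size can retain."""
    if write_rate_mb_per_hour <= 0:
        raise ValueError("write rate must be positive")
    return oplog_size_mb / write_rate_mb_per_hour

def required_oplog_size_mb(write_rate_mb_per_hour, max_downtime_hours, safety_factor=2.0):
    """Oplog size needed to survive a given downtime, with headroom."""
    return write_rate_mb_per_hour * max_downtime_hours * safety_factor

# Illustrative numbers: a 50 GB oplog and 1,600 MB of oplog writes per hour.
print(oplog_window_hours(51_200, 1_600))   # 32.0 hours of history

# Size needed to tolerate 12 hours of connector downtime with 2x headroom.
print(required_oplog_size_mb(1_600, 12))   # 38400.0 MB
```

If the estimated window is shorter than the longest outage you expect to tolerate, increase oplogSizeMB; otherwise changes can age out of the oplog before the connector reads them, and a full re-sync might be required.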
Snowflake administrator¶
The Snowflake administrator performs the following tasks:
- Create a service user, a warehouse, and a destination database
The administrator creates the necessary Snowflake objects for the replicated data.
- Import the connector definition file
The administrator imports the file into the Snowflake Openflow canvas.
- Configure the flow
The administrator configures the flow with the necessary MongoDB and Snowflake parameters.
- Run the flow
The administrator runs the flow.
Next steps¶
For information about configuring the source MongoDB database and the target Snowflake account, see Connect to MongoDB.