Setting Up the Openflow Connector for Google BigQuery¶
Note
This connector is subject to the Snowflake Connector Terms.
This topic describes the steps to set up the Openflow Connector for Google BigQuery.
Prerequisites¶
Set up your runtime deployment.
If you are using Openflow - Snowflake Deployments, ensure that you have reviewed configuring required domains and have granted access to the domains required by the connector.
You have access to the Openflow admin role or similar role you use to manage Openflow.
If you are creating a Snowflake service user to manage the connector, ensure that you have set up key-pair authentication for that user. For more information, see key-pair authentication.
Required endpoints¶
The following endpoints are required for the connector to function:
bigquery.googleapis.com:443
bigquerystorage.googleapis.com:443
oauth2.googleapis.com:443
If you are using Openflow - BYOC, you need to configure your cloud network egress to allow TLS 443 access to the endpoints listed above. If you are using Openflow - Snowflake Deployments, you need to create a network rule and an external access integration (EAI). Then, grant the Snowflake Role usage privileges on the EAI.
Set up BigQuery¶
Create a Google Cloud Service account and grant it the necessary permissions to read BigQuery data. The connector uses this account for authentication.
This account must have the following permissions:
Important
BigQuery Data Editor must be granted at the project level, not on individual datasets. The connector queries {project}.{region}.INFORMATION_SCHEMA.TABLES to discover tables across all configured regions - a region-scoped view that requires project-level access. The connector also queries {project}.{dataset}.INFORMATION_SCHEMA.KEY_COLUMN_USAGE to determine the primary keys of each replicated table. Without project-level access, the query fails with an Access Denied error and the connector does not run correctly.
Generate and download the corresponding JSON key file for the service account. You will need the full contents of this file for the connector’s configuration.
Enable change history on each source table to allow the connector to perform incremental replication. This feature allows BigQuery to track row-level changes (inserts, updates, and deletes), which the connector uses to sync data efficiently.
Run the following query in the BigQuery console for each table:
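Change history is enabled per table with an ALTER TABLE statement; the dataset and table names below are placeholders for your own:

```sql
-- Enable change history so the connector can track row-level
-- inserts, updates, and deletes for incremental replication.
-- Replace mydataset.mytable with your dataset and table names.
ALTER TABLE `mydataset.mytable`
SET OPTIONS (enable_change_history = TRUE);
```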
Set up your Snowflake account¶
As an Openflow administrator, perform the following tasks to set up your Snowflake account:
Create a Snowflake service user:
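A sketch of the service-user creation, assuming placeholder names (openflow_bigquery_user, openflow_bigquery_role) and a public key generated as described in key-pair authentication:

```sql
-- Placeholder names; substitute your own user and role.
-- <public_key> is the RSA public key from your generated key pair.
CREATE USER openflow_bigquery_user
  TYPE = SERVICE
  RSA_PUBLIC_KEY = '<public_key>'
  DEFAULT_ROLE = openflow_bigquery_role;
```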
Store the private key for that user in a file to supply to the connector’s configuration. For more information, see key-pair authentication.
Create a database that stores the replicated data, and set up permissions for the Snowflake user to create objects in that database by granting USAGE and CREATE SCHEMA privileges.
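The database and its grants can be set up along these lines; all object names are placeholders:

```sql
-- Placeholders: replace the database, role, and user names with your own.
CREATE DATABASE IF NOT EXISTS bigquery_dest_db;
CREATE ROLE IF NOT EXISTS openflow_bigquery_role;

-- Allow the connector's role to use the database and create schemas in it.
GRANT USAGE ON DATABASE bigquery_dest_db TO ROLE openflow_bigquery_role;
GRANT CREATE SCHEMA ON DATABASE bigquery_dest_db TO ROLE openflow_bigquery_role;

-- Grant the role to the service user that runs the connector.
GRANT ROLE openflow_bigquery_role TO USER openflow_bigquery_user;
```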
Create a new warehouse or use an existing warehouse for the connector.
To create a new warehouse:
Start with the MEDIUM warehouse size, then experiment with the size depending on the number of tables being replicated and the amount of data transferred.
To determine whether to increase the size, monitor the connector and the database while data replication is in progress. If you observe significant delays during incremental replication, experiment with a larger warehouse size. However, large numbers of tables typically scale better with multi-cluster warehouses than with a larger warehouse size.
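A minimal warehouse definition, using a placeholder name and the suggested MEDIUM starting size:

```sql
-- Placeholder warehouse and role names; MEDIUM is the suggested starting size.
CREATE WAREHOUSE IF NOT EXISTS openflow_bigquery_wh
  WAREHOUSE_SIZE = MEDIUM
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;

GRANT USAGE ON WAREHOUSE openflow_bigquery_wh TO ROLE openflow_bigquery_role;
```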
Create an external access integration to enable network access outside of Snowflake.
Caution
If your runtime executes in Openflow - BYOC, you do not need to create an External Access Integration (EAI). Instead, configure your cloud network egress to allow TLS 443 access to the required host:port endpoints listed in Required endpoints.
To allow the connector to call the required Google APIs from a Snowflake-hosted runtime, you must create a network rule and an external access integration (EAI). Then, grant the Snowflake role usage privileges on the EAI.
To create the external access integration and network rule and grant access, perform the following steps:
Create a network rule to allow the connector to access the required Google APIs:
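For example, using a placeholder rule name and the hosts from Required endpoints:

```sql
-- Placeholder name; the endpoints match the Required endpoints section.
CREATE NETWORK RULE bigquery_network_rule
  MODE = EGRESS
  TYPE = HOST_PORT
  VALUE_LIST = (
    'bigquery.googleapis.com:443',
    'bigquerystorage.googleapis.com:443',
    'oauth2.googleapis.com:443'
  );
```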
Create an External Access Integration that references the network rule:
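For example, assuming the network rule from the previous step is named bigquery_network_rule (a placeholder):

```sql
-- Placeholder integration name; references the network rule created earlier.
CREATE EXTERNAL ACCESS INTEGRATION bigquery_access_integration
  ALLOWED_NETWORK_RULES = (bigquery_network_rule)
  ENABLED = TRUE;
```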
Grant your Snowflake Role USAGE on the integration:
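For example, with placeholder integration and role names:

```sql
-- Placeholder names; use your own integration and connector role.
GRANT USAGE ON INTEGRATION bigquery_access_integration
  TO ROLE openflow_bigquery_role;
```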
Install the connector¶
To install the connector, do the following as a data engineer:
Navigate to the Openflow overview page. In the Featured connectors section, select View more connectors.
On the Openflow connectors page, find the connector and select Add to runtime.
In the Select runtime dialog, select your runtime from the Available runtimes drop-down list and click Add.
Note
Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.
Authenticate to the deployment with your Snowflake account credentials and select Allow when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.
Authenticate to the runtime with your Snowflake account credentials.
The Openflow canvas appears with the connector process group added to it.
Configure the connector¶
To configure the connector, perform the following steps:
Right-click on the added process group and select Parameters.
Populate the required parameter values as described in Specify flow parameters.
Specify flow parameters¶
This section describes the flow parameters that you can configure based on the following parameter contexts:
BigQuery Source Parameters: Used to define the configuration for reading data from BigQuery.
BigQuery Destination Parameters: Used to establish connection with Snowflake.
BigQuery Ingestion Parameters: Used to specify the tables and views to replicate.
BigQuery Source Parameters¶
| Parameter | Description |
|---|---|
| BigQuery Project Name | The unique identifier of the Google Cloud Project that contains BigQuery datasets and tables. Where to find: open BigQuery Studio (Google Cloud Console > BigQuery) and in the left Explorer pane hover over your project to see the Project ID. Example: |
| GCP Service Account JSON | The entire content of the JSON key file for the Google Cloud Platform Service Account used for authentication. Ensure the service account has the necessary IAM permissions to perform BigQuery operations, such as the BigQuery Job User and BigQuery Data Viewer roles. Where to get it: Google Cloud Console > IAM & Admin > Service Accounts > select the service account > Keys tab > Add key > Create new key > JSON. This downloads a .json file; open it and paste the entire file content (including braces) into this field. |
BigQuery Destination Parameters¶
| Parameter | Description |
|---|---|
| Snowflake Authentication Strategy | When using SPCS, use SNOWFLAKE_SESSION_TOKEN as the value for Authentication Strategy. When using BYOC, use KEY_PAIR as the value for Authentication Strategy. Example: |
| Snowflake Account Identifier | When using: |
| Destination Database | The name of the destination database to replicate into. Mixed case is supported. |
| Snowflake Private Key File | When using: |
| Snowflake Private Key Password | When using: |
| Snowflake Role | When using: |
| Snowflake Username | When using: |
| Snowflake Warehouse | The name of the warehouse used by the connector. |
BigQuery Ingestion Parameters¶
| Parameter | Description |
|---|---|
| BigQuery Regions | Specifies a comma-separated list of the locations to query for BigQuery datasets. You can combine regional and multi-regional locations in the same list. Example: |
| Included Dataset Names | Comma-separated list of datasets to replicate (queried across all selected regions). Example: |
| Included Dataset Names Regex | Regular expression for specifying dataset names to replicate (queried across all selected regions). Combined with Included Dataset Names to include any matching dataset. Note: the regular expression must use Google’s RE2 syntax. Example: |
| Included Table Names | Comma-separated list of tables to replicate across datasets. Example: |
| Included Table Names Regex | Regular expression for specifying table names to replicate across datasets. Combined with Included Table Names to include any matching table. Note: the regular expression must use Google’s RE2 syntax. Example: |
| Included View Names | Comma-separated list of views to replicate across datasets. Example: |
| Included View Names Regex | Regular expression for specifying view names to replicate across datasets. Combined with Included View Names to include any matching view. Note: the regular expression must use Google’s RE2 syntax. Example: |
| Incremental Sync Frequency | How often the connector runs incremental synchronization for each table. Runs do not overlap; if a cycle takes longer than the configured interval, the next run waits for the prior one to finish. Because BigQuery limits the maximum change-history window to 24 hours, this interval must be shorter than 24 hours. Example: |
| View Sync Frequency | How often the connector runs synchronization for each view. Runs do not overlap; if a cycle takes longer than the configured interval, the next run waits for the prior one to finish. View ingestion does not support CDC; views are replicated by truncate and load only. Example: |
| Temporary Table Dataset | Dataset in which necessary temporary tables are created, such as CDC journal tables or temporary tables for view ingestion. Snowflake recommends using a separate dataset for temporary tables rather than the ingested dataset. Example: |
Run the flow¶
Right-click on the canvas and select Enable all Controller Services.
Right-click on the imported process group and select Start. The connector starts the data ingestion.
Next steps¶
For information on tasks you can perform after installing the connector, see Use the connector.
For information on monitoring the flow, see Monitor the flow.