Set up the Openflow Connector for Google Drive

Note

The connector is subject to the Connector Terms.

This topic describes the steps to set up the Openflow Connector for Google Drive.

Prerequisites

  1. Ensure that you have reviewed Openflow Connector for Google Drive.

  2. Ensure that you have set up Openflow.

Get the credentials

Setting up the connector requires specific permissions and account settings for Snowflake Openflow processors to read data from Google. This access is provided in part through setting up a service account and a key for Openflow to authenticate as that service account. For more information, see:

As a Google Drive administrator, perform the following steps:

Prerequisites

Ensure that you meet the following requirements:

  • You have a Google user with Super Admin permissions

  • You have a Google Cloud Project with the following roles:

    • Organization Policy Administrator

    • Organization Administrator

Enable service account key creation

By default, Google disables service account key creation. For Openflow to use a service account JSON key, enforcement of this policy must be turned off.

  1. Log in to the Google Cloud Console with a super admin account that has the Organization Policy Administrator role.

  2. Ensure that you have selected your organization resource, not a project within your organization.

  3. Click Organization Policies.

  4. Select the Disable service account key creation policy.

  5. Click Manage Policy and turn off enforcement.

  6. Click Set Policy.

Create service account and key

  1. Open the Google Cloud Console and authenticate using a user that has been granted access to create service accounts.

  2. Ensure you are in a project of your organization.

  3. In the left navigation, under IAM & Admin, select the Service Accounts tab.

  4. Click Create Service Account.

  5. Enter the service account name and click Create and Continue.

  6. Click Done. In the service accounts table, find the OAuth 2 Client ID column and copy the Client ID; you will need it to set up domain-wide delegation in the next section.

  7. In the service accounts table, open the actions menu for the newly created service account and select Manage keys.

  8. Select Add key and then Create new key.

  9. Leave the default selection of JSON and click Create.

The key is downloaded to your browser's Downloads directory as a .json file.

Grant service account domain-wide delegation for listed scopes

  1. Log in to your Google Admin account.

  2. Select Admin from the Google Apps selector.

  3. In the left navigation, expand Security, then Access and data control, and click API controls.

  4. On the API controls screen, select Manage Domain Wide Delegation.

  5. Click Add new.

  6. Enter the OAuth 2 Client ID that you copied in the Create service account and key section, along with the following scopes:

  7. Click Authorize.

Set up Snowflake account

As a Snowflake account administrator, perform the following tasks manually or by using the script included below:

  1. Create a new role or use an existing role and grant it the required database privileges.

  2. Create a new Snowflake service user with the type SERVICE.

  3. Grant the Snowflake service user the role you created in the previous steps.

  4. Configure key-pair authentication for the Snowflake SERVICE user from step 2.

  5. Snowflake strongly recommends this step. Configure a secrets manager supported by Openflow, for example, AWS, Azure, or HashiCorp, and store the public and private keys in the secret store.

    Note

    If for any reason, you do not wish to use a secrets manager, then you are responsible for safeguarding the public key and private key files used for key-pair authentication according to the security policies of your organization.

    1. Once the secrets manager is configured, determine how you will authenticate to it. On AWS, it is recommended that you use the EC2 instance role associated with Openflow, because this way no other secrets have to be persisted.

    2. In Openflow, configure a Parameter Provider associated with this Secrets Manager, from the hamburger menu in the upper right. Navigate to Controller Settings » Parameter Provider and then fetch your parameter values.

    3. At this point all credentials can be referenced with the associated parameter paths and no sensitive values need to be persisted within Openflow.

  6. If any other Snowflake users require access to the raw documents and tables ingested by the connector (for example, for custom processing in Snowflake), then grant those users the role created in step 1. A sketch of this grant appears after the example setup below.

  7. Designate a warehouse for the connector to use. Start with the smallest warehouse size, then experiment with size depending on the number of tables being replicated and the amount of data transferred. Large table counts typically scale better with multi-cluster warehouses than with larger warehouse sizes, as in the sketch below.
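
If you expect to replicate a large number of tables, a multi-cluster warehouse is often a better fit than a larger warehouse size. The following statements are a minimal sketch only, offered as an alternative to the single-cluster warehouse in the example setup below: they assume your Snowflake edition supports multi-cluster warehouses, and the warehouse name, size, and cluster counts are placeholders to adjust for your workload.

--Sketch: a small multi-cluster warehouse for the connector.
--The cluster counts below are illustrative, not a recommendation.
CREATE WAREHOUSE IF NOT EXISTS <openflow_warehouse>
WITH
   WAREHOUSE_SIZE = 'XSMALL'
   MIN_CLUSTER_COUNT = 1
   MAX_CLUSTER_COUNT = 3
   SCALING_POLICY = 'STANDARD'
   AUTO_SUSPEND = 300
   AUTO_RESUME = TRUE;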

Example setup

--The following script assumes you'll need to create all required roles, users, and objects.
--However, you may want to reuse some that are already in existence.

--Create a Snowflake service user to manage the connector
USE ROLE USERADMIN;
CREATE USER <openflow_service_user> TYPE=SERVICE COMMENT='Service user for Openflow automation';

--Create a pair of secure keys (public and private). For more information, see
--key-pair authentication. Store the private key for the user in a file to supply
--to the connector’s configuration. Assign the public key to the Snowflake service user:
ALTER USER <openflow_service_user> SET RSA_PUBLIC_KEY = '<pubkey>';


--Create a role to manage the connector and the associated data and
--grant it to that user
USE ROLE SECURITYADMIN;
CREATE ROLE <openflow_connector_admin_role>;
GRANT ROLE <openflow_connector_admin_role> TO USER <openflow_service_user>;


--The following block is for USE CASE 2 (Cortex connect) ONLY
--Create a role for read access to the cortex search service created by this connector.
--This role should be granted to any role that will use the service
CREATE ROLE <cortex_search_service_read_only_role>;
GRANT ROLE <cortex_search_service_read_only_role> TO ROLE <whatever_roles_will_access_search_service>;

--Create the database the data will be stored in and grant usage to the roles created
USE ROLE ACCOUNTADMIN; --use whatever role you want to own your DB
CREATE DATABASE IF NOT EXISTS <destination_database>;
GRANT USAGE ON DATABASE <destination_database> TO ROLE <openflow_connector_admin_role>;

--Create the schema the data will be stored in and grant the necessary privileges
--on that schema to the connector admin role:
USE DATABASE <destination_database>;
CREATE SCHEMA IF NOT EXISTS <destination_schema>;
GRANT USAGE ON SCHEMA <destination_schema> TO ROLE <openflow_connector_admin_role>;
GRANT CREATE TABLE, CREATE DYNAMIC TABLE, CREATE STAGE, CREATE SEQUENCE,
  CREATE CORTEX SEARCH SERVICE ON SCHEMA <destination_schema> TO ROLE <openflow_connector_admin_role>;

--The following block is for USE CASE 2 (Cortex connect) ONLY
--Grant the Cortex read-only role access to the database and schema
GRANT USAGE ON DATABASE <destination_database> TO ROLE <cortex_search_service_read_only_role>;
GRANT USAGE ON SCHEMA <destination_schema> TO ROLE <cortex_search_service_read_only_role>;

--Create the warehouse this connector will use if it doesn't already exist. Grant the
--appropriate privileges to the connector admin role. Adjust the size according to your needs.
CREATE WAREHOUSE <openflow_warehouse>
WITH
   WAREHOUSE_SIZE = 'MEDIUM'
   AUTO_SUSPEND = 300
   AUTO_RESUME = TRUE;
GRANT USAGE, OPERATE ON WAREHOUSE <openflow_warehouse> TO ROLE <openflow_connector_admin_role>;
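
The script above does not show the optional grants from step 6 or a way to confirm the key-pair setup from step 4. The following statements are a minimal sketch: <analyst_role> is a placeholder for any existing role in your account that needs read access to the raw ingested data, and DESC USER lets you confirm that the public key was registered on the service user (check the RSA_PUBLIC_KEY_FP property in the output).

--Optional: confirm that the public key is set on the service user.
USE ROLE USERADMIN;
DESC USER <openflow_service_user>;

--Optional (step 6): grant the connector admin role to other roles that need
--access to the raw documents and tables ingested by the connector.
--<analyst_role> is a placeholder for an existing role in your account.
GRANT ROLE <openflow_connector_admin_role> TO ROLE <analyst_role>;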

Use case 1: Use the connector definition to ingest files only

Use the connector definition to:

  • Perform custom processing on ingested files

  • Ingest Google Drive files and permissions and keep them up to date

Set up the connector

As a data engineer, perform the following tasks to install and configure the connector:

Install the connector

  1. Navigate to the Openflow Overview page. In the Featured connectors section, select View more connectors.

  2. On the Openflow connectors page, find the connector and select Add to runtime.

  3. In the Select runtime dialog, select your runtime from the Available runtimes drop-down list.

  4. Select Add.

    Note

    Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.

  5. Authenticate to the deployment with your Snowflake account credentials and select Allow when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.

  6. Authenticate to the runtime with your Snowflake account credentials.

The Openflow canvas appears with the connector process group added to it.

Configure the connector

  1. Right-click on the imported process group and select Parameters.

  2. Enter the required parameter values as described in Google Drive Source Parameters, Google Drive Destination Parameters and Google Drive Ingestion Parameters.

Google Drive Source Parameters

  • Google Delegation User: The user that the service account impersonates through domain-wide delegation.

  • GCP Service Account JSON: The service account key JSON downloaded from the Google Cloud Console that allows the connector to access Google APIs.

Google Drive Destination Parameters

  • Destination Database: The database where data will be persisted. It must already exist in Snowflake.

  • Destination Schema: The schema where data will be persisted. It must already exist in Snowflake.

  • Snowflake Account Identifier: The Snowflake account name, formatted as [organization-name]-[account-name], where data will be persisted.

  • Snowflake Authentication Strategy: The strategy used to authenticate to Snowflake. Possible values: SNOWFLAKE_SESSION_TOKEN, when running the flow on SPCS, or KEY_PAIR, when setting up access using a private key.

  • Snowflake Private Key: The RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either Snowflake Private Key File or Snowflake Private Key must be defined.

  • Snowflake Private Key File: The file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and having standard PEM headers and footers. The header line starts with -----BEGIN PRIVATE. Select the Reference asset checkbox to upload the private key file.

  • Snowflake Private Key Password: The password associated with the Snowflake Private Key File.

  • Snowflake Role: The Snowflake role used during query execution.

  • Snowflake Username: The user name used to connect to the Snowflake instance.

  • Snowflake Warehouse: The Snowflake warehouse used to run queries.

Google Drive Ingestion Parameters

  • Google Drive ID: The ID of the Google shared drive to watch for content and updates.

  • Google Folder Name: Optional. The human-readable name of a Google Drive folder used to filter incoming files. When set, only files in that folder or its subfolders are retrieved. When blank or unset (select Set empty string), no folder filtering is applied and all files in the drive are retrieved.

  • Google Domain: The Google Workspace domain that the Google groups and drive reside in.

  • File Extensions To Ingest: A comma-separated list of file extensions to ingest. The connector tries to convert the files to PDF format first, if possible; however, the extension check is performed on the original file extension. If any of the specified file extensions are not supported by Cortex Parse Document, the connector ignores those files, logs a warning message in an event log, and continues processing other files.

  • Snowflake File Hash Table Name: The internal table used to store file content hashes, which prevents updates to content when it has not changed.

  3. Right-click on the canvas and select Enable all Controller Services.

  4. Right-click on the imported process group and select Start. The connector starts the data ingestion.
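
After the flow starts, you can confirm that the connector has created its objects in the destination schema. This is only a quick sanity-check sketch; it reuses the <destination_database>, <destination_schema>, and <openflow_connector_admin_role> placeholders from the Snowflake account setup.

--List the objects the connector has created in the destination schema.
USE ROLE <openflow_connector_admin_role>;
SHOW TABLES IN SCHEMA <destination_database>.<destination_schema>;
SHOW STAGES IN SCHEMA <destination_database>.<destination_schema>;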

Use case 2: Use the connector definition to ingest files and perform processing with Cortex

Use the predefined flow definition to:

  • Create AI assistants for public documents within your organization’s Google Drive.

  • Enable your AI assistants to adhere to access controls specified in your organization’s Google Drive.

Set up the connector

As a data engineer, perform the following tasks to install and configure the connector:

Install the connector

  1. Navigate to the Openflow Overview page. In the Featured connectors section, select View more connectors.

  2. On the Openflow connectors page, find the connector and select Add to runtime.

  3. In the Select runtime dialog, select your runtime from the Available runtimes drop-down list.

  4. Select Add.

    Note

    Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.

  5. Authenticate to the deployment with your Snowflake account credentials and select Allow when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.

  6. Authenticate to the runtime with your Snowflake account credentials.

The Openflow canvas appears with the connector process group added to it.

Configure the connector

  1. Right-click on the imported process group and select Parameters.

  2. Enter the required parameter values as described in Google Drive Cortex Connect Source Parameters, Google Drive Cortex Connect Destination Parameters and Google Drive Cortex Connect Ingestion Parameters.

Google Drive Cortex Connect Source Parameters

  • Google Delegation User: The user that the service account impersonates through domain-wide delegation.

  • GCP Service Account JSON: The service account key JSON downloaded from the Google Cloud Console that allows the connector to access Google APIs.

Google Drive Cortex Connect Destination Parameters

  • Destination Database: The database where data will be persisted. It must already exist in Snowflake.

  • Destination Schema: The schema where data will be persisted. It must already exist in Snowflake.

  • Snowflake Account Identifier: The Snowflake account name, formatted as [organization-name]-[account-name], where data will be persisted.

  • Snowflake Authentication Strategy: The strategy used to authenticate to Snowflake. Possible values: SNOWFLAKE_SESSION_TOKEN, when running the flow on SPCS, or KEY_PAIR, when setting up access using a private key.

  • Snowflake Private Key: The RSA private key used for authentication. The RSA key must be formatted according to PKCS8 standards and have standard PEM headers and footers. Note that either Snowflake Private Key File or Snowflake Private Key must be defined.

  • Snowflake Private Key File: The file that contains the RSA private key used for authentication to Snowflake, formatted according to PKCS8 standards and having standard PEM headers and footers. The header line starts with -----BEGIN PRIVATE. Select the Reference asset checkbox to upload the private key file.

  • Snowflake Private Key Password: The password associated with the Snowflake Private Key File.

  • Snowflake Role: The Snowflake role used during query execution.

  • Snowflake Username: The user name used to connect to the Snowflake instance.

  • Snowflake Warehouse: The Snowflake warehouse used to run queries.

Google Drive Cortex Connect Ingestion Parameters

  • Google Drive ID: The ID of the Google shared drive to watch for content and updates.

  • Google Folder Name: Optional. The human-readable name of a Google Drive folder used to filter incoming files. When set, only files in that folder or its subfolders are retrieved. When blank or unset (select Set empty string), no folder filtering is applied and all files in the drive are retrieved.

  • Google Domain: The Google Workspace domain that the Google groups and drive reside in.

  • OCR Mode: The OCR mode to use when parsing files with the Cortex PARSE_DOCUMENT function. The value can be OCR or LAYOUT.

  • File Extensions To Ingest: A comma-separated list of file extensions to ingest. The connector tries to convert the files to PDF format first, if possible; however, the extension check is performed on the original file extension. If any of the specified file extensions are not supported by Cortex Parse Document, the connector ignores those files, logs a warning message in an event log, and continues processing other files.

  • Snowflake File Hash Table Name: The internal table used to store file content hashes, which prevents updates to content when it has not changed.

  • Snowflake Cortex Search Service User Role: The identifier of a role that is granted usage permissions on the Cortex Search service.

  3. Right-click on the canvas and select Enable all Controller Services.

  4. Right-click on the imported process group and select Start. The connector starts the data ingestion.

  5. Query the Cortex Search service.

Use case 3: Customize the connector definition

Customize the connector definition to:

  • Process the ingested files with Document AI.

  • Perform custom processing on ingested files.

Set up the connector

As a data engineer, perform the following tasks to install and configure the connector:

Install the connector

  1. Navigate to the Openflow Overview page. In the Featured connectors section, select View more connectors.

  2. On the Openflow connectors page, find the connector and select Add to runtime.

  3. In the Select runtime dialog, select your runtime from the Available runtimes drop-down list.

  4. Select Add.

    Note

    Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.

  5. Authenticate to the deployment with your Snowflake account credentials and select Allow when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.

  6. Authenticate to the runtime with your Snowflake account credentials.

The Openflow canvas appears with the connector process group added to it.

Configure the connector

  1. Customize the connector definition.

    1. Remove the following process groups:

      • Check If Duplicate Content

      • Snowflake Stage and Parse PDF

      • Update Snowflake Cortex

    2. Attach any custom processing to the output of the Process Google Drive Metadata process group. Each flow file represents a single Google Drive file change. Flow file attributes can be seen in the Fetch Google Drive Metadata documentation.

  2. Populate the process group parameters. Follow the same process as for Use case 1: Use the connector definition to ingest files only. Note that after you modify the connector definition, some parameters might no longer be required.

Run the flow

  1. Run the flow: right-click on the imported process group and select Start. Starting the flow creates all required objects inside Snowflake.

  2. Query the Cortex Search service.

Query the Cortex Search service

You can use the Cortex Search service to build chat and search applications to chat with or query your documents in Google Drive.

After you install and configure the connector and it begins ingesting content from Google Drive, you can query the Cortex Search service. For more information about using Cortex Search, see Query a Cortex Search service.

Filter responses

To restrict responses from the Cortex Search service to documents that a specific user has access to in Google Drive, you can specify a filter containing the user ID or email address of the user when you query Cortex Search. For example, filter.@contains.user_ids or filter.@contains.user_emails. The name of the Cortex Search service created by the connector is search_service in the schema Cortex.

Run the following SQL code in a SQL worksheet to query the Cortex Search service with files ingested from your Google Drive.

Replace the following:

  • application_instance_name: Name of your database and connector application instance.

  • user_emailID: Email ID of the user who you want to filter the responses for.

  • your_question: The question that you want to get responses for.

  • number_of_results: Maximum number of results to return in the response. The maximum value is 1000 and the default value is 10.

SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
    '<application_instance_name>.cortex.search_service',
    '{
      "query": "<your_question>",
      "columns": ["chunk", "web_url"],
      "filter": {"@contains": {"user_emails": "<user_emailID>"} },
      "limit": <number_of_results>
    }'
  )
)['results'] AS results;

Here’s a complete list of values that you can enter for columns:

  • full_name (String): A full path to the file from the Google Drive documents root. Example: folder_1/folder_2/file_name.pdf.

  • web_url (String): A URL that displays the original Google Drive file in a browser.

  • last_modified_date_time (String): The date and time when the item was most recently modified.

  • chunk (String): A piece of text from the document that matched the Cortex Search query.

  • user_ids (Array): An array of user IDs that have access to the document. It also includes user IDs from all the Google groups that are assigned to the document.

  • user_emails (Array): An array of user email addresses that have access to the document. It also includes email addresses from all the Google groups that are assigned to the document.
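
For reference, the following variant of the earlier query requests every column listed above. It is a sketch only, reusing the same placeholders as the previous example.

SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
    '<application_instance_name>.cortex.search_service',
    '{
      "query": "<your_question>",
      "columns": ["full_name", "web_url", "last_modified_date_time", "chunk", "user_ids", "user_emails"],
      "filter": {"@contains": {"user_emails": "<user_emailID>"} },
      "limit": <number_of_results>
    }'
  )
)['results'] AS results;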

Example: Query an AI assistant for human resources (HR) information

You can use Cortex Search to query an AI assistant for employees to chat with the latest versions of HR information, such as onboarding, code of conduct, team processes, and organization policies. Using response filters, you can also allow HR team members to query employee contracts while adhering to access controls configured in Google Drive.

Run the following in a SQL worksheet to query the Cortex Search service with files ingested from Google Drive. Select the database as your application instance name and schema as Cortex.

Replace the following:

  • application_instance_name: Name of your database and connector application instance.

  • user_emailID: Email ID of the user who you want to filter the responses for.

SELECT PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
    '<application_instance_name>.cortex.search_service',
    '{
      "query": "What is my vacation carry over policy?",
      "columns": ["chunk", "web_url"],
      "filter": {"@contains": {"user_emails": "<user_emailID>"} },
      "limit": 1
    }'
  )
)['results'] AS results;