Set up the Openflow Connector for Excel¶

Note

The connector is subject to the Connector Terms.

This topic describes the steps to set up the Openflow Connector for Excel.

Prerequisites¶

  1. Ensure that you have reviewed About Openflow Connector for Excel.

  2. Ensure that you have set up Openflow.

Get the credentials¶

This section describes the steps to get your credentials to configure the connector. Follow the steps for your data source.

Get the AWS credentials¶

As an AWS administrator, perform the following tasks:

  1. Log in to your AWS IAM console.

  2. Select the number under Users, then select Create user.

  3. Specify the user name, group, and additional permissions if needed. The user must have at least s3:GetObject access to objects read by the connector from the S3 bucket.

  4. After the user is created, in the user’s view, navigate to Security Credentials » Access Keys.

  5. Select Create access key. The new access key must grant access only to specific resources. For better security and access control, Snowflake recommends allowing access only to specific S3 buckets.

  6. Take note of the Access Key and Secret Access Key.
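As a sketch, a minimal IAM policy for step 3 that grants the connector user read access to objects in a single bucket might look like the following; the bucket name is a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-excel-bucket/*"
    }
  ]
}
```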

Get the SharePoint credentials¶

The connector uses Microsoft Graph APIs to fetch data from SharePoint.

As a SharePoint administrator, perform the following actions:

  1. Log in to your Microsoft Entra admin center.

  2. Navigate to Applications » App registrations.

  3. Ensure that you have an application with the following MS Graph Application permissions. For more information, see Get access without a user.

    • For SharePoint Site access, one of the following permissions must be granted:

      • Sites.Read.All - allows read access for all sites.

      • Sites.Selected - limits access only to specified sites.

    • For SharePoint file access (for file downloads), one of the following permissions must be granted:

      • Files.Read.All - allows read access for all files.

      • Files.SelectedOperations.Selected - limits access only to files in specified sites.

    Note

    Snowflake recommends using the Selected permissions for better security and access control.

  4. Get the following credentials. You may need to contact your Azure or Office 365 account administrator to get this information:

    • The site URL of your Microsoft 365 SharePoint site with files or folders that you want to ingest into Snowflake.

    • Your tenant ID. To learn about tenant ID and how to find it in Microsoft Entra, see Find your Microsoft 365 tenant ID.

    • Client ID and client secret for your Microsoft Entra application.

Set up Snowflake account¶

As a Snowflake account administrator, perform the following tasks:

  1. Create a new role or use an existing role.

  2. Create a new Snowflake service user with type SERVICE.

  3. Grant the Snowflake service user the role you created in the previous steps.

  4. Configure key-pair authentication for the Snowflake SERVICE user from step 2.
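    Steps 1 through 4 can be sketched in SQL as follows; the role and user names are example placeholders, and the public key value is the one generated for key-pair authentication:

    ```sql
    -- Names below are examples; substitute your own values
    CREATE ROLE excel_connector_role;
    CREATE USER excel_connector_user
      TYPE = SERVICE
      RSA_PUBLIC_KEY = '<public_key_without_PEM_delimiters>';
    GRANT ROLE excel_connector_role TO USER excel_connector_user;
    ```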

  5. Strongly recommended: Configure a secrets manager supported by Openflow (for example, AWS Secrets Manager, Azure Key Vault, or HashiCorp Vault) and store the public and private keys in the secret store.

    Note

    If you do not want to use a secrets manager, you are responsible for safeguarding the public key and private key files used for key-pair authentication according to the security policies of your organization.

    1. Once the secrets manager is configured, determine how you will authenticate to it. On AWS, it’s recommended that you use the EC2 instance role associated with Openflow so that no other secrets have to be persisted.

    2. In Openflow, configure a Parameter Provider associated with this secrets manager: open the hamburger menu in the upper right, navigate to Controller Settings » Parameter Provider, and fetch your parameter values.

    3. At this point, all credentials can be referenced with the associated parameter paths, and no sensitive values need to be persisted within Openflow.

  6. If any other Snowflake users require access to the raw ingested documents and tables ingested by the connector (for example, for custom processing in Snowflake), then grant those users the role created in step 1.

  7. Create a database and schema in Snowflake for the connector to store ingested data. Grant the required database privileges to the role created in the first step. Substitute the role placeholder with the actual value and use the following SQL commands:

    CREATE DATABASE excel_destination_db;
    CREATE SCHEMA excel_destination_db.excel_destination_schema;
    GRANT USAGE ON DATABASE excel_destination_db TO ROLE <excel_connector_role>;
    GRANT USAGE ON SCHEMA excel_destination_db.excel_destination_schema TO ROLE <excel_connector_role>;
    GRANT CREATE TABLE ON SCHEMA excel_destination_db.excel_destination_schema TO ROLE <excel_connector_role>;
    
  8. Create a warehouse for the connector, or use an existing one. Start with the smallest warehouse size, then experiment depending on the number of tables being replicated and the amount of data transferred. Large numbers of tables typically scale better with multi-cluster warehouses than with larger warehouse sizes.

  9. Ensure that the user with the role used by the connector has the required privileges to use the warehouse. If not, grant the required privileges to the role:

    CREATE WAREHOUSE excel_connector_warehouse WITH WAREHOUSE_SIZE = 'X-Small';
    GRANT USAGE ON WAREHOUSE excel_connector_warehouse TO ROLE <excel_connector_role>;
    

Set up the connector¶

As a data engineer, perform the following tasks to install and configure a connector:

Install the connector¶

  1. Navigate to the Openflow Overview page. In the Featured connectors section, select View more connectors.

  2. On the Openflow connectors page, find the connector and select Add to runtime.

  3. In the Select runtime dialog, select your runtime from the Available runtimes drop-down list.

  4. Select Add.

    Note

    Before you install the connector, ensure that you have created a database and schema in Snowflake for the connector to store ingested data.

  5. Authenticate to the deployment with your Snowflake account credentials and select Allow when prompted to allow the runtime application to access your Snowflake account. The connector installation process takes a few minutes to complete.

  6. Authenticate to the runtime with your Snowflake account credentials.

The Openflow canvas appears with the connector process group added to it.

Configure the connector¶

  1. Configure the connector to fetch all required secrets, for example, the private key for key-pair authentication and certificates, from the supported secrets manager.

  2. Right-click on the imported process group and select Parameters.

  3. Populate the required parameter values as described in Flow parameters.

Flow parameters¶

This section describes the flow parameters that you can configure based on the data source and parameter contexts:

Flow parameters: For Amazon S3¶

Microsoft Excel (S3 to Snowflake) Ingestion Parameters¶

| Parameter | Description | Required |
| --- | --- | --- |
| Destination Table Prefix | The prefix of the table in the destination schema where data retrieved from the Excel file will be persisted. The table will be created automatically by the connector. | No |
| File Password | Password that protects the Excel file. Applicable only if the protection type is PASSWORD. | No |
| Protection Type | Protection type of the Excel file: either UNPROTECTED if the file is unprotected, or PASSWORD if the file is protected with a password. | Yes |
| Ranges | Comma-separated list of ranges, in A1 notation, to retrieve values from. For example: Sheet1!A1:B2,Sheet2!D4:E5,Sheet3. The first row in each selected range must contain the column names. If not specified, the whole workbook is ingested. The specified ranges apply to all files listed in S3 Object Keys. | No |
| S3 Bucket | The S3 bucket from which the Excel files are fetched. | Yes |
| S3 Object Keys | Comma-separated list of object keys within the S3 bucket that contain Excel files to fetch. Example: file1.xlsx,file2.xlsx. | Yes |
| Schedule | Schedule for the connector ingestion. | Yes |

Microsoft Excel (S3 to Snowflake) Source Parameters¶

| Parameter | Description | Required |
| --- | --- | --- |
| AWS Access Key ID | Access key ID for the AWS user that is used to fetch the Excel files. | Yes |
| AWS Secret Access Key | Secret access key for the AWS user that is used to fetch the Excel files. | Yes |
| AWS Region | AWS region where the S3 bucket resides. | Yes |

Microsoft Excel (S3 to Snowflake) Destination Parameters¶

| Parameter | Description | Required |
| --- | --- | --- |
| Destination Database | Name (case-sensitive) of the Snowflake database where the data will be ingested. | Yes |
| Destination Schema | Name (case-sensitive) of the Snowflake schema where tables will be created. | Yes |
| Snowflake Account Identifier | Snowflake account name, formatted as [organization-name]-[account-name], where data retrieved from the Excel file will be persisted. | Yes |
| Snowflake Private Key | The private key used in key-pair authentication, formatted according to PKCS8 standards and containing standard PEM headers and footers. | Yes |
| Snowflake Private Key Password | The password for the Snowflake private key. Must be left without a value if the key is not password protected. | No |
| Snowflake Role | Snowflake role that will be used by the connector. | Yes |
| Snowflake User | Username for the Snowflake account. | Yes |
| Snowflake Warehouse | Snowflake warehouse used to run queries when inserting data into the destination table. | Yes |

Flow parameters: For SharePoint¶

Microsoft Excel (SharePoint to Snowflake) Ingestion Parameters¶

| Parameter | Description | Required |
| --- | --- | --- |
| Destination Table Prefix | The prefix of the table in the destination schema where data retrieved from the Excel file will be persisted. The table will be created automatically by the connector. | No |
| File Password | Password that protects the Excel file. Applicable only if the protection type is PASSWORD. | No |
| Protection Type | Protection type of the Excel file: either UNPROTECTED if the file is unprotected, or PASSWORD if the file is protected with a password. | Yes |
| Ranges | Comma-separated list of ranges, in A1 notation, to retrieve values from. For example: Sheet1!A1:B2,Sheet2!D4:E5,Sheet3. The first row in each selected range must contain the column names. If not specified, the whole workbook is ingested. The specified ranges apply to all files listed in SharePoint Files. | No |
| Schedule | Schedule for the connector ingestion. | Yes |
| SharePoint Document Library Name | The library in the SharePoint site where the ingested files reside. | Yes |
| SharePoint Files | Comma-separated list of paths (relative to the root of the document library) of Excel files to ingest. Example: file1.xlsx,folder/file2.xlsx. | Yes |

Microsoft Excel (SharePoint to Snowflake) Source Parameters¶

| Parameter | Description | Required |
| --- | --- | --- |
| SharePoint Client ID | Microsoft Entra client ID. To learn about the client ID and how to find it in Microsoft Entra, see Application ID (client ID). | Yes |
| SharePoint Client Secret | Microsoft Entra client secret. To learn about the client secret and how to find it in Microsoft Entra, see Certificates & secrets. | Yes |
| SharePoint Site URL | URL of the SharePoint site from which the ingested files will be downloaded. | Yes |
| SharePoint Tenant ID | Microsoft Entra tenant ID. To learn about the tenant ID and how to find it in Microsoft Entra, see Find your Microsoft 365 tenant ID. | Yes |

Microsoft Excel (SharePoint to Snowflake) Destination Parameters¶

| Parameter | Description | Required |
| --- | --- | --- |
| Destination Database | Name (case-sensitive) of the Snowflake database where the data will be ingested. | Yes |
| Destination Schema | Name (case-sensitive) of the Snowflake schema where tables will be created. | Yes |
| Snowflake Account Identifier | Snowflake account name, formatted as [organization-name]-[account-name], where data retrieved from the Excel file will be persisted. | Yes |
| Snowflake Private Key | The private key used in key-pair authentication, formatted according to PKCS8 standards and containing standard PEM headers and footers. | Yes |
| Snowflake Private Key Password | The password for the Snowflake private key. Must be left without a value if the key is not password protected. | No |
| Snowflake Role | Snowflake role that will be used by the connector. | Yes |
| Snowflake User | Username for the Snowflake account. | Yes |
| Snowflake Warehouse | Snowflake warehouse used to run queries when inserting data into the destination table. | Yes |

Run the flow¶

  1. Right-click on the canvas and select Enable all Controller Services.

  2. Right-click on the imported process group and select Start. The connector starts the data ingestion.

Generated table names¶

The connector creates destination tables named using the following template: {PREFIX}{FILENAME}_{RANGE}. The names are always double-quoted identifiers.

  • {PREFIX} is replaced with the value of the Destination Table Prefix parameter, e.g. prfx_.

  • {FILENAME} is replaced by the full path of the ingested file, e.g. file1.xlsx or folder/file2.xlsx.

  • {RANGE} is replaced by:

    • The name of the ingested sheet, if the Ranges parameter is empty.

    • The name of the ingested sheet together with the ingested range, as specified in the Ranges parameter, e.g. Sheet1!A1:B2.

Examples of generated table names:

  • "file1.xlsx_Sheet1"

  • "prfx_folder/file2.xlsx_Sheet1!A1:B2"
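Because the generated names contain characters such as / and !, they must be referenced as double-quoted, case-sensitive identifiers in SQL. A hypothetical query against one of the example tables above, using the database and schema created during setup:

```sql
-- The quoted identifier must match the generated table name exactly, including case
SELECT *
FROM excel_destination_db.excel_destination_schema."prfx_folder/file2.xlsx_Sheet1!A1:B2";
```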

(Optional) Reconfigure the currently running connector¶

You can reconfigure the connector parameters after the connector has already started ingesting data. If you need to change the ingested files or ranges, perform the following steps to make sure that the data is sent to Snowflake properly:

  1. Stop the connector: Ensure that all Openflow processors are stopped.

  2. Access configuration settings: Navigate to the connector’s configuration settings within Openflow.

  3. Modify parameters: Adjust the parameters as required.

  4. Start the connector: Start the connector and also ensure that all controller services have started.