About Openflow Connector for Excel¶

Note

The connector is subject to the Connector Terms.

This topic describes the basic concepts of Openflow Connector for Excel, its workflow, and limitations.

The Openflow Connector for Excel connects an AWS S3 instance or a Microsoft SharePoint site with Snowflake to periodically ingest Microsoft Excel files and store the ingested data inside Snowflake tables.

The connector supports ingestion of the following:

  • All Excel spreadsheets

  • Specified Excel spreadsheets

  • Specified ranges of data present in an Excel workbook

The connector only performs truncate and load ingestion. This means that every time a file is ingested, the existing data in Snowflake is completely replaced with the new data from the file.

Use this connector if you’re looking to do the following:

  • Load data from Microsoft Excel spreadsheets into Snowflake tables for reporting and analytics

  • Load data from Excel spreadsheets in SharePoint into Snowflake tables for reporting and analytics

Workflow¶

Based on your data source, the workflows for the connector are as follows:

Workflow for connecting an AWS S3 instance¶

  1. An AWS administrator performs the following tasks:

    1. Creates an IAM user and credentials in their AWS account.

    2. Notes the AWS region in which the S3 instance exists, for example, us-west-2.

    3. Identifies the files to be ingested.

  2. A Snowflake account administrator performs the following tasks:

    1. Sets the desired database and schema names within Snowflake.

    2. Designates a warehouse to be used by the connector.

    3. Configures the Snowflake user used by the connector and a role for this user.

  3. A data engineer performs the following tasks:

    1. Downloads and imports the connector definition file into the Snowflake Openflow canvas.

    2. Configures the connector parameters:

      1. Provides the AWS credentials.

      2. Provides Snowflake user credentials and configuration.

      3. Defines the criteria for the objects being ingested by providing filters.

    3. Starts the flow within the Openflow canvas. Upon execution, the flow performs the following actions:

      1. Downloads specified files from the S3 bucket.

      2. Extracts relevant data.

      3. Creates the configured destination table in the Snowflake database.

      4. Loads the processed data into the designated Snowflake table.

Workflow for connecting a Microsoft SharePoint site¶

  1. A SharePoint administrator performs the following tasks:

    1. Creates a new Microsoft Entra application.

    2. Configures SharePoint to enable OAuth authentication.

    3. Identifies the files to be ingested.

  2. A Snowflake account administrator performs the following tasks:

    1. Sets the desired database and schema names within Snowflake.

    2. Designates a warehouse to be used by the connector.

    3. Configures the Snowflake user used by the connector and a role for this user.

  3. A data engineer performs the following tasks:

    1. Downloads and imports the connector definition file into the Snowflake Openflow canvas.

    2. Configures the connector parameters:

      1. Provides the SharePoint credentials.

      2. Provides Snowflake user credentials and configuration.

      3. Defines the criteria for the objects being ingested by providing filters.

    3. Starts the flow within the Openflow canvas. Upon execution, the flow performs the following actions:

      1. Downloads specified files from the SharePoint site.

      2. Extracts relevant data.

      3. Creates the configured destination table in the Snowflake database.

      4. Loads the processed data into the designated Snowflake table.

Limitations¶

  • Currently, only .xlsx files are supported (Office 2007 and later versions). The following file types are not supported:

    • .xls files (Office 97 to Office 2003 versions)

    • .xlsm files (macro-enabled workbooks)

    • .xlsb files (binary workbooks)

  • Large .xlsx files may overload the Openflow runtime or cause issues affecting other connector flows. Due to memory limitations the size of the file (assuming the connector can use all node resources) should not be greater than:

    • 2 MB for a Small runtime node.

    • 10 MB for a Medium runtime node.

    • 20 MB for a Large runtime node.

  • For columns with mixed data types (for example, numbers and strings) within the same column:

    • The values are saved as VARIANTs in the Snowflake table.

    • The schema of ingested data (that is, the data types for the columns), is currently inferred using the first 10 rows. If the first 10 rows are of the same type, then the column type might not be properly recognized as mixed.

  • If the whole spreadsheet is to be ingested, without any specific range, then the first row, starting in cell A1, must be the header row and it must contain the column names.

Next steps¶

Set up the Openflow Connector for Excel