About Openflow Connector for Excel¶

Note

The connector is subject to the Connector Terms.

This topic describes the basic concepts of Openflow Connector for Excel, its workflow, and limitations.

The Openflow Connector for Excel connects an AWS S3 instance with Snowflake to periodically ingest Microsoft Excel files and store the ingested data inside Snowflake tables. The connector supports ingestion of the following:

  • All Excel spreadsheets

  • Specified Excel spreadsheets

  • Specified ranges of data present in an Excel workbook

The connector only performs truncate and load ingestion. This means that every time a file is ingested, the existing data in Snowflake is completely replaced with the new data from the file.

Workflow¶

  1. An AWS administrator performs the following tasks:

    1. Creates an IAM user and credentials in their AWS account.

    2. Notes the AWS region in which the S3 instance exists, for example, us-west-2.

    3. Identifies the files to be ingested.

  2. A Snowflake account administrator performs the following tasks:

    1. Sets the desired database and schema names within Snowflake.

    2. Designates a warehouse to be used by the connector.

    3. Configures the Snowflake user used by the connector and a role for this user.

  3. A data engineer performs the following tasks:

    1. Downloads and imports the connector definition file into the Snowflake Openflow canvas.

    2. Configures the connector parameters:

      1. Provides the AWS credentials.

      2. Provides Snowflake user credentials and configuration.

      3. Defines the criteria for the objects being ingested by providing filters.

    3. Starts the flow within the Openflow canvas. Upon execution, the flow performs the following actions:

      1. Downloads specified files from the S3 bucket.

      2. Extracts relevant data.

      3. Creates the configured destination table in the Snowflake database.

      4. Loads the processed data into the designated Snowflake table.

Limitations¶

  • Currently, only .xlsx files are supported (Office 2007 and later versions). The following file types are not supported:

    • .xls files (Office 97 to Office 2003 versions)

    • .xlsm files (macro-enabled workbooks)

    • .xlsb files (binary workbooks)

  • Due to memory limitations, the maximum size of the .xlsx file should not be greater than 2 MB. Larger files might work, but results may be inconsistent depending on file content and available instance memory.

  • For columns with mixed data types (for example, numbers and strings) within the same column:

    • The values are saved as strings in the Snowflake table.

    • The schema of ingested data (that is, the data types for the columns), is currently inferred using the first 10 rows. If the first 10 rows are of the same type, then the column type might not be properly recognized as mixed.

  • If the whole spreadsheet is to be ingested, without any specific range, then the first row, starting in cell A1, must be the header row and it must contain the column names.

Next steps¶

Set up the Openflow Connector for Excel