About Openflow¶

Snowflake Openflow is an integration service that connects any data source and any destination with hundreds of processors supporting structured and unstructured text, images, audio, video, and sensor data. Built on Apache NiFi, Openflow lets you run a fully managed service in your own cloud for complete control.

Note

The Openflow platform is currently available for deployment in customers’ own VPC in AWS.

This topic describes the key features of Openflow, its benefits, architecture and workflow, and use cases.

Key features and benefits¶

  • Open and extensible: An extensible managed service powered by Apache NiFi, enabling you to build and extend processors that move data from any data source to any destination.

  • Unified data integration platform: Openflow enables data engineers to handle complex, bi-directional ETL processes through a fully managed service that can be deployed inside customers’ own VPC in the cloud or on-premises.

  • Enterprise ready: Openflow offers out-of-the-box security, compliance, observability, and maintainability hooks for data integration.

  • High-speed ingestion of all types of data: One unified platform that lets you handle structured and unstructured data, in both batch and streaming modes, from your data source to Snowflake at virtually any scale.

  • Continuous ingestion of multimodal data for AI processing: Near real-time unstructured data ingestion, so you can immediately chat with your data coming from sources such as SharePoint and Google Drive.
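
As an illustration of the last point, the following minimal sketch queries documents that a connector has landed in Snowflake using the Cortex COMPLETE function through the Snowflake Python connector. The bronze_documents table, its doc_text column, and the connection parameters are hypothetical placeholders, not part of Openflow itself.

```python
# Minimal sketch: chat over documents that an Openflow connector has
# landed in Snowflake. BRONZE_DOCUMENTS, its DOC_TEXT column, and the
# connection parameters are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="...",
    warehouse="my_wh",
    database="my_db",
    schema="bronze",
)

question = "Summarize the latest onboarding guide."
cur = conn.cursor()
cur.execute(
    """
    SELECT SNOWFLAKE.CORTEX.COMPLETE(
        'mistral-large',
        CONCAT('Answer from this document: ', doc_text,
               ' Question: ', %s)
    )
    FROM bronze_documents
    LIMIT 1
    """,
    (question,),
)
print(cur.fetchone()[0])
```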

Architecture¶

The following diagram illustrates the architecture of Openflow:

Openflow architecture

The deployment agent installs and bootstraps the Openflow deployment infrastructure in your VPC and regularly syncs container images from the Snowflake System Image Registry.

Some of the components of Openflow are:

  • Deployment: A deployment is where your data flows execute, within individual runtimes. You will often have multiple runtimes associated with a single deployment to isolate projects, teams, or stages of the SDLC.

  • Runtime: Runtimes host your data pipelines, with the framework providing security, simplicity, and scalability. You can deploy Openflow runtimes in your VPC using Openflow. You can deploy Openflow connectors to your runtimes, and also build new pipelines from scratch using Openflow processors and controller services.

  • Control plane: The control plane is the layer of the architecture that contains all components used to manage and observe Openflow, including the Openflow service and API. Users interact with it through the Openflow UI or by calling the Openflow APIs directly.

Workflow¶

| User persona | Task |
| --- | --- |
| AWS cloud engineer/administrator | Creates a set of deployments in their AWS cloud account. Uses the Openflow UI to manage deployments and to create, resize, upgrade, and delete runtimes in all deployments. Snowflake logins authenticate users to Openflow, and roles and privileges control access to Openflow deployments and runtimes. |
| Data engineer (pipeline author, responsible for data ingestion) | Uses the runtime canvas to build flows from scratch or to configure deployed connectors. Creates a new flow, or uses an existing connector as is or as a starting point to customize, and populates the bronze layer in your Snowflake account (or another target system). Connectors are a simple way to solve a specific integration use case, and less technical users can deploy them without necessarily needing a data engineer. |
| Data engineer (pipeline operator) | Configures the flow parameters and runs the flow. |
| Data engineer (responsible for transformation to silver and gold layers) | Transforms data from the bronze layer, populated by the pipeline, into silver and gold layers for analytics (see the sketch after this table). |
| Business user | Uses gold-layer objects for analytics. |
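
To make the transformation row above concrete, here is a minimal sketch of promoting bronze-layer rows into a silver table with a single SQL statement run through the Snowflake Python connector. The raw_orders and silver_orders tables and their columns are hypothetical; production pipelines typically use tasks, streams, dynamic tables, or a dedicated transformation tool instead.

```python
# Minimal sketch: a bronze-to-silver transformation run as one SQL
# statement. RAW_ORDERS and SILVER_ORDERS are hypothetical names.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="my_wh", database="my_db",
)
conn.cursor().execute(
    """
    CREATE OR REPLACE TABLE silver.silver_orders AS
    SELECT
        order_id,
        TRY_TO_TIMESTAMP(order_ts_raw) AS order_ts,     -- normalize types
        UPPER(TRIM(country_code))      AS country_code,
        amount::NUMBER(12, 2)          AS amount
    FROM bronze.raw_orders
    WHERE order_id IS NOT NULL                          -- basic quality gate
    """
)
```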

Use cases¶

Use Openflow to move data from any source to any destination with minimal management, backed by Snowflake’s built-in data security and governance.

Some of the use cases of Openflow are as follows:

  • Ingest data from unstructured data sources, such as Google Drive and Box, and make it ready for chat in your AI assistants with Snowflake Cortex or use the data for your own custom processing.

  • Replicate the change data capture (CDC) of database tables into Snowflake for comprehensive, centralized reporting.

  • Ingest real-time events from streaming services, such as Apache Kafka, into Snowflake for near real-time analytics (see the sketch after this list).

  • Ingest data from SaaS platforms, such as LinkedIn Ads, to Snowflake for reporting, analytics and insights.

  • Create a data flow in Openflow using Snowflake and NiFi processors and controller services.
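
As a sketch of the streaming use case above, the following query aggregates the last 15 minutes of events from a hypothetical kafka_events landing table; every name here is a placeholder for whatever your flow actually writes.

```python
# Minimal sketch: near real-time aggregation over streamed events.
# KAFKA_EVENTS is a hypothetical landing table for illustration only.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="my_wh", database="my_db", schema="bronze",
)
cur = conn.cursor()
cur.execute(
    """
    SELECT
        DATE_TRUNC('minute', event_ts) AS minute,
        event_type,
        COUNT(*) AS events
    FROM kafka_events
    WHERE event_ts >= DATEADD('minute', -15, CURRENT_TIMESTAMP())
    GROUP BY 1, 2
    ORDER BY 1 DESC
    """
)
for minute, event_type, events in cur:
    print(minute, event_type, events)
```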

Security¶

Openflow uses industry-leading security features to protect your account, your users, and the data you store in Snowflake. Some key aspects include:

  • Authentication

    • Runtimes use OAuth2 for authentication to Snowflake

  • Authorization

    • Openflow supports fine-grained roles for RBAC

    • The ACCOUNTADMIN role grants the privileges required to create deployments and runtimes (see the sketch after this list)

  • Encryption in-transit

    • Openflow connectors support the TLS protocol and use standard Snowflake clients for data ingestion

    • All communication between Openflow deployments and the Openflow control plane is encrypted using TLS

  • Secrets management (BYOC)

  • Private link support

    • Openflow connectors can read and write data to Snowflake over inbound AWS PrivateLink

  • Tri-Secret Secure support

    • Openflow connectors are compatible with Tri-Secret Secure for writing data to Snowflake.
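
To illustrate the authorization model, here is a minimal sketch of the standard Snowflake RBAC pattern: ACCOUNTADMIN creates a role for Openflow work and grants it to a user. The OPENFLOW_ADMIN role and the user name are placeholders; the Openflow-specific privileges themselves are covered in the setup documentation rather than shown here.

```python
# Minimal sketch of the RBAC pattern: ACCOUNTADMIN creates a role for
# Openflow work and grants it to a user. OPENFLOW_ADMIN and the user
# name are placeholders; Openflow-specific privileges are granted the
# same way but are documented in the setup guide.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="admin_user", password="...",
    role="ACCOUNTADMIN",
)
cur = conn.cursor()
cur.execute("CREATE ROLE IF NOT EXISTS OPENFLOW_ADMIN")
cur.execute("GRANT ROLE OPENFLOW_ADMIN TO USER data_engineer")
```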

Regional availability¶

Openflow is available to all accounts in AWS Commercial Regions. Openflow is not available in government regions.

Next step¶

Set up Openflow