About Openflow¶

Snowflake Openflow is an integration service that connects any data source and any destination with hundreds of processors supporting structured and unstructured text, images, audio, video, and sensor data. Built on Apache NiFi, Openflow lets you run a fully managed service in your own cloud for complete control.

Note

The Openflow platform is currently available for deployment in customers’ own VPC in AWS.

This topic describes the key features of Openflow, its benefits, architecture and workflow, and use cases.

Key features and benefits¶

  • Open and extensible: An extensible managed service powered by Apache NiFi, enabling you to build and extend processors that move data from any data source to any destination.

  • Unified data integration platform: Openflow enables data engineers to handle complex, bi-directional ETL processes through a fully managed service that can be deployed inside customers’ own VPC in the cloud or on-premises.

  • Enterprise ready: Openflow offers out-of-the-box security, compliance, observability, and maintainability hooks for data integration.

  • High-speed ingestion of all types of data: One unified platform that lets you handle structured and unstructured data, in both batch and streaming modes, from your data source to Snowflake at virtually any scale.

  • Continuous ingestion of multimodal data for AI processing: Near real-time unstructured data ingestion, so you can immediately chat with your data coming from sources such as SharePoint and Google Drive.
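
As an illustration of the last point, the following minimal sketch queries documents that a connector has landed in Snowflake using the Cortex COMPLETE function through the Snowflake Python connector. The bronze_documents table, its doc_text column, and the connection parameters are hypothetical placeholders, not part of Openflow itself.

```python
# Minimal sketch: chat over documents that an Openflow connector has
# landed in Snowflake. BRONZE_DOCUMENTS, its DOC_TEXT column, and the
# connection parameters are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="...",
    warehouse="my_wh",
    database="my_db",
    schema="bronze",
)

question = "Summarize the latest onboarding guide."
cur = conn.cursor()
cur.execute(
    """
    SELECT SNOWFLAKE.CORTEX.COMPLETE(
        'mistral-large',
        CONCAT('Answer from this document: ', doc_text,
               ' Question: ', %s)
    )
    FROM bronze_documents
    LIMIT 1
    """,
    (question,),
)
print(cur.fetchone()[0])
```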

Architecture¶

The following diagram illustrates the architecture of Openflow:

Openflow architecture

The deployment agent installs and bootstraps the Openflow deployment infrastructure in your VPC and regularly syncs container images from the Snowflake System Image Registry.

Some of the components of Openflow are:

  • Deployment: A deployment is where your data flows execute, within individual runtimes. You will often have multiple runtimes associated with a single deployment to isolate projects, teams, or stages of the SDLC.

  • Runtime: Runtimes host your data pipelines, with the framework providing security, simplicity, and scalability. You can deploy Openflow runtimes in your VPC using Openflow. You can deploy Openflow connectors to your runtimes, and also build new pipelines from scratch using Openflow processors and controller services.

  • Control plane: The control plane is the layer of the architecture that contains all components used to manage and observe Openflow, including the Openflow service and API. Users interact with it through the Openflow UI or by calling the Openflow APIs directly.

Workflow¶

| User persona | Task |
| --- | --- |
| AWS cloud engineer/administrator | Creates a set of deployments in their AWS cloud account. Uses the Openflow UI to manage deployments and to create, resize, upgrade, and delete runtimes in all deployments. Snowflake logins authenticate users to Openflow, and roles and privileges control access to Openflow deployments and runtimes. |
| Data engineer (pipeline author, responsible for data ingestion) | Uses the runtime canvas to build flows from scratch or to configure deployed connectors. Creates a new flow, or uses an existing connector as is or as a starting point to customize, and populates the bronze layer in your Snowflake account (or another target system). Connectors are a simple way to solve a specific integration use case, and less technical users can deploy them without necessarily needing a data engineer. |
| Data engineer (pipeline operator) | Configures the flow parameters and runs the flow. |
| Data engineer (responsible for transformation to silver and gold layers) | Transforms data from the bronze layer, populated by the pipeline, into silver and gold layers for analytics (see the sketch after this table). |
| Business user | Uses gold-layer objects for analytics. |
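
To make the transformation row above concrete, here is a minimal sketch of promoting bronze-layer rows into a silver table with a single SQL statement run through the Snowflake Python connector. The raw_orders and silver_orders tables and their columns are hypothetical; production pipelines typically use tasks, streams, dynamic tables, or a dedicated transformation tool instead.

```python
# Minimal sketch: a bronze-to-silver transformation run as one SQL
# statement. RAW_ORDERS and SILVER_ORDERS are hypothetical names.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="my_wh", database="my_db",
)
conn.cursor().execute(
    """
    CREATE OR REPLACE TABLE silver.silver_orders AS
    SELECT
        order_id,
        TRY_TO_TIMESTAMP(order_ts_raw) AS order_ts,     -- normalize types
        UPPER(TRIM(country_code))      AS country_code,
        amount::NUMBER(12, 2)          AS amount
    FROM bronze.raw_orders
    WHERE order_id IS NOT NULL                          -- basic quality gate
    """
)
```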

Use cases¶

Use Openflow to move data from any source to any destination with minimal management, backed by Snowflake’s built-in data security and governance.

Some of the use cases of Openflow are as follows:

  • Ingest data from unstructured data sources, such as Google Drive and Box, and make it ready for chat in your AI assistants with Snowflake Cortex or use the data for your own custom processing.

  • Replicate the change data capture (CDC) of database tables into Snowflake for comprehensive, centralized reporting.

  • Ingest real-time events from streaming services, such as Apache Kafka, into Snowflake for near real-time analytics (see the sketch after this list).

  • Ingest data from SaaS platforms, such as LinkedIn Ads, to Snowflake for reporting, analytics and insights.

  • Create a data flow in Openflow using Snowflake and NiFi processors and controller services.
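
As a sketch of the streaming use case above, the following query aggregates the last 15 minutes of events from a hypothetical kafka_events landing table; every name here is a placeholder for whatever your flow actually writes.

```python
# Minimal sketch: near real-time aggregation over streamed events.
# KAFKA_EVENTS is a hypothetical landing table for illustration only.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="my_wh", database="my_db", schema="bronze",
)
cur = conn.cursor()
cur.execute(
    """
    SELECT
        DATE_TRUNC('minute', event_ts) AS minute,
        event_type,
        COUNT(*) AS events
    FROM kafka_events
    WHERE event_ts >= DATEADD('minute', -15, CURRENT_TIMESTAMP())
    GROUP BY 1, 2
    ORDER BY 1 DESC
    """
)
for minute, event_type, events in cur:
    print(minute, event_type, events)
```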

Security¶

Openflow uses industry-leading security features to protect your account, your users, and the data you store in Snowflake. Some key aspects include:

  • Authentication

    • Runtimes use OAuth2 for authentication to Snowflake

  • Authorization

    • Openflow supports fine-grained roles for RBAC

    • The ACCOUNTADMIN role grants the privileges required to create deployments and runtimes (see the sketch after this list)

  • Encryption in-transit

    • Openflow connectors support the TLS protocol and use standard Snowflake clients for data ingestion

    • All communication between Openflow deployments and the Openflow control plane is encrypted using TLS

  • Secrets management (BYOC)

  • Private link support

    • Openflow connectors can read and write data to Snowflake over inbound AWS PrivateLink

  • Tri-Secret Secure support

    • Openflow connectors are compatible with Tri-Secret Secure for writing data to Snowflake.
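
To illustrate the authorization model, here is a minimal sketch of the standard Snowflake RBAC pattern: ACCOUNTADMIN creates a role for Openflow work and grants it to a user. The OPENFLOW_ADMIN role and the user name are placeholders; the Openflow-specific privileges themselves are covered in the setup documentation rather than shown here.

```python
# Minimal sketch of the RBAC pattern: ACCOUNTADMIN creates a role for
# Openflow work and grants it to a user. OPENFLOW_ADMIN and the user
# name are placeholders; Openflow-specific privileges are granted the
# same way but are documented in the setup guide.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="admin_user", password="...",
    role="ACCOUNTADMIN",
)
cur = conn.cursor()
cur.execute("CREATE ROLE IF NOT EXISTS OPENFLOW_ADMIN")
cur.execute("GRANT ROLE OPENFLOW_ADMIN TO USER data_engineer")
```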

Regional availability¶

Openflow is available to all accounts in AWS Commercial Regions. Openflow is not available in government regions.

Next step¶

Set up Openflow