About Openflow¶

Snowflake Openflow is an integration service that connects any data source and any destination with hundreds of processors supporting structured and unstructured text, images, audio, video, and sensor data. Built on Apache NiFi, Openflow lets you run a fully managed service in your own cloud for complete control.

Note

The Openflow platform is currently available for deployment either in customers’ own VPCs on AWS or within Snowflake using Snowpark Container Services.

This topic describes the key features and benefits of Openflow, its deployment models, use cases, and architecture.

Key features and benefits¶

Open and extensible

An extensible managed service that’s powered by Apache NiFi, enabling you to build and extend processors from any data source to any destination.

Unified data integration platform

Openflow enables data engineers to handle complex, bi-directional data extraction and loading through a fully managed service that can be deployed inside your own VPC or within your Snowflake deployment.

Enterprise-ready

Openflow offers out-of-the-box security, compliance, observability, and maintainability hooks for data integration.

High speed ingestion of all types of data

One unified platform lets you handle structured and unstructured data, in both batch and streaming modes, from your data source to Snowflake at virtually any scale.

Continuous ingestion of multimodal data for AI processing

Near real-time unstructured data ingestion, so you can immediately chat with your data coming from sources such as SharePoint, Google Drive, and so on.

Openflow - Snowflake Deployment models¶

Openflow is available in two deployment models: Bring Your Own Cloud (BYOC) and Snowpark Container Services (SPCS).

Openflow - Snowflake Deployment (SPCS)

Openflow - Snowflake Deployment, using Snowpark Container Services (SPCS), provides a streamlined and integrated solution for connectivity. Because SPCS is a self-contained service within Snowflake, it’s easy to deploy and manage and offers a convenient and cost-effective environment for running your data flows. A key advantage of Openflow - Snowflake Deployment is its native integration with Snowflake’s security model, which allows for seamless authentication, authorization, and network security, as well as simplified operations.

Openflow BYOC

Openflow bring your own cloud (BYOC) provides a connectivity solution that you can use to connect public and private systems securely and handle sensitive data preprocessing locally, within the secure bounds of your organization’s cloud environment. BYOC refers to a deployment option where the Openflow data processing engine, or data plane, runs within your own cloud environment while Snowflake manages the overall Openflow service and control plane.

Use cases¶

Use Openflow if you want to fetch data from any source and put it in any destination with minimal management, coupled with Snowflake’s built-in data security and governance.

Openflow use cases include:

  • Ingest data from unstructured data sources, such as Google Drive and Box, and make it ready for chat in your AI assistants with Snowflake Cortex or use the data for your own custom processing.

  • Replicate the change data capture (CDC) of database tables into Snowflake for comprehensive, centralized reporting.

  • Ingest real-time events from streaming services, such as Apache Kafka, into Snowflake for near real-time analytics.

  • Ingest data from SaaS platforms, such as LinkedIn Ads, to Snowflake for reporting, analytics, and insights.

  • Create an Openflow dataflow using Snowflake and NiFi processors and controller services.

Security¶

Openflow uses industry-leading security features that help ensure the highest levels of security for your account, your users, and all the data you store in Snowflake. Some key aspects include:

Authentication
  • Runtimes use OAuth2 for authentication to Snowflake.
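As an illustration of the same token-based pattern, the sketch below builds the keyword arguments that `snowflake-connector-python` accepts for its OAuth authenticator (`authenticator='oauth'`). The account, user, and token values are placeholder assumptions, not values tied to any Openflow runtime.

```python
import os

def oauth_connect_params(account: str, user: str, token: str) -> dict:
    """Build keyword arguments for snowflake.connector.connect() using
    its token-based OAuth authenticator (authenticator='oauth')."""
    return {
        "account": account,
        "user": user,
        "authenticator": "oauth",  # tells the connector to use the supplied OAuth token
        "token": token,
    }

# Placeholder identifiers; in practice the token comes from your OAuth provider.
params = oauth_connect_params(
    account="myorg-myaccount",
    user="OPENFLOW_SVC_USER",
    token=os.environ.get("SNOWFLAKE_OAUTH_TOKEN", "<oauth-token>"),
)
# With the snowflake-connector-python package installed you would then:
#   import snowflake.connector
#   conn = snowflake.connector.connect(**params)
```

Keeping the token out of source control (here, read from an environment variable) follows the same principle as Openflow’s managed secrets handling.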

Authorization
  • Openflow supports fine-grained roles for role-based access control (RBAC).

  • An ACCOUNTADMIN grants the privileges required to create deployments and runtimes.
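A minimal sketch of the kind of role setup an ACCOUNTADMIN might perform. The role and user names are illustrative assumptions, and the statements shown are only generic Snowflake RBAC syntax; consult the Openflow documentation for the exact privileges needed for deployments and runtimes.

```python
def grant_statements(role: str, grantee_user: str) -> list[str]:
    """Assemble generic Snowflake RBAC statements an ACCOUNTADMIN could run.
    The role name is hypothetical, not a documented Openflow role."""
    return [
        f"CREATE ROLE IF NOT EXISTS {role}",
        f"GRANT ROLE {role} TO USER {grantee_user}",
    ]

# Hypothetical role and user names.
stmts = grant_statements("OPENFLOW_ADMIN", "JANE_DOE")
for stmt in stmts:
    print(stmt)
```

In practice you would execute each statement through a cursor on an ACCOUNTADMIN session rather than printing it.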

Encryption in transit
  • Openflow connectors support the TLS protocol, using standard Snowflake clients for data ingestion.

  • All communications between Openflow deployments and the Openflow control plane are encrypted using TLS.
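On the client side, the same in-transit guarantee can be enforced with Python’s standard `ssl` module; this sketch simply shows that a default TLS client context already requires certificate validation and hostname checking, which is the posture the bullets above describe. It does not reflect any Openflow-specific API.

```python
import ssl

def strict_tls_context() -> ssl.SSLContext:
    """Create a TLS client context with the ssl module's secure defaults:
    server certificates are verified and hostnames are checked."""
    ctx = ssl.create_default_context()
    # create_default_context() sets verify_mode=CERT_REQUIRED and
    # check_hostname=True, so no further hardening is needed here.
    return ctx

ctx = strict_tls_context()
```

A context like this would then be used to wrap any socket carrying data between your environment and Snowflake.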

Secrets management (BYOC)

Private link support
  • Openflow connectors can read data from and write data to Snowflake using inbound AWS PrivateLink.

Tri-Secret Secure support
  • Openflow connectors are compatible with Tri-Secret Secure for writing data to Snowflake.

Architecture¶

The following diagram illustrates the architecture of Openflow:

Openflow architecture

The deployment agent installs and bootstraps the Openflow deployment infrastructure in your VPC and regularly syncs container images from the Snowflake system image registry.

Openflow components include:

Deployment

A deployment is where your data flows execute, within individual runtimes. A single deployment often has multiple runtimes that isolate different projects or teams, or that separate SDLC stages.

Control plane

The control plane is the architectural layer that contains all of the management and observability components, including the Openflow service and API. Users interact with it through the Openflow UI or directly through the Openflow APIs. On Openflow Snowflake Deployments, the control plane (CP) consists of Snowflake-owned public cloud infrastructure and services and the control plane application itself.

Openflow - Snowflake Deployment

Openflow - Snowflake Deployment services are deployed using an SPCS compute pool and incur utilization charges based on their uptime and usage of compute. See Openflow Snowflake Deployment cost and scaling considerations for more information.

Runtime

Runtimes host your data pipelines, with the framework providing security, simplicity, and scalability. Using Openflow, you can deploy runtimes into your VPC, deploy Openflow connectors to those runtimes, and build entirely new pipelines from Openflow processors and controller services.

Openflow - Snowflake Deployment Runtime

An Openflow - Snowflake Deployment Runtime is deployed as a service to an Openflow - Snowflake Deployment, represented by an underlying compute pool. Customers request a runtime through the deployment, which executes the request on behalf of the user against the service. Once created, customers access the runtime through a web browser at the URL generated for that particular service.